
How to scrape an HTML table to CSV?

Four good answers to "How to scrape an HTML table to CSV?" have been selected below.




1> mkoeller..:

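Select the HTML table in your tool's UI and copy it to the clipboard (if that is possible).

Paste it into Excel.

Save it as a CSV file.

However, this is a manual solution, not an automated one.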



2> Thorvaldur..:

Using Python:

For example, suppose you want to scrape forex quotes in CSV form from a site like this one: fxquotes

Then:

# Python 2 with BeautifulSoup 3, as originally written
from BeautifulSoup import BeautifulSoup
import urllib

# Query parameters for OANDA's historical-rates page
date_s = '&date1=01/01/08'
date_f = '&date=11/10/08'
fx_url = 'http://www.oanda.com/convert/fxhistory?date_fmt=us'
fx_url_end = '&lang=en&margin_fixed=0&format=CSV&redirected=1'
cur1, cur2 = 'USD', 'AUD'
fx_url = fx_url + date_f + date_s + '&exch=' + cur1 + '&exch2=' + cur1
fx_url = fx_url + '&expr=' + cur2 + '&expr2=' + cur2 + fx_url_end

# Fetch the page; the CSV text is expected in the first <pre> element
data = urllib.urlopen(fx_url).read()
soup = BeautifulSoup(data)
data = str(soup.findAll('pre', limit=1))
# str() of the one-element result list gives "[<pre>...</pre>]"; strip that wrapper
data = data.replace('[<pre>', '')
data = data.replace('</pre>]', '')

# Edit this directory; it needs the trailing slash before the file name
file_location = '/Users/location_edit_this/'
file_name = file_location + 'usd_aus.csv'
with open(file_name, 'w') as f:
    f.write(data)
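The answer's code targets Python 2 and BeautifulSoup 3, and its core step is taking the text of the first <pre> element. A minimal Python 3 sketch of just that step, using the standard library's html.parser instead of BeautifulSoup (a swapped-in technique, not the answer's code), follows. FirstPreText and first_pre_text are hypothetical helper names, and the page string is a hypothetical placeholder holding CSV text in a <pre> block, not real OANDA output; fetching a live page (for example with urllib.request.urlopen) is skipped so the sketch runs offline.

```python
import csv
import io
from html.parser import HTMLParser


class FirstPreText(HTMLParser):
    """Collect the text inside the first <pre> element of an HTML page."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: &amp; arrives as &
        self.inside = False  # currently inside the first <pre>
        self.done = False    # the first <pre> has already been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre" and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "pre" and self.inside:
            self.inside = False
            self.done = True  # ignore any later <pre> blocks, like limit=1

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def first_pre_text(html):
    """Return the stripped text of the first <pre> element ('' if there is none)."""
    parser = FirstPreText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


# Hypothetical page with placeholder values, holding CSV text in a <pre> block
page = (
    "<html><body><h1>Rates</h1><pre>\n"
    "Date,Rate\n2008-01-01,1.0\n2008-01-02,2.0\n"
    "</pre></body></html>"
)
text = first_pre_text(page)
rows = list(csv.reader(io.StringIO(text)))
print(rows)  # [['Date', 'Rate'], ['2008-01-01', '1.0'], ['2008-01-02', '2.0']]
```

Unlike converting a result list with str(), this yields the element's text with character references already decoded, so text can be written to the output file as-is.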

Edit: to get the values out of a table, here is an example from palewire:

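# Python 2 (print statement) with mechanize and BeautifulSoup 3
from mechanize import Browser
from BeautifulSoup import BeautifulSoup

mech = Browser()

url = "http://www.palewire.com/scrape/albums/2007.html"
page = mech.open(url)

html = page.read()
soup = BeautifulSoup(html)

# Select the first table whose border attribute is 1
table = soup.find("table", border=1)

# Skip the first row (presumably the header) and read four cells from each other row
for row in table.findAll('tr')[1:]:
    col = row.findAll('td')

    rank = col[0].string
    artist = col[1].string
    album = col[2].string
    cover_link = col[3].img['src']  # src URL of the image in the fourth cell

    record = (rank, artist, album, cover_link)
    print "|".join(record)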
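The loop above prints pipe-separated records rather than CSV, and "|".join raises a TypeError whenever a cell's .string is None, which BeautifulSoup returns for example when a cell contains more than one child node. A minimal Python 3 sketch of writing the same four-field records with the standard csv module instead follows; the records and header names are hypothetical placeholders, and records_to_csv is an illustrative helper, not part of any library.

```python
import csv
import io

# Hypothetical records in the (rank, artist, album, cover_link) shape built by the
# loop above; the None mimics a cell whose .string was None
records = [
    ("1", "Artist A", "Album, Part 1", "covers/a.jpg"),
    ("2", None, 'Say "Hi"', "covers/b.jpg"),
]


def records_to_csv(rows, header=("rank", "artist", "album", "cover_link")):
    """Return rows as CSV text, preceded by a header record."""
    buf = io.StringIO()
    # Default QUOTE_MINIMAL: quotes fields containing the delimiter, quotes or line breaks
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)  # None is written as an empty field instead of raising
    return buf.getvalue()


print(records_to_csv(records), end="")
```

For the sample records it prints:

rank,artist,album,cover_link
1,Artist A,"Album, Part 1",covers/a.jpg
2,,"Say ""Hi""",covers/b.jpg

To write to a file instead, pass csv.writer a file opened with open(path, "w", newline=""), as the csv module documentation recommends.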



3> Juan A. Nava..:

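Here is my Python version, which uses the (currently) latest release of BeautifulSoup. It can be installed with, e.g.,

$ pip install beautifulsoup4

The script reads HTML from standard input and writes the text of every table to standard output in proper CSV format.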

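#!/usr/bin/env python3
# Reads HTML on stdin and writes every table row as a CSV record on stdout;
# a blank record separates consecutive tables.
import csv
import sys

from bs4 import BeautifulSoup


def cell_text(cell):
    # Join all text fragments in the cell, each stripped of surrounding whitespace
    return " ".join(cell.stripped_strings)


soup = BeautifulSoup(sys.stdin.read(), "html.parser")
output = csv.writer(sys.stdout)

for table in soup.find_all('table'):
    for row in table.find_all('tr'):
        # Both data cells (td) and header cells (th) become CSV fields
        col = [cell_text(cell) for cell in row.find_all(['td', 'th'])]
        output.writerow(col)
    output.writerow([])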
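If installing BeautifulSoup is not an option, the same idea (every td/th row of every table becomes a CSV record, with a blank record after each table) can be sketched with only the standard library's html.parser. This is a swapped-in technique, not the answer's code, and it assumes reasonably clean markup: it tolerates omitted </td> and </tr> tags, but nested tables are not reproduced faithfully, and it does none of BeautifulSoup's other repairs. TableRows and html_tables_to_csv are hypothetical names.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect <td>/<th> cell text, grouped into rows (<tr>) and tables (<table>)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities like &amp; arrive decoded
        self.tables = []  # one list of rows per <table>, in document order
        self.row = None   # cells of the <tr> currently open, else None
        self.cell = None  # text fragments of the <td>/<th> currently open, else None

    def _finish_cell(self):
        if self.cell is not None:
            # Like cell_text(): strip each fragment, drop blank ones, join with spaces
            self.row.append(" ".join(s.strip() for s in self.cell if s.strip()))
            self.cell = None

    def _finish_row(self):
        self._finish_cell()
        if self.row is not None:
            self.tables[-1].append(self.row)
            self.row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._finish_row()
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._finish_row()  # tolerate an omitted </tr>
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self._finish_cell()  # tolerate an omitted </td>
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def html_tables_to_csv(html):
    """Return CSV text: one record per table row, a blank record after each table."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out)
    for table in parser.tables:
        writer.writerows(table)
        writer.writerow([])
    return out.getvalue()


sample = (
    "<table><tr><th>Name</th><th>Qty</th></tr>"
    "<tr><td>apple</td><td> 3 </td></tr></table>"
    "<table><tr><td>x <b>y</b></td></tr></table>"
)
print(html_tables_to_csv(sample), end="")
```

For this sample the output is the records Name,Qty and apple,3, a blank line, then x y and another blank line. To mirror the script's command-line use, call html_tables_to_csv(sys.stdin.read()) and write the result to sys.stdout.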



4> dkretz..:

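Easier (and Excel saves the query for you for next time)...

In Excel:

Data / Import External Data / New Web Query

This takes you to a URL prompt. Enter your URL, and Excel marks off the available tables on the page so you can choose which to import. Voilà.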

Copyright © 1998 - 2020 DevBox.CN. All Rights Reserved.