What I'm trying to do is retrieve the city and state from a ZIP code. Here's what I have so far:
```python
import requests
from bs4 import BeautifulSoup

def find_city(zip_code):
    zip_code = str(zip_code)
    url = 'http://www.unitedstateszipcodes.org/' + zip_code
    source_code = requests.get(url)
    plain_text = source_code.text
    index = plain_text.find(">")
    soup = BeautifulSoup(plain_text, "lxml")
    stuff = soup.findAll('div', {'class': 'col-xs-12 col-sm-6 col-md-12'})
```
I also tried using id="zip-links", but that didn't work either. Here's the thing, though: when I run print(plain_text), I get the following:
```
403 Forbidden
Forbidden
You don't have permission to access /80123 on this server.
```
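So I guess my question is: is there a better way to get the city and state from a ZIP code? Or is there some reason unitedstateszipcodes.org isn't cooperating? After all, the source, tags, and text are easy to see in a browser. Thanks.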
You need to add a User-Agent header:
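```python
import requests

# requests identifies itself as "python-requests/<version>" by default, which
# this site answers with 403 Forbidden; send a browser-like User-Agent instead.
headers = {"User-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}

def find_city(zip_code):
    zip_code = str(zip_code)
    url = 'http://www.unitedstateszipcodes.org/' + zip_code
    source_code = requests.get(url, headers=headers)
```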
Once you do that, the response is a 200 and you get the page source:
```
In [8]: url = 'http://www.unitedstateszipcodes.org/54115'

In [9]: headers = {"User-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}

In [10]: url = 'http://www.unitedstateszipcodes.org/54115'

In [11]: source_code = requests.get(url,headers=headers)

In [12]: source_code.status_code
Out[12]: 200
```
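If you'd rather not depend on requests, the same header can be sent with the standard library's urllib.request. This is a minimal sketch, not part of the original answer: build_request and fetch_page are hypothetical helper names, and the browser string is simply the one used above.

```python
import urllib.request

# Same browser-like User-Agent string as in the answer above.
USER_AGENT = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36")
BASE_URL = "http://www.unitedstateszipcodes.org/"  # site from the question


def build_request(zip_code):
    """Build (but do not send) a GET request carrying the User-Agent header."""
    return urllib.request.Request(BASE_URL + str(zip_code),
                                  headers={"User-Agent": USER_AGENT})


def fetch_page(zip_code, timeout=10):
    """Send the request and return the HTML as text.

    urlopen raises urllib.error.HTTPError on a 403 or any other error status.
    """
    with urllib.request.urlopen(build_request(zip_code), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage would be html = fetch_page(54115), after which the HTML goes to BeautifulSoup exactly as in the answer. Splitting request construction from sending makes it easy to check the headers without touching the network.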
From there, the details are easy to parse:
```
In [59]: soup = BeautifulSoup(plain_text, "lxml")

In [60]: soup.find('div', id='zip-links').h3.text
Out[60]: 'ZIP Code: 54115'

In [61]: soup.find('div', id='zip-links').h3.next_sibling.strip()
Out[61]: 'De Pere, WI 54115'

In [62]: url = 'http://www.unitedstateszipcodes.org/90210'

In [63]: source_code = requests.get(url,headers=headers).text

In [64]: soup = BeautifulSoup(source_code, "lxml")

In [65]: soup.find('div', id='zip-links').h3.text
Out[66]: 'ZIP Code: 90210'

In [70]: soup.find('div', id='zip-links').h3.next_sibling.strip()
Out[70]: 'Beverly Hills, CA 90210'
```
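The text after the h3 comes back as "City, ST ZIP" ('De Pere, WI 54115', 'Beverly Hills, CA 90210'), so splitting it into separate fields is a small regex job. A minimal sketch, assuming every page uses exactly that format (two-letter state, five-digit ZIP); parse_city_line is a hypothetical helper name.

```python
import re

# "City, ST 12345" -> city, two-letter state code, five-digit ZIP.
# Assumes the format seen in the outputs above; anything else is rejected.
_CITY_LINE = re.compile(
    r"^\s*(?P<city>.+?),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})\s*$"
)


def parse_city_line(text):
    """Split a line like 'De Pere, WI 54115' into (city, state, zip)."""
    match = _CITY_LINE.match(text)
    if match is None:
        raise ValueError("unexpected format: %r" % text)
    return match.group("city"), match.group("state"), match.group("zip")
```

For example, city, state, _ = parse_city_line(soup.find('div', id='zip-links').h3.next_sibling.strip()). Raising on a mismatch makes a layout change on the site fail loudly instead of returning garbage.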
You could also store each result in a database and check the database first before scraping the site again.
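Here is a minimal sketch of that cache-first lookup using the standard library's sqlite3. It assumes a hypothetical fetch callable that does the actual scraping and returns (city, state), for example a wrapper around find_city. The table and function names are made up for illustration.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) a SQLite cache table keyed by ZIP code."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS zip_cache ("
        " zip TEXT PRIMARY KEY, city TEXT NOT NULL, state TEXT NOT NULL)"
    )
    return conn


def cached_lookup(conn, zip_code, fetch):
    """Return (city, state), scraping via the hypothetical fetch() only on a miss."""
    zip_code = str(zip_code)  # keep ZIPs as text so leading zeros survive
    row = conn.execute(
        "SELECT city, state FROM zip_cache WHERE zip = ?", (zip_code,)
    ).fetchone()
    if row is not None:
        return row[0], row[1]  # cache hit: no network request
    city, state = fetch(zip_code)
    with conn:  # commits the insert on success
        conn.execute(
            "INSERT INTO zip_cache (zip, city, state) VALUES (?, ?, ?)",
            (zip_code, city, state),
        )
    return city, state
```

Pass a file path such as "zips.db" to open_cache so the cache survives between runs; the default ":memory:" database disappears when the connection closes.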