
Converting a URL for a crawler


I'm working on a crawler. When I type url1 into a browser, the browser converts it to url2. How can I do the same conversion in Python?

url1: www.odevsitesi.com/ara.asp?kelime=doğanın dengesinin bozulması

url2:www.odevsitesi.com/ara.asp?kelime=do%F0an%FDn%20dengesinin%20bozulmas%FD
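Judging from url2, the browser appears to encode the query text in ISO-8859-9 (Latin-5, the Turkish code page; this charset is an assumption inferred from the bytes) and then percent-encode every byte that is not an RFC 3986 unreserved character: ğ becomes byte 0xF0 and so %F0, ı becomes 0xFD and so %FD, and a space becomes %20. A minimal Python 3 sketch of that byte-level step; `percent_encode` is a hypothetical helper written for illustration, not a library function:

```python
# Hypothetical helper illustrating percent-encoding. Assumes the target
# site expects ISO-8859-9 (Latin-5) bytes, as url2 suggests.
UNRESERVED = set(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)

def percent_encode(text: str, charset: str = "iso-8859-9") -> str:
    """Encode text to bytes in `charset`, then escape every byte that is
    not an unreserved ASCII character as %XX (uppercase hex)."""
    out = []
    for byte in text.encode(charset):  # iterating bytes yields ints
        if byte in UNRESERVED:
            out.append(chr(byte))
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)

value = "doğanın dengesinin bozulması"
print("www.odevsitesi.com/ara.asp?kelime=" + percent_encode(value))
```

Run as-is, this prints the same string as url2, which supports the ISO-8859-9 guess.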



1> Alex Martell..:

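You need to encode the URL correctly (in your case as iso-8859-9), split it into its parts, apply urllib.quote to the query part, and then put the parts back together. I.e.: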

>>> import urlparse
>>> import urllib
>>> x = u'http://www.odevsitesi.com/ara.asp?kelime=doğanın dengesinin bozulması'
>>> y = x.encode('iso-8859-9')
>>> # just to show what the split of y looks like (we can also handle it as a tuple):
>>> urlparse.urlsplit(y)
SplitResult(scheme='http', netloc='www.odevsitesi.com', path='/ara.asp', query='kelime=do\xf0an\xfdn dengesinin bozulmas\xfd', fragment='')
>>> z = urlparse.urlsplit(y)
>>> # keep '=' and '&' unescaped so the query stays in key=value form:
>>> quoted = z[:3] + (urllib.quote(z.query, safe='=&'), z.fragment)
>>> # now just to show you what the 'quoted' tuple looks like:
>>> quoted
('http', 'www.odevsitesi.com', '/ara.asp', 'kelime=do%F0an%FDn%20dengesinin%20bozulmas%FD', '')
>>> # and finally putting it back together:
>>> urlparse.urlunsplit(quoted)
'http://www.odevsitesi.com/ara.asp?kelime=do%F0an%FDn%20dengesinin%20bozulmas%FD'
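The session above is Python 2 (`urlparse`, `urllib.quote`). On Python 3 the same functions live in `urllib.parse`, and `quote` accepts the charset directly through its `encoding` argument, so the manual `.encode('iso-8859-9')` step is not needed. A sketch under the assumption that the site still expects ISO-8859-9; `to_request_url` is a hypothetical helper name:

```python
from urllib.parse import quote, urlencode, urlsplit, urlunsplit

def to_request_url(url: str, charset: str = "iso-8859-9") -> str:
    """Hypothetical helper: percent-encode the query of `url` in `charset`."""
    # Split into (scheme, netloc, path, query, fragment).
    parts = urlsplit(url)
    # Encode the query text in the site's charset and percent-escape it,
    # leaving '=' and '&' intact so names and values stay separated.
    query = quote(parts.query, safe="=&", encoding=charset)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))

url1 = "http://www.odevsitesi.com/ara.asp?kelime=doğanın dengesinin bozulması"
print(to_request_url(url1))

# Alternative when building the query from scratch (quote_via needs 3.5+).
# quote_via=quote encodes spaces as %20 instead of '+'.
params = urlencode({"kelime": "doğanın dengesinin bozulması"},
                   encoding="iso-8859-9", quote_via=quote)
print("http://www.odevsitesi.com/ara.asp?" + params)
```

Passing `safe="=&"` keeps the `kelime=` separator as it appears in url2; with the default `safe='/'`, `quote` would escape `=` to `%3D` and the server would no longer see a `kelime` parameter.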
