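Suppose I am given a URL. It may already have GET parameters (e.g. http://example.com/search?q=question) or it may not (e.g. http://example.com/).
Now I need to add some parameters to it, such as {'lang':'en','tag':'python'}. In the first case I should end up with http://example.com/search?q=question&lang=en&tag=python, and in the second case with http://example.com/search?lang=en&tag=python.
Is there a standard way to do this?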
The urllib and urlparse modules have a few quirks. Here is a working example:
    try:
        import urlparse
        from urllib import urlencode
    except ImportError:  # For Python 3
        import urllib.parse as urlparse
        from urllib.parse import urlencode

    url = "http://stackoverflow.com/search?q=question"
    params = {'lang': 'en', 'tag': 'python'}

    # Index 4 of the parsed result is the query string
    url_parts = list(urlparse.urlparse(url))
    query = dict(urlparse.parse_qsl(url_parts[4]))
    query.update(params)

    url_parts[4] = urlencode(query)

    print(urlparse.urlunparse(url_parts))
ParseResult, the result of urlparse(), is read-only, so we need to convert it to a list before we can modify its data.
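As an alternative to the list conversion, the result of urlparse() is a named tuple, so its _replace() method can return a modified copy. Below is a minimal Python 3 sketch of that idea. The helper name add_params is hypothetical and is not part of the answer above, and dict(parse_qsl(...)) keeps only the last value of a repeated key.

```python
from urllib.parse import urlparse, parse_qsl, urlencode


def add_params(url, params):
    """Return url with params merged into its query string (hypothetical helper)."""
    parts = urlparse(url)
    # Keep the existing pairs; new values override existing keys.
    query = dict(parse_qsl(parts.query))
    query.update(params)
    # ParseResult is a namedtuple, so _replace returns a modified copy.
    return parts._replace(query=urlencode(query)).geturl()


print(add_params("http://example.com/search?q=question",
                 {"lang": "en", "tag": "python"}))
```

On Python 3.7 or later, where dicts keep insertion order, this should print the first URL from the question with lang and tag appended after q.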
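I was not happy with any of the solutions on this page (come on, where is our favorite copy-paste thing?), so I wrote my own based on the answers here. It tries to be complete and more Pythonic. I added a handler for dict and bool values in the arguments to be more consumer-side (JS) friendly, but they are optional and you can drop them.
Test 1: adding new parameters, handling arrays and bool values: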
    url = 'http://stackoverflow.com/test'
    new_params = {'answers': False, 'data': ['some','values']}

    # Parameter order follows dict ordering, so it can differ between Python versions.
    add_url_params(url, new_params) == \
        'http://stackoverflow.com/test?data=some&data=values&answers=false'
Test 2: overwriting existing args, handling dict values:
    url = 'http://stackoverflow.com/test/?question=false'
    new_params = {'question': {'__X__':'__Y__'}}

    add_url_params(url, new_params) == \
        'http://stackoverflow.com/test/?question=%7B%22__X__%22%3A+%22__Y__%22%7D'
The code itself. I have tried to describe it in detail:
    from json import dumps

    try:
        from urllib import urlencode, unquote
        from urlparse import urlparse, parse_qsl, ParseResult
    except ImportError:
        # Python 3 fallback
        from urllib.parse import (
            urlencode, unquote, urlparse, parse_qsl, ParseResult
        )


    def add_url_params(url, params):
        """ Add GET params to provided URL being aware of existing.

        :param url: string of target URL
        :param params: dict containing requested params to be added
        :return: string with updated URL

        >> url = 'http://stackoverflow.com/test?answers=true'
        >> new_params = {'answers': False, 'data': ['some','values']}
        >> add_url_params(url, new_params)
        'http://stackoverflow.com/test?data=some&data=values&answers=false'
        """
        # Unquoting URL first so we don't lose existing args
        url = unquote(url)
        # Extracting url info
        parsed_url = urlparse(url)
        # Extracting URL arguments from parsed URL
        get_args = parsed_url.query
        # Converting URL arguments to dict
        parsed_get_args = dict(parse_qsl(get_args))
        # Merging URL arguments dict with new params
        parsed_get_args.update(params)

        # Bool and Dict values should be converted to json-friendly values
        # you may throw this part away if you don't like it :)
        parsed_get_args.update(
            {k: dumps(v) for k, v in parsed_get_args.items()
             if isinstance(v, (bool, dict))}
        )

        # Converting URL argument to proper query string
        encoded_get_args = urlencode(parsed_get_args, doseq=True)
        # Creating new parsed result object based on provided with new
        # URL arguments. Same thing happens inside of urlparse.
        new_url = ParseResult(
            parsed_url.scheme, parsed_url.netloc, parsed_url.path,
            parsed_url.params, encoded_get_args, parsed_url.fragment
        ).geturl()

        return new_url
Please note that there may be some issues; if you find one, let me know and we will make this better.
If the strings can contain arbitrary data, you will need URL encoding (characters such as ampersands and slashes have to be encoded).
Check out urllib.urlencode:
    >>> import urllib
    >>> urllib.urlencode({'lang':'en','tag':'python'})
    'lang=en&tag=python'
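The session above is Python 2. In Python 3 the same function lives in urllib.parse. Here is a small sketch showing how reserved characters in a value are encoded; the sample value "a&b/c d" is made up for illustration.

```python
from urllib.parse import urlencode

# Reserved characters in values are percent-encoded; spaces become '+'.
encoded = urlencode({"q": "a&b/c d", "tag": "python"})
print(encoded)  # q=a%26b%2Fc+d&tag=python
```

Because the ampersand and slash inside the value are escaped, they cannot be mistaken for parameter or path separators once the string is appended to a URL.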
You can also use the furl module: https://github.com/gruns/furl
    >>> from furl import furl
    >>> print(furl('http://example.com/search?q=question').add({'lang':'en','tag':'python'}).url)
    http://example.com/search?q=question&lang=en&tag=python
Outsource it to the battle-tested requests library.
This is how I would do it:
    from requests.models import PreparedRequest

    url = 'http://example.com/search?q=question'
    params = {'lang':'en','tag':'python'}

    req = PreparedRequest()
    req.prepare_url(url, params)
    print(req.url)
Yes: use urllib.
From the example in the documentation (Python 2):
    >>> import urllib
    >>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
    >>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
    >>> print f.geturl() # Prints the final URL with parameters.
    >>> print f.read() # Prints the contents
Based on this answer, a one-liner for simple cases (Python 3 code):
    from urllib.parse import urlparse, urlencode

    url = "/sf/ask/17360801/?q=question"
    params = {'lang':'en','tag':'python'}

    url += ('&' if urlparse(url).query else '?') + urlencode(params)
or:
    url += ('&', '?')[urlparse(url).query == ''] + urlencode(params)
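To make the "simple cases" limit concrete, here is a sketch that wraps the same concatenation in a function. The name append_params is hypothetical, and the example URLs are only illustrations.

```python
from urllib.parse import urlparse, urlencode


def append_params(url, params):
    """Append params by string concatenation (simple cases only)."""
    # Use '&' when a query string already exists, otherwise start one with '?'.
    sep = "&" if urlparse(url).query else "?"
    return url + sep + urlencode(params)


print(append_params("http://example.com/", {"lang": "en"}))
# Caveat: with a fragment present, the parameters land after the '#'.
print(append_params("http://example.com/#top", {"lang": "en"}))
```

The second call yields http://example.com/#top?lang=en, where the new parameters become part of the fragment. For URLs that may carry a fragment, rebuilding the URL from its parsed parts is the safer route.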
If you are using the requests library:
    import requests
    ...
    params = {'tag': 'python'}
    requests.get(url, params=params)
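I like Łukasz's version, but since the urllib and urlparse functions are somewhat awkward to use in this case, I think it is more straightforward to do something like this: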
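    # Python 2; assumes url and params are already defined
    import urllib
    import urlparse

    params = urllib.urlencode(params)

    # Index 4 of the parsed result is the existing query string
    if urlparse.urlparse(url)[4]:
        print url + '&' + params
    else:
        print url + '?' + params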