I am trying to call the import.io API. The call needs to have the following structure:
https://extraction.import.io/query/extractor/{{crawler_id}}?_apikey=xxx&url=http://www.example.co.uk/items.php?sortby=Price_LH&per_page=96&size=1%2C12&page=35
As you can see, the call must also include a "url" parameter:
http://www.example.co.uk/items.php?sortby=Price_LH&per_page=96&size=1%2C12&page=35
This secondary URL happens to take parameters of its own. However, if I pass it as a plain string, as in the example above, the API response only contains the part up to its first parameter:
http://www.example.co.uk/items.php?sortby=Price_LH
This is not correct, and it seems the call is being made with the incomplete URL rather than the one I passed in.
I am using Python and requests to make the call as follows:
import requests
import json

row_dict = {'url': u'http://www.example.co.uk/items.php?sortby=Price_LH&per_page=96&size=1%2C12&page=35', 'crawler_id': u'zzz'}
url_call = 'https://extraction.import.io/query/extractor/{0}?_apikey={1}&url={2}'.format(row_dict['crawler_id'], auth_key, row_dict['url'])
r = requests.get(url_call)
rr = json.loads(r.content)
When I print the result, it contains:
"url" : "http://www.example.co.uk/items.php?sortby=Price_LH",
But when I print r.url:
https://extraction.import.io/query/extractor/zzz?_apikey=xxx&url=http://www.example.co.uk/items.php?sortby=Price_LH&per_page=96&size=1%2C12&page=35
So in the request URL everything looks fine, but not in the response.
I have tried other URLs as well, and they always get cut off after the first parameter.
The requests library will handle all of your URL-encoding needs. This is the correct way to add parameters to a URL with requests:
import requests

base_url = "https://extraction.import.io/query/extractor/{{crawler_id}}"

params = dict()
params["_apikey"] = "xxx"
params["url"] = "http://www.example.co.uk/items.php?sortby=Price_LH&per_page=96&size=1%2C12&page=35"

r = requests.get(base_url, params=params)
print(r.url)
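If you just want to check the final URL that requests will build, without actually hitting the API, one option is to prepare the request without sending it. This is only a sketch; the crawler id "zzz" and key "xxx" are the placeholders from the question:

import requests

# "zzz" and "xxx" are the placeholder crawler id and key from the question.
base_url = "https://extraction.import.io/query/extractor/{0}".format("zzz")
params = {
    "_apikey": "xxx",
    "url": "http://www.example.co.uk/items.php?sortby=Price_LH&per_page=96&size=1%2C12&page=35",
}

# prepare() builds the final URL without sending anything, so the encoding
# can be inspected first; the ampersands inside "url" come out as %26.
prepared = requests.Request("GET", base_url, params=params).prepare()
print(prepared.url)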
The params can also be written in a more readable form:
params = {
    "_apikey": "xxx",
    "url": "http://www.example.co.uk/items.php?sortby=Price_LH&per_page=96&size=1%2C12&page=35"
}
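Putting it together, a sketch of the full call under the same assumptions (the crawler id and key are the question's placeholders, and r.json() is equivalent to json.loads(r.content)):

import requests

# Placeholder crawler id and key from the question; substitute real values.
crawler_id = "zzz"
auth_key = "xxx"

base_url = "https://extraction.import.io/query/extractor/{0}".format(crawler_id)
params = {
    "_apikey": auth_key,
    "url": "http://www.example.co.uk/items.php?sortby=Price_LH&per_page=96&size=1%2C12&page=35",
}

r = requests.get(base_url, params=params)
rr = r.json()  # equivalent to json.loads(r.content)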
You need to URL-encode the URL you are sending to the API.
The reason is that the server interprets the ampersands as parameter separators of the outer URL https://extraction.import.io/query/extractor/XXX?..., which is why they are stripped from your URL:
http://www.example.co.uk/items.php?sortby=Price_LH
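To see this concretely, the hand-built call URL can be parsed the way a generic query-string parser would; the parameters after the first unencoded ampersand end up as top-level parameters of the extractor call instead of staying inside "url". A minimal sketch using only the standard library (Python 2, to match the code in the question):

from urlparse import urlparse, parse_qs  # Python 2; use urllib.parse in Python 3

url_call = ('https://extraction.import.io/query/extractor/zzz'
            '?_apikey=xxx&url=http://www.example.co.uk/items.php'
            '?sortby=Price_LH&per_page=96&size=1%2C12&page=35')

query = parse_qs(urlparse(url_call).query)
print(query['url'])       # ['http://www.example.co.uk/items.php?sortby=Price_LH']
print(query['per_page'])  # ['96'] -- leaked out of the nested URL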
Try the following, using urllib.quote(row_dict['url']):
import requests
import json
import urllib

row_dict = {'url': u'http://www.example.co.uk/items.php?sortby=Price_LH&per_page=96&size=1%2C12&page=35', 'crawler_id': u'zzz'}
# Percent-encode the nested URL so its '&' and '?' are not interpreted
# as parameters of the extractor endpoint itself.
url_call = 'https://extraction.import.io/query/extractor/{0}?_apikey={1}&url={2}'.format(row_dict['crawler_id'], auth_key, urllib.quote(row_dict['url']))
r = requests.get(url_call)
rr = json.loads(r.content)
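It may also help to look at the encoding step on its own; note that urllib.quote is the Python 2 location of this function, while on Python 3 it lives in urllib.parse. A minimal sketch:

from urllib import quote  # Python 2; on Python 3: from urllib.parse import quote

raw = 'http://www.example.co.uk/items.php?sortby=Price_LH&per_page=96&size=1%2C12&page=35'
encoded = quote(raw)

# The ampersands become %26, so the API no longer splits the nested URL
# into separate top-level parameters of the extractor call.
print(encoded)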