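I'm trying to get some results from UniProt, which is a protein database (the details are not important). I'm trying to use a script that translates one kind of ID into another. I was able to do this manually in the browser, but could not do it in Python.
There are some example scripts at http://www.uniprot.org/faq/28. I tried the Perl one and it seems to work, so the problem is in my Python attempts. The (working) script is: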
    ## tool_example.pl ##
    use strict;
    use warnings;
    use LWP::UserAgent;

    my $base = 'http://www.uniprot.org';
    my $tool = 'mapping';
    my $params = {
      from => 'ACC',
      to => 'P_REFSEQ_AC',
      format => 'tab',
      query => 'P13368 P20806 Q9UM73 P97793 Q17192'
    };

    my $agent = LWP::UserAgent->new;
    push @{$agent->requests_redirectable}, 'POST';
    print STDERR "Submitting...\n";
    my $response = $agent->post("$base/$tool/", $params);

    while (my $wait = $response->header('Retry-After')) {
      print STDERR "Waiting ($wait)...\n";
      sleep $wait;
      print STDERR "Checking...\n";
      $response = $agent->get($response->base);
    }

    $response->is_success ?
      print $response->content :
      die 'Failed, got ' . $response->status_line .
        ' for ' . $response->request->uri . "\n";
My questions are:
1) How would you do that in Python?
2) Will I be able to "scale" that massively (i.e., use a large number of entries in the query field)?
Question #1:
This can be done with Python's urllib and urllib2 modules:
    import urllib, urllib2
    import time
    import sys

    # sys.argv[0] is the script name, so skip it
    query = ' '.join(sys.argv[1:])

    # encode params as a list of 2-tuples
    params = (('from', 'ACC'), ('to', 'P_REFSEQ_AC'),
              ('format', 'tab'), ('query', query))
    # url encode them
    data = urllib.urlencode(params)
    url = 'http://www.uniprot.org/mapping/'

    # fetch the data
    try:
        foo = urllib2.urlopen(url, data)
    except urllib2.HTTPError as e:
        if e.code != 503:
            raise  # only "service unavailable" is worth retrying
        # get the value of the Retry-After header and wait that long
        wait_time = int(e.hdrs.get('Retry-After', 0))
        print 'Sleeping %i seconds...' % (wait_time,)
        time.sleep(wait_time)
        foo = urllib2.urlopen(url, data)

    # foo is a file-like object, do with it what you will.
    print foo.read()
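For readers on Python 3, where urllib2 was folded into urllib.request, urllib.parse and urllib.error, the same idea might look roughly like the sketch below. It keeps the URL and form fields from the question (the service may have changed since), and it retries in a loop instead of only once. The names `build_request_data` and `fetch_mapping`, and the `opener`, `sleep` and `max_retries` parameters, are made up for this illustration so the retry logic can be exercised without touching the network. Like the snippet above, it only reacts to a 503 error carrying Retry-After; the Perl script instead polls for as long as a response carries that header, so this is not an exact port.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

# URL taken from the question; assumed to still accept this form.
MAPPING_URL = 'http://www.uniprot.org/mapping/'


def build_request_data(ids, from_id='ACC', to_id='P_REFSEQ_AC', fmt='tab'):
    """URL-encode the form fields as bytes, ready to use as a POST body."""
    params = (
        ('from', from_id),
        ('to', to_id),
        ('format', fmt),
        ('query', ' '.join(ids)),
    )
    return urllib.parse.urlencode(params).encode('ascii')


def fetch_mapping(ids, opener=urllib.request.urlopen, sleep=time.sleep,
                  max_retries=5):
    """POST the IDs and return the response text.

    On HTTP 503, wait for the number of seconds given in Retry-After
    and try again, up to max_retries attempts in total.
    """
    data = build_request_data(ids)
    for _ in range(max_retries):
        try:
            with opener(MAPPING_URL, data) as response:
                return response.read().decode('utf-8')
        except urllib.error.HTTPError as e:
            if e.code != 503:
                raise  # any other HTTP error is not retried
            wait_time = int(e.headers.get('Retry-After', 0))
            sleep(wait_time)
    raise RuntimeError('gave up after %d attempts' % max_retries)
```

Calling `fetch_mapping(['P13368', 'P20806'])` would send the real request. Passing a stand-in for `opener` lets you check the retry behaviour offline.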