当前位置:  开发笔记 > 编程语言 > 正文

如何从脚本中搜索Stack Overflow问题?

如何解决《如何从脚本中搜索StackOverflow问题?》经验,为你挑选了2个好方法。

给定一系列关键字,例如"Python最佳实践",我想获得包含关键字的前10个Stack Overflow问题,按相关性(?)排序,比如Python脚本.我的目标是最终得到一个元组列表(标题,URL).

我怎么能做到这一点?您会考虑查询Google吗?(你会怎么用Python做的?)



1> Jeremy Ruten..:

由于Stackoverflow已经具有此功能,您只需获取搜索结果页面的内容并获取所需的信息.以下是按相关性搜索的网址:

/sf/ask/17360801/?q=python+best+practices&sort=relevance

如果您查看来源,您会看到每个问题所需的信息都在这样的一行:

What are the best RSS feeds for programmers/developers?

因此,您应该能够通过正则表达式搜索该表单的字符串来获得前十个.



2> itsadok..:
>>> from urllib import urlencode
>>> params = urlencode({'q': 'python best practices', 'sort': 'relevance'})
>>> params
'q=python+best+practices&sort=relevance'
>>> from urllib2 import urlopen
>>> html = urlopen("http://stackoverflow.com/search?%s" % params).read()
>>> import re
>>> links = re.findall(r'

([^<]*)

', html) >>> links [('/questions/5119/what-are-the-best-rss-feeds-for-programmersdevelopers#5150', 'What are the best RSS feeds for programmers/developers?'), ('/questions/3088/best-ways-to-teach-a-beginner-to-program#13185', 'Best ways to teach a beginner to program?'), ('/questions/13678/textual-versus-graphical-programming-languages#13886', 'Textual versus Graphical Programming Languages'), ('/questions/58968/what-defines-pythonian-or-pythonic#59877', 'What defines “pythonian” or “pythonic”?'), ('/questions/592/cxoracle-how-do-i-access-oracle-from-python#62392', 'cx_Oracle - How do I access Oracle from Python? '), ('/questions/7170/recommendation-for-straight-forward-python-frameworks#83608', 'Recommendation for straight-forward python frameworks'), ('/questions/100732/why-is-if-not-someobj-better-than-if-someobj-none-in-python#100903', 'Why is if not someobj: better than if someobj == None: in Python?'), ('/questions/132734/presentations-on-switching-from-perl-to-python#134006', 'Presentations on switching from Perl to Python'), ('/questions/136977/after-c-python-or-java#138442', 'After C++ - Python or Java?')] >>> from urlparse import urljoin >>> links = [(urljoin('http://stackoverflow.com/', url), title) for url,title in links] >>> links [('http://stackoverflow.com/questions/5119/what-are-the-best-rss-feeds-for-programmersdevelopers#5150', 'What are the best RSS feeds for programmers/developers?'), ('http://stackoverflow.com/questions/3088/best-ways-to-teach-a-beginner-to-program#13185', 'Best ways to teach a beginner to program?'), ('http://stackoverflow.com/questions/13678/textual-versus-graphical-programming-languages#13886', 'Textual versus Graphical Programming Languages'), ('http://stackoverflow.com/questions/58968/what-defines-pythonian-or-pythonic#59877', 'What defines “pythonian” or “pythonic”?'), ('http://stackoverflow.com/questions/592/cxoracle-how-do-i-access-oracle-from-python#62392', 'cx_Oracle - How do I access Oracle from Python? '), ('http://stackoverflow.com/questions/7170/recommendation-for-straight-forward-python-frameworks#83608', 'Recommendation for straight-forward python frameworks'), ('http://stackoverflow.com/questions/100732/why-is-if-not-someobj-better-than-if-someobj-none-in-python#100903', 'Why is if not someobj: better than if someobj == None: in Python?'), ('http://stackoverflow.com/questions/132734/presentations-on-switching-from-perl-to-python#134006', 'Presentations on switching from Perl to Python'), ('http://stackoverflow.com/questions/136977/after-c-python-or-java#138442', 'After C++ - Python or Java?')]

将其转换为函数应该是微不足道的.

编辑:哎呀,我会做的......

def get_stackoverflow(query):
    import urllib, urllib2, re, urlparse
    params = urllib.urlencode({'q': query, 'sort': 'relevance'})
    html = urllib2.urlopen("http://stackoverflow.com/search?%s" % params).read()
    links = re.findall(r'

([^<]*)

', html) links = [(urlparse.urljoin('http://stackoverflow.com/', url), title) for url,title in links] return links

推荐阅读
大大炮
这个屌丝很懒,什么也没留下!
DevBox开发工具箱 | 专业的在线开发工具网站    京公网安备 11010802040832号  |  京ICP备19059560号-6
Copyright © 1998 - 2020 DevBox.CN. All Rights Reserved devBox.cn 开发工具箱 版权所有