I have several different spiders and want to run them all at once. Based on this and this, I can run multiple spiders in the same process. However, I can't figure out how to design a signal system that stops the reactor once all the spiders have finished.
I tried:
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
and
crawler.signals.connect(reactor.stop, signal=signals.spider_idle)
In both cases, the reactor stops when the first spider closes. Of course, I want the reactor to stop only after all the spiders have finished.
Can anyone tell me how to do this?
After a night's sleep, I realized I knew how to do it. All I needed was a counter:
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from scrapy.utils.project import get_project_settings

class ReactorControl:
    """Counts running crawlers and stops the reactor when the last one closes."""

    def __init__(self):
        self.crawlers_running = 0

    def add_crawler(self):
        self.crawlers_running += 1

    def remove_crawler(self):
        self.crawlers_running -= 1
        if self.crawlers_running == 0:
            reactor.stop()

def setup_crawler(spider_name):
    crawler = Crawler(settings)
    crawler.configure()
    # Decrement the counter (and possibly stop the reactor) when this spider closes.
    crawler.signals.connect(reactor_control.remove_crawler, signal=signals.spider_closed)
    spider = crawler.spiders.create(spider_name)
    crawler.crawl(spider)
    reactor_control.add_crawler()
    crawler.start()

reactor_control = ReactorControl()
log.start()
settings = get_project_settings()
crawler = Crawler(settings)

for spider_name in crawler.spiders.list():
    setup_crawler(spider_name)

reactor.run()
I'm assuming Scrapy is not parallel.
I don't know if this is the best way to do it, but it works!
EDIT: Updated. See @Jean-Robert's comment.
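For what it's worth, the Crawler API used above (crawler.configure(), crawler.spiders, scrapy.log) was removed in later Scrapy releases. On Scrapy 1.x and later, the same "stop the reactor only when the last spider closes" logic can be expressed without a hand-rolled counter: each CrawlerRunner.crawl() call returns a Deferred, and a DeferredList fires once all of them have finished. A minimal sketch, assuming a Scrapy 1.x+ project:

from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

configure_logging()  # replaces the old log.start()
runner = CrawlerRunner(get_project_settings())

# Each runner.crawl() returns a Deferred that fires when that crawl
# finishes; DeferredList fires once all of them have, playing the same
# role as the counter above.
crawls = [runner.crawl(name) for name in runner.spider_loader.list()]
defer.DeferredList(crawls).addCallback(lambda _: reactor.stop())

reactor.run()  # blocks until reactor.stop() is called

If you don't need manual control of the reactor at all, CrawlerProcess does the equivalent internally: call process.crawl() for each spider and then process.start(), which runs the reactor and stops it once all crawls are done.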