
Python Scrapy Tutorial KeyError: 'Spider not found:'


I'm trying to write my first Scrapy spider. I've been following the tutorial at http://doc.scrapy.org/en/latest/intro/tutorial.html, but I'm getting the error "KeyError: 'Spider not found:'".

I think I'm running the command from the correct directory (the one with the scrapy.cfg file):

(proscraper)#( 10/14/14@ 2:06pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   tree
.
├── scrapy
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── juno_spider.py
└── scrapy.cfg

2 directories, 7 files
(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   ls
scrapy  scrapy.cfg

Here is the error I get:

(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   scrapy crawl juno
/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/twisted/internet/_sslverify.py:184: UserWarning: You do not have the service_identity module installed. Please install it from . Without the service_identity module and a recent enough pyOpenSSL to support it, Twisted can perform only rudimentary TLS client hostname verification.  Many valid certificate/hostname mappings may be rejected.
  verifyHostname, VerificationError = _selectVerifyImplementation()
Traceback (most recent call last):
  File "/home/tim/.virtualenvs/proscraper/bin/scrapy", line 9, in 
    load_entry_point('Scrapy==0.24.4', 'console_scripts', 'scrapy')()
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/commands/crawl.py", line 58, in run
    spider = crawler.spiders.create(spname, **opts.spargs)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/spidermanager.py", line 44, in create
    raise KeyError("Spider not found: %s" % spider_name)
KeyError: 'Spider not found: juno'

Here is my pip freeze:

(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   pip freeze
Scrapy==0.24.4
Twisted==14.0.2
cffi==0.8.6
cryptography==0.6
cssselect==0.9.1
ipdb==0.8
ipython==2.3.0
lxml==3.4.0
pyOpenSSL==0.14
pycparser==2.10
queuelib==1.2.2
six==1.8.0
w3lib==1.10.0
wsgiref==0.1.2
zope.interface==4.1.1

Here is the code for my spider, with the name attribute filled in:

(proscraper)#( 10/14/14@ 2:14pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   cat scrapy/spiders/juno_spider.py 
import scrapy

class JunoSpider(scrapy.Spider):
    name = "juno"
    allowed_domains = ["http://www.juno.co.uk/"]
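    # (note: allowed_domains normally lists bare domains such as "juno.co.uk", not full URLs)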
    start_urls = [
        "http://www.juno.co.uk/dj-equipment/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        with open(filename, 'wb') as f:
            f.write(response.body)

dreyescat.. 8

When you start a project with scrapy as the project name, it creates the directory structure you printed:

.
├── scrapy
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── juno_spider.py
└── scrapy.cfg

But using scrapy as the project name has a side effect. If you open the generated scrapy.cfg, you will see that the default settings point to your scrapy.settings module:

[settings]
default = scrapy.settings

And when we cat that scrapy.settings file, we see:

BOT_NAME = 'scrapy'

SPIDER_MODULES = ['scrapy.spiders']
NEWSPIDER_MODULE = 'scrapy.spiders'

Well, nothing strange here: the bot name, the list of modules where Scrapy will look for spiders, and the module where new spiders will be created by the genspider command. So far so good.
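
As an aside, genspider is the command driven by that NEWSPIDER_MODULE setting; a purely illustrative invocation (the spider name and domain here are made up) would be:

$ scrapy genspider somespider example.com

which scaffolds a somespider.py inside the spiders package from the default template.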

Now let's inspect the scrapy library itself. It is properly installed under your isolated proscraper virtualenv, at /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy. Remember that site-packages is always added to sys.path, the list of all the paths Python searches for modules. So, guess what... the scrapy library also has a settings module, at /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/settings, which imports /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/settings/default_settings.py, the file holding the default values for all settings. Pay special attention to its default SPIDER_MODULES entry:

SPIDER_MODULES = []

Maybe you're starting to see what's going on. Choosing scrapy as the project name also generated a scrapy.settings module that clashes with the scrapy library's own scrapy.settings. From there, the order in which the two paths were inserted into sys.path decides which one Python imports: first one wins. In this case the library's settings win, with their empty SPIDER_MODULES list, hence the KeyError: 'Spider not found: juno'.
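
A quick diagnostic sketch here is to ask Python which file it actually resolves scrapy.settings to:

$ python -c "import scrapy.settings; print scrapy.settings.__file__"

Whichever path it prints is the module winning on that particular sys.path. Note that the result depends on where you run it from, since the current directory also takes part in the module search.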

To resolve this conflict, you could rename your project folder to something else, say scrap:

.
├── scrap
│   ├── __init__.py

Modify your scrapy.cfg to point to the right settings module:

[settings]
default = scrap.settings

And update your scrap.settings to point to the right spiders package:

SPIDER_MODULES = ['scrap.spiders']
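
As a quick sanity check after the rename, scrapy list prints the name of every spider the project can discover, one per line, so the juno spider should now show up:

$ scrapy list
juno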

But, as @paultrmbrth suggested, I would just recreate the project with a different name.


