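I have a web scraper that runs on my own system, and I want to migrate it to PythonAnywhere, but now that I have moved it, it no longer works.
Specifically, send_keys does not seem to work - after the following code runs I never advance to the next web page, so an attribute error gets raised.
My code: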
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.keys import Keys
    from bs4 import BeautifulSoup
    import csv
    import time

    # Lists for functions
    parcel_link = []
    token = []
    csv_output = []

    # main scraping function
    def getLinks(link):
        # Open web browser and get url - 3 second time delay.
        driver.get(link)
        time.sleep(3)
        inputElement = driver.find_element_by_id("mSearchControl_mParcelID")
        inputElement.send_keys(parcel_code+"*/n")
        print("ENTER hit")
        pageSource = driver.page_source
        bsObj = BeautifulSoup(pageSource)
        parcel_link.clear()
        print(bsObj)
        #pause = WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.ID, "mResultscontrol_mGrid_RealDataGrid")))
        for link in bsObj.find(id="mResultscontrol_mGrid_RealDataGrid").findAll('a'):
            parcel_link.append(link.text)
        print(parcel_link)
        for test in parcel_link:
            clickable = driver.find_element_by_link_text(test)
            clickable.click()
            time.sleep(2)
The link I am trying to work with is: https://ascendweb.jacksongov.org/ascend/%280yzb2gusuzb0kyvjwniv3255%29/search.aspx
And I am trying to send: 15-100*
Traceback:
    03:12 ~/Tax_Scrape $ xvfb-run python3.4 Jackson_Parcel_script.py
    Traceback (most recent call last):
      File "Jackson_Parcel_script.py", line 377, in <module>
        getLinks("https://ascendweb.jacksongov.org/ascend/%28biohwjq5iibvvkisd1kjmm45%29/result.aspx")
      File "Jackson_Parcel_script.py", line 29, in getLinks
        inputElement = driver.find_element_by_id("mSearchControl_mParcelID")
      File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/webdriver.py", line 206, in find_element_by_id
        return self.find_element(by=By.ID, value=id_)
      File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/webdriver.py", line 662, in find_element
        {'using': by, 'value': value})['value']
      File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/webdriver.py", line 173, in execute
        self.error_handler.check_response(response)
      File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/errorhandler.py", line 164, in check_response
        raise exception_class(message, screen, stacktrace)
    selenium.common.exceptions.NoSuchElementException: Message: 'Unable to locate element: {"method":"id","selector":"mSearchControl_mParcelID"}' ; Stacktrace:
        at FirefoxDriver.findElementInternal_ (file:///tmp/tmpiuuqg3m7/extensions/fxdriver@googlecode.com/components/driver_component.js:9470)
        at FirefoxDriver.findElement (file:///tmp/tmpiuuqg3m7/extensions/fxdriver@googlecode.com/components/driver_component.js:9479)
        at DelayedCommand.executeInternal_/h (file:///tmp/tmpiuuqg3m7/extensions/fxdriver@googlecode.com/components/command_processor.js:11455)
        at DelayedCommand.executeInternal_ (file:///tmp/tmpiuuqg3m7/extensions/fxdriver@googlecode.com/components/command_processor.js:11460)
        at DelayedCommand.execute/< (file:///tmp/tmpiuuqg3m7/extensions/fxdriver@googlecode.com/components/command_processor.js:11402)
    03:13 ~/Tax_Scrape $
Selenium initiation:
    for retry in range(3):
        try:
            driver = webdriver.Firefox()
            break
        except:
            time.sleep(3)

    for parcel_code in token:
        getLinks("https://ascendweb.jacksongov.org/ascend/%28biohwjq5iibvvkisd1kjmm45%29/result.aspx")
PythonAnywhere uses a virtual instance of Firefox that is supposed to be headless like PhantomJS, so I do not have a version number.
Any help would be great.
RS
Well, maybe the browser that PythonAnywhere uses does not load the site fast enough. So instead of time.sleep(3), try waiting implicitly for the element.
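An implicit wait is to tell WebDriver to poll the DOM for a certain amount of time when trying to find an element or elements if they are not immediately available. The default setting is 0. Once set, the implicit wait is set for the life of the WebDriver object instance.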
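The polling behaviour described in that quote can be sketched in plain Python. This is only an illustration of the idea, not Selenium's actual implementation: `poll_until`, its `interval` parameter and the use of `TimeoutError` are all made up for this example (Selenium itself raises `NoSuchElementException` when the wait runs out).

```python
import time


def poll_until(find, timeout, interval=0.5):
    """Call find() repeatedly until it returns something other than None.

    Hypothetical helper: it mimics the *idea* of an implicit wait.
    It returns as soon as find() succeeds, so `timeout` is only an
    upper bound on the waiting time, not a fixed delay.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result is not None:
            return result  # found: stop waiting immediately
        if time.monotonic() >= deadline:
            raise TimeoutError("nothing found within {} seconds".format(timeout))
        time.sleep(interval)  # not there yet: pause briefly, then look again
```

Compared with a fixed `time.sleep(3)`, a loop like this neither gives up too early on a slow page nor keeps waiting once the element is already there.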
In general, using time.sleep() with Selenium is not a good idea.
And give it more than just 3 seconds: implicitly_wait() specifies the maximum time to spend waiting for an element.
So if you set implicitly_wait(10) and the page loads in, say, 5 seconds, then Selenium will wait only 5 seconds.
driver.implicitly_wait(10)
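To show where that call could go, here is a rough sketch of the search step from the question with the implicit wait set before the lookup. `search_parcel` and its parameters are hypothetical names, and `driver` is assumed to be the `webdriver.Firefox()` instance created in the question. It uses `find_element_by_id` because that is what the Selenium version in the traceback provides. One more assumption: the question's code sends the literal text `"*/n"`; if a press of Enter was intended, `"\n"` is used here instead.

```python
def search_parcel(driver, url, parcel_code, timeout=10):
    """Hypothetical helper: open `url` and submit a parcel search.

    `driver` is assumed to be an already created Selenium WebDriver.
    """
    # Set once; it then applies to every later find_element* call
    # made on this driver instance.
    driver.implicitly_wait(timeout)
    driver.get(url)
    # No time.sleep() needed: the lookup polls for up to `timeout` seconds.
    box = driver.find_element_by_id("mSearchControl_mParcelID")
    # Assumption: "\n" (Enter) was intended rather than the literal "/n".
    box.send_keys(parcel_code + "*\n")
    return box
```

It would be called from the existing loop, for example as search_parcel(driver, link, parcel_code). If the element still cannot be found after the wait, the same NoSuchElementException is raised, which at least rules out page load speed as the cause.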