
Scrapy freezes with middleware #5


Closed
Vintodrimmer opened this issue Jun 20, 2018 · 3 comments · Fixed by #8


@Vintodrimmer

Good day,

I'm trying to run Scrapy 1.5.0 with scrapy-selenium, but the moment I add the middleware, Scrapy pins my CPU at 100% and doesn't stop until I explicitly kill it. I have tried both 'chromedriver' and 'geckodriver' on two different Linux distributions.

What may I be doing wrong?

➤ scrapy crawl order_selenium
2018-06-20 10:02:41 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: github)
2018-06-20 10:02:41 [scrapy.utils.log] INFO: Versions: lxml 4.2.1.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 16.2.0, Python 3.6.5 (default, Apr 27 2018, 21:02:20) - [GCC 7.3.0], pyOpenSSL 18.0.0 (OpenSSL 1.1.0h  27 Mar 2018), cryptography 2.2.2, Platform Linux-4.17.2-chrysalis-x86_64-with-glibc2.3.4
2018-06-20 10:02:41 [scrapy.crawler] INFO: Overridden settings: {'AUTOTHROTTLE_ENABLED': True, 'BOT_NAME': 'github', 'CONCURRENT_REQUESTS_PER_DOMAIN': 1, 'COOKIES_ENABLED': False, 'DOWNLOAD_DELAY': 5, 'EDITOR': 'vis', 'NEWSPIDER_MODULE': 'github.spiders', 'SPIDER_MODULES': ['github.spiders'], 'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.170 Safari/537.36'}
2018-06-20 10:02:41 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.throttle.AutoThrottle']
2018-06-20 10:02:43 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:38489/session {"capabilities":
{"firstMatch": [{}], "alwaysMatch": {"browserName": "chrome", "platformName": "any", "goog:chromeOptions": {"extensions": [],
"args": ["--headless"]}}}, "desiredCapabilities": {"browserName": "chrome", "version": "", "platform": "ANY", "goog:chromeOptions": {"extensions": [], "args": ["--headless"]}}}
2018-06-20 10:02:43 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2018-06-20 10:02:43 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy_selenium.SeleniumMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-06-20 10:02:43 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-06-20 10:02:43 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-06-20 10:02:43 [scrapy.core.engine] INFO: Spider opened
2018-06-20 10:02:43 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
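
For context: a typical scrapy-selenium setup of that era (per the project README) enables the middleware and points it at a driver in settings.py. The sketch below is only for reference; the driver name and executable path are placeholders, not values taken from this report.

    # settings.py -- illustrative sketch, values are placeholders
    from shutil import which

    SELENIUM_DRIVER_NAME = 'chrome'
    SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')
    SELENIUM_DRIVER_ARGUMENTS = ['--headless']

    DOWNLOADER_MIDDLEWARES = {
        'scrapy_selenium.SeleniumMiddleware': 800,
    }

Spiders then yield scrapy_selenium.SeleniumRequest instead of a plain scrapy.Request for pages that need a real browser.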
@bobchenbc

In middlewares.py, change line 76 to:

return None
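
For readers following along: in Scrapy, a downloader middleware's process_request must return None, a Response, a Request, or raise IgnoreRequest, and returning None is what lets the request fall through to the remaining middlewares and the default downloader. A minimal sketch of that contract (not the actual scrapy_selenium code; the meta flag used for the check is hypothetical):

    # Sketch of the downloader-middleware contract, not scrapy_selenium's code.
    class ExampleSeleniumLikeMiddleware:
        def process_request(self, request, spider):
            # Hypothetical check; scrapy-selenium keys off its own request class.
            if not request.meta.get('use_selenium'):
                # Returning None hands the request back to Scrapy, which continues
                # through the other middlewares and the default downloader.
                return None
            # ...otherwise drive the browser here and return an HtmlResponse...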

@Vintodrimmer (Author) commented Aug 6, 2018

@bobchenbc Line 76 is commented out but the block it is part of does have return None.

middlewares.py

I have uncommented it and changed it to return None, but it didn't help.

@clemfromspace (Owner)

@Vintodrimmer Hi, I guess @bobchenbc was talking about this file: https://github.com/clemfromspace/scrapy-selenium/blob/develop/scrapy_selenium/middlewares.py#L76

I should have a fix tomorrow for it.

nit-in pushed a commit to nit-in/scrapy-selenium that referenced this issue Nov 15, 2023
…cript-execution

add selenium utility methods