Description
Context:
- Playwright Version: 1.25.2
- Operating System: Linux Ubuntu
- Python version: 3.8.10
- Browser: Firefox, Chromium
- Extra:
Code Snippet
#!/usr/bin/env python3

import asyncio
from playwright.async_api import async_playwright

tracing_enabled = True
tracing_filepath = "trace.zip"

async def handle_download(download):
    print("Found download for %s" % download.url)
    download_filepath = await download.path()
    print("Downloaded %s from %s" % (download_filepath, download.url))
    return

async def main():
    url1 = "https://www.mandiant.com/sites/default/files/2021-09/mandiant-apt1-report.pdf"
    url2 = "https://057info.hr/doc/o_kolacicima.pdf"
    url = url1
    async with async_playwright() as p:
        browser = await p.firefox.launch(headless=True)
        context = await browser.new_context(accept_downloads=True)
        if tracing_enabled:
            await context.tracing.start(screenshots=True,
                                        snapshots=True,
                                        sources=True)
        page = await context.new_page()
        page.on('download', handle_download)
        print("Visiting %s" % url)
        try:
            response = await page.goto(url, timeout=0)
        except Exception as e:
            print("Got exception %s" % e)
        await page.close()
        if tracing_enabled:
            await context.tracing.stop(path=tracing_filepath)
        await context.close()
        await browser.close()

if __name__ == '__main__':
    asyncio.run(main())
Describe the bug
When running the above code with Firefox, url2 downloads correctly, but url1 throws the following exception:
Visiting https://www.mandiant.com/sites/default/files/2021-09/mandiant-apt1-report.pdf
Found download for https://www.mandiant.com/sites/default/files/2021-09/mandiant-apt1-report.pdf
Exception in callback AsyncIOEventEmitter._emit_run.<locals>._callback(<Task finishe...ot NoneType')>) at /home/USER/python-virtual-environments/async/lib/python3.8/site-packages/pyee/_asyncio.py:55
handle: <Handle AsyncIOEventEmitter._emit_run.<locals>._callback(<Task finishe...ot NoneType')>) at /home/USER/python-virtual-environments/async/lib/python3.8/site-packages/pyee/_asyncio.py:55>
Traceback (most recent call last):
  File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/home/user/python-virtual-environments/async/lib/python3.8/site-packages/pyee/_asyncio.py", line 62, in _callback
    self.emit('error', exc)
  File "/home/user/python-virtual-environments/async/lib/python3.8/site-packages/pyee/_base.py", line 116, in emit
    self._emit_handle_potential_error(event, args[0] if args else None)
  File "/home/USER/python-virtual-environments/async/lib/python3.8/site-packages/pyee/_base.py", line 86, in _emit_handle_potential_error
    raise error
  File "./test.py", line 11, in handle_download
    download_filepath = await download.path()
  File "/home/USER/python-virtual-environments/async/lib/python3.8/site-packages/playwright/async_api/_generated.py", line 5640, in path
    return mapping.from_maybe_impl(await self._impl_obj.path())
  File "/home/USER/python-virtual-environments/async/lib/python3.8/site-packages/playwright/_impl/_download.py", line 58, in path
    return await self._artifact.path_after_finished()
  File "/home/USER/python-virtual-environments/async/lib/python3.8/site-packages/playwright/_impl/_artifact.py", line 36, in path_after_finished
    return pathlib.Path(await self._channel.send("pathAfterFinished"))
  File "/usr/lib/python3.8/pathlib.py", line 1042, in __new__
    self = cls._from_parts(args, init=False)
  File "/usr/lib/python3.8/pathlib.py", line 683, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/usr/lib/python3.8/pathlib.py", line 667, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
The trace seems to indicate that download.path() times out for url1, which may be why the smaller PDF at url2 works. However, I do not know how to handle that timeout (I am already passing a timeout of zero to goto).
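One possible reading, which this report does not confirm, is that main() closes the page while handle_download is still waiting for the file, so that download.path() ends up with no path to return. Below is a minimal sketch of a workaround under that assumption: the 'download' listener schedules a task, and the script awaits those tasks before closing anything. DownloadTracker and its methods are hypothetical names introduced for illustration; download.failure() and download.path() are the Playwright calls. It also assumes the 'download' event fires before goto returns or raises, as it does in the output above.

```python
import asyncio


class DownloadTracker:
    """Collects the tasks that process 'download' events so they can be awaited."""

    def __init__(self):
        self.tasks = []

    def on_download(self, download):
        # Synchronous listener: schedule the real work and remember the task.
        self.tasks.append(asyncio.ensure_future(self._process(download)))

    async def _process(self, download):
        # failure() waits for the download to end; None means it succeeded.
        error = await download.failure()
        if error is not None:
            print("Download of %s failed: %s" % (download.url, error))
            return None
        path = await download.path()
        print("Downloaded %s from %s" % (path, download.url))
        return path

    async def wait(self, timeout=None):
        # Await every handler started so far; results come back in start order.
        if not self.tasks:
            return []
        return await asyncio.wait_for(asyncio.gather(*self.tasks), timeout)


async def main(url):
    # Imported here so the tracker above can be exercised without Playwright.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.firefox.launch(headless=True)
        context = await browser.new_context(accept_downloads=True)
        page = await context.new_page()
        tracker = DownloadTracker()
        page.on("download", tracker.on_download)
        try:
            await page.goto(url, timeout=0)
        except Exception as e:
            # Navigating straight to a download may raise; the download can still proceed.
            print("Got exception %s" % e)
        # Close the page only after the download handlers have finished.
        paths = await tracker.wait()
        await page.close()
        await context.close()
        await browser.close()
        return paths
```

Running it would look like asyncio.run(main(url1)). Passing a timeout (in seconds) to tracker.wait() bounds how long the script waits; asyncio.wait_for raises asyncio.TimeoutError when it expires.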
This report is for Firefox, but Chromium fails with a similar exception. (Chromium also throws an additional exception in goto, but that seems to be expected Chromium behavior according to microsoft/playwright-java#863, and the download still starts if that first exception is caught.) WebKit throws a different exception (Frame load interrupted) for both URLs, and the download event is never fired.
To give a little context: in my scenario I am given URLs that may point to either an HTML page or a PDF, and I need to download both. I cannot use 'async with page.expect_download()' since the URL may directly point to a PDF file.
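For the mixed HTML/PDF scenario, one alternative that avoids the 'download' event entirely is to fetch the URL through the context's APIRequestContext (context.request) and branch on the Content-Type header. This is only a sketch and not something tried in this report: looks_like_pdf, filename_for and fetch are hypothetical helpers, and it assumes the server sends an accurate Content-Type and that the PDF fits in memory.

```python
import pathlib
from urllib.parse import urlparse


def looks_like_pdf(content_type):
    """Return True when a Content-Type header value names a PDF."""
    return (content_type or "").split(";")[0].strip().lower() == "application/pdf"


def filename_for(url, default="download.pdf"):
    """Derive a local file name from the last path segment of the URL."""
    name = pathlib.PurePosixPath(urlparse(url).path).name
    return name or default


async def fetch(context, url, out_dir="."):
    """Save a PDF to out_dir and return its path, or return the page HTML."""
    # timeout=0 disables the request timeout, mirroring goto(url, timeout=0).
    response = await context.request.get(url, timeout=0)
    if looks_like_pdf(response.headers.get("content-type")):
        target = pathlib.Path(out_dir) / filename_for(url)
        target.write_bytes(await response.body())
        return str(target)
    # Not a PDF: load it in a real page so scripts run before reading the HTML.
    page = await context.new_page()
    try:
        await page.goto(url, timeout=0)
        return await page.content()
    finally:
        await page.close()
```

Here context is a BrowserContext such as the one created in the snippet above. The trade-off is that HTML URLs are requested twice, once to inspect the headers and once to render the page.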
Thanks for your time. Let me know if you need further info.