Skip to content

Conversation

cmaloney
Copy link
Contributor

@cmaloney cmaloney commented Jan 29, 2025

Both _io and _pyio now use a pre-allocated buffer of length bufsize, fill it using a os.readinto / _Py_read, and have matching "expand buffer" logic.

On my machine (Linux, Debug build) this takes: ./python -m test -M8g -uall test_largefile -m test_large_read -v from ~3.7 seconds to ~3.3 seconds

_pyio still uses 2x the memory, there are two remaining copies

  1. the bytes(result) currently copies. I'd like to either just rely on "duck typing" / bytearray is close enough to bytes, or would need to do something similar to C++ "move" semantics where the bytes could take ownership of the data buffer from the bytearray without copying... Not sure what is most Pythonic
  2. _pyio.BufferedIO._read_unlocked in the read-all case where no buffer has been allocated always does return buf[:pos] + chunk which causes another copy. Patch for that case / "if buf is length 0, just return" coming shortly.

Both now use a pre-allocated buffer of length `bufsize`, fill it using
a readinto, and have matching "expand buffer" logic.

On my machine this takes:

`./python -m test -M8g -uall test_largefile -m test_large_read -v`
from ~3.7 seconds to ~3.3 seconds
@cmaloney
Copy link
Contributor Author

cmaloney commented Jan 29, 2025

Not sure how to get the Tests / Ubuntu / build and test unbroken, they seem to be auto-cancelling fast... which isn't happening on any of my other open PRs.

@cmaloney cmaloney closed this Jan 29, 2025
@cmaloney
Copy link
Contributor Author

rebased on main and opened #129458 as a replacement that hopefully will revoid the cancelled test runs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant