Skip to content

Conversation

cmaloney
Copy link
Contributor

@cmaloney cmaloney commented Jan 29, 2025

Both _io and _pyio now use a pre-allocated buffer of length bufsize, fill it using a os.readinto / _Py_read, and have matching "expand buffer" logic.

On my machine (Linux, Debug build) this takes: ./python -m test -M8g -uall test_largefile -m test_large_read -v from ~3.7 seconds to ~3.3 seconds

_pyio still uses 2x the memory, there are two remaining copies

  1. the bytes(result) currently copies. I'd like to either just rely on "duck typing" / bytearray is close enough to bytes, or would need to do something similar to C++ "move" semantics where the bytes could take ownership of the data buffer from the bytearray without copying... Not sure what is most Pythonic
  2. _pyio.BufferedIO._read_unlocked in the read-all case where no buffer has been allocated always does return buf[:pos] + chunk which causes another copy. Patch for that case / "if buf is length 0, just return" coming shortly.

Both now use a pre-allocated buffer of length `bufsize`, fill it using
a readinto, and have matching "expand buffer" logic.

On my machine this takes:

`./python -m test -M8g -uall test_largefile -m test_large_read -v`
from ~3.7 seconds to ~3.4 seconds
@vstinner vstinner enabled auto-merge (squash) January 30, 2025 10:50
@vstinner
Copy link
Member

LGTM. I enabled auto-merge.

@vstinner vstinner merged commit f927204 into python:main Jan 30, 2025
39 checks passed
@cmaloney cmaloney deleted the cmalnoey/pyio_fileio_readall_2 branch January 30, 2025 21:57
cmaloney added a commit to cmaloney/cpython that referenced this pull request Feb 2, 2025
vstinner pushed a commit that referenced this pull request Feb 2, 2025
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Feb 7, 2025
Both now use a pre-allocated buffer of length `bufsize`, fill it using
a readinto(), and have matching "expand buffer" logic.

On my machine this takes:

`./python -m test -M8g -uall test_largefile -m test_large_read -v`

from ~3.7 seconds to ~3.4 seconds.
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Feb 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants