Closed
Description
The default implementation of `read_to_end` grows its buffer by at most `DEFAULT_BUF_SIZE` bytes (8KB) at a time. When the input is large, this takes O(n²) time due to repeated reallocation and copying. Calling `read_to_end` on a 100MB file with an empty buffer will resize and copy the buffer over 10,000 times.
When the input size is known ahead of time, callers can avoid this problem by passing in a pre-allocated buffer. It would be nice if this were not necessary in cases where the standard library could do this automatically.
- `read_to_end` could be specialized to pre-reserve space for types that implement `Seek`.
- Types like `File` that have other ways to determine the input size could provide their own implementations of `read_to_end`.
This still leaves potentially quadratic behavior for cases where the input size can't be determined up front. This could be fixed using exponential growth, at the cost of wasting more memory due to unused buffer space.
Activity
mbrubeck commented on Nov 7, 2017
Sorry, the description here is incorrect. Since the `read_to_end` implementation calls `reserve`, which grows exponentially, the time is not quadratic. There are performance issues with doing even a small number of reallocations and copies of very large buffers, but I'll close this and file that as a new issue to avoid confusion.

SimonSapin commented on Nov 11, 2017
I measured `File::read_to_end` with a vector created with `Vec::with_capacity(file.metadata().len() as usize)`, compared to `Vec::new()`. On my Linux desktop with an SSD (though everything is probably in the filesystem cache here), pre-allocating + reading takes 3% more time to 43% less time depending on file size.

SimonSapin commented on Nov 11, 2017
PR with a prototype: #45928