Description
The Encoding::decode_* methods need in some cases to allocate a String, and decide how much capacity to give it. Other than *_without_replacement (2984a8b#commitcomment-20990260), this is based on Encoding::max_utf8_buffer_length, which assumes the worst case. For many encodings, that’s when every byte of the input is an error that emits a three-byte U+FFFD code point.
In short, as soon as there’s an error, these methods allocate three times the size of the (remaining) input. Assuming the worst case simplifies the code, which only needs to allocate once, but it seems excessive that a single bit flip near the beginning of the input could triple memory usage.
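To make the arithmetic concrete, here is a minimal sketch using only the standard library. The helpers `worst_case_utf8_len` and `lossy_len` are hypothetical names for illustration, and `String::from_utf8_lossy` stands in for the Encoding::decode_* methods; it is not the crate's own code path.

```rust
/// Hypothetical helper: the worst-case UTF-8 output size when every input
/// byte is an error replaced by U+FFFD, which takes 3 bytes in UTF-8.
fn worst_case_utf8_len(input_len: usize) -> Option<usize> {
    input_len.checked_mul('\u{FFFD}'.len_utf8())
}

/// Hypothetical helper: the number of bytes actually needed after decoding
/// `input` as UTF-8 with replacement (std stand-in for decode_*).
fn lossy_len(input: &[u8]) -> usize {
    String::from_utf8_lossy(input).len()
}
```

For the four-byte input `b"a\xFFbc"`, the worst-case reservation is 12 bytes, while the decoded output needs only 6: one replaced byte costs two extra bytes, not a tripling of the whole buffer.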
So a more adaptive allocation scheme might be desirable, but admittedly there is no obvious answer as to what it should be.
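As one possible direction, and only as a sketch: reserve the input length up front and let the String grow on demand when replacements push the output past that. The function `decode_adaptive` below is a hypothetical illustration for UTF-8 input built on `std::str::from_utf8`; it is not how the crate's decoders are structured, and it trades the single allocation for an occasional reallocation and copy.

```rust
use std::str;

/// Hypothetical sketch: decode `input` as UTF-8, replacing malformed
/// sequences with U+FFFD. Capacity starts at the input length and grows
/// only if replacements make the output longer than the input.
fn decode_adaptive(input: &[u8]) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    loop {
        match str::from_utf8(rest) {
            Ok(valid) => {
                // No (more) errors: copy the remainder and finish.
                out.push_str(valid);
                return out;
            }
            Err(e) => {
                let (valid, after) = rest.split_at(e.valid_up_to());
                // The prefix up to `valid_up_to()` is guaranteed to be valid UTF-8.
                out.push_str(str::from_utf8(valid).expect("prefix is valid UTF-8"));
                out.push('\u{FFFD}');
                match e.error_len() {
                    // Skip the malformed sequence and keep decoding.
                    Some(n) => rest = &after[n..],
                    // Input ended in the middle of a sequence: nothing left to decode.
                    None => return out,
                }
            }
        }
    }
}
```

With this approach, error-free input never allocates more than its own length, and the cost of errors is paid only when they occur. Whether the extra reallocations are acceptable is exactly the open question.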