Allocating three times the size of the input seems excessive. #6

Closed
@SimonSapin

The Encoding::decode_* methods sometimes need to allocate a String and must decide how much capacity to give it. Other than *_without_replacement (2984a8b#commitcomment-20990260), this is based on Encoding::max_utf8_buffer_length, which assumes the worst case. For many encodings, the worst case is when every byte of the input is an error that emits U+FFFD, which takes three bytes in UTF-8.

In short, as soon as there’s an error, these methods allocate three times the size of the (remaining) input. Assuming the worst case simplifies the code, which only needs to allocate once, but it seems excessive that a single bit flip near the beginning of the input could triple memory usage.
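
To make the arithmetic concrete, here is a minimal sketch of the worst-case strategy using only the standard library. It is not the crate's code: `decode_ascii_worst_case` and `worst_case_utf8_len` are hypothetical names, and the "encoding" is a toy one in which every non-ASCII byte counts as an error.

```rust
use std::borrow::Cow;

const REPLACEMENT: char = '\u{FFFD}';

/// Hypothetical worst-case estimate: every remaining input byte is assumed
/// to be an error that becomes U+FFFD (three bytes in UTF-8).
fn worst_case_utf8_len(byte_len: usize) -> usize {
    byte_len
        .checked_mul(REPLACEMENT.len_utf8())
        .expect("capacity overflow")
}

/// Toy decoder (an assumption for illustration): ASCII bytes pass through,
/// any other byte is treated as an error and replaced by U+FFFD.
fn decode_ascii_worst_case(input: &[u8]) -> Cow<'_, str> {
    // Length of the error-free prefix.
    let valid = input
        .iter()
        .position(|b| !b.is_ascii())
        .unwrap_or(input.len());
    if valid == input.len() {
        // No errors at all: borrow the input, no allocation needed.
        return Cow::Borrowed(std::str::from_utf8(input).expect("ASCII is valid UTF-8"));
    }
    let (head, tail) = input.split_at(valid);
    // First error found: allocate once, sized for the worst case on the rest.
    let mut out = String::with_capacity(head.len() + worst_case_utf8_len(tail.len()));
    out.push_str(std::str::from_utf8(head).expect("ASCII is valid UTF-8"));
    for &b in tail {
        out.push(if b.is_ascii() { b as char } else { REPLACEMENT });
    }
    Cow::Owned(out)
}
```

In this toy model, a 1000-byte input whose first byte is invalid produces 1002 bytes of output but reserves at least 3000 bytes of capacity.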

So a more adaptive allocation scheme might be desirable, but admittedly there is no obvious answer as to what it should be.
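
One possible direction, sketched under the same toy assumptions (non-ASCII bytes are errors; `decode_ascii_adaptive` is a hypothetical name, not an API of the crate), is to start optimistically with one output byte per input byte and grow only when an error is actually seen. This is one candidate among many, not a recommendation.

```rust
const REPLACEMENT_CHAR: char = '\u{FFFD}';

/// Toy decoder with an optimistic allocation: assume one output byte per
/// input byte and grow only when a replacement character is needed.
fn decode_ascii_adaptive(input: &[u8]) -> String {
    let mut out = String::with_capacity(input.len());
    for (i, &b) in input.iter().enumerate() {
        if b.is_ascii() {
            out.push(b as char);
        } else {
            // Make room for this replacement plus one byte for each
            // remaining input byte. `reserve` is a no-op when the current
            // capacity already suffices and grows amortized otherwise.
            let remaining = input.len() - i - 1;
            out.reserve(REPLACEMENT_CHAR.len_utf8() + remaining);
            out.push(REPLACEMENT_CHAR);
        }
    }
    out
}
```

The trade-off is that heavily corrupted input may trigger several reallocations, and amortized growth can still overshoot the final length. In exchange, a single early error no longer forces the full three-times reservation.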
