
Taking the first N bytes of a str that still make up valid UTF-8 #2566

Open
@thomcc


In the past I've done things like `&s[..max_len.min(s.len())]` to truncate strings, but it turns out this is subtly broken (and will panic) for strings where `max_len` happens to fall in the middle of a multibyte UTF-8 sequence (i.e. whenever `!s.is_char_boundary(max_len)`).
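A minimal sketch of the failure mode, assuming an arbitrary sample string `"héllo"`; `naive_truncate` is a hypothetical helper name used only for this demo. It also shows that `str::get` reports a non-boundary index as `None` instead of panicking, although it does not back off to a shorter prefix by itself.

```rust
use std::panic;

// Hypothetical helper: the naive truncation, wrapped in a function for the demo.
fn naive_truncate(s: &str, max_len: usize) -> &str {
    &s[..max_len.min(s.len())]
}

fn main() {
    // "é" is encoded as two bytes (0xC3 0xA9) at byte offsets 1..3,
    // so byte index 2 falls in the middle of it.
    let s = "héllo";
    assert!(!s.is_char_boundary(2));

    // Slicing at a non-boundary index panics.
    let result = panic::catch_unwind(|| naive_truncate(s, 2).len());
    assert!(result.is_err());

    // Index 3 is a boundary, so the naive version happens to work there.
    assert_eq!(naive_truncate(s, 3), "hé");

    // `str::get` performs the same check but returns `None` instead of panicking.
    assert_eq!(s.get(..2), None);
    assert_eq!(s.get(..3), Some("hé"));
}
```

On ASCII-only input every index is a char boundary, which is why the naive version tends to pass casual testing.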

I've made a utility function for this (below), but it would be nice if a method on str existed for this case. In particular, I think the fact that the naive solution is broken on non-ASCII text makes it worthwhile, since developers are less likely to test on such text.

I have no opinions on its name (I'm genuinely terrible at names), nor on further extensions / variations or anything like that.

Anyway, below is the source for my version of it, provided mostly to make completely clear what I'm talking about. In practice I think this would be a method on `str`, so it would have a somewhat different implementation.

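```rust
/// Returns the longest prefix of `s` that is at most `max_len` bytes long
/// and still ends on a char boundary (and so is valid UTF-8).
pub fn slice_up_to(s: &str, max_len: usize) -> &str {
    if max_len >= s.len() {
        return s;
    }
    // Step back until `idx` is no longer inside a multibyte sequence.
    // Index 0 is always a char boundary, so this loop terminates.
    let mut idx = max_len;
    while !s.is_char_boundary(idx) {
        idx -= 1;
    }
    &s[..idx]
}
```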
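For illustration only, here is one way the same logic could be exposed as a method on `str` today, via an extension trait. The trait name `SliceUpTo` is hypothetical and not part of std; an actual `str` method might be named and implemented differently.

```rust
/// Hypothetical extension trait (not in std) sketching a method-style API.
trait SliceUpTo {
    /// Longest prefix of at most `max_len` bytes that ends on a char boundary.
    fn slice_up_to(&self, max_len: usize) -> &str;
}

impl SliceUpTo for str {
    fn slice_up_to(&self, max_len: usize) -> &str {
        if max_len >= self.len() {
            return self;
        }
        // A UTF-8 encoded char is at most 4 bytes (1 lead + up to 3
        // continuation bytes), so this steps back at most 3 times.
        let mut idx = max_len;
        while !self.is_char_boundary(idx) {
            idx -= 1;
        }
        &self[..idx]
    }
}

fn main() {
    let s = "héllo";
    assert_eq!(s.slice_up_to(0), "");
    assert_eq!(s.slice_up_to(1), "h");
    // Byte 2 is inside "é", so the result backs off to 1 byte.
    assert_eq!(s.slice_up_to(2), "h");
    assert_eq!(s.slice_up_to(3), "hé");
    assert_eq!(s.slice_up_to(100), "héllo");

    // "🦀" is 4 bytes (offsets 1..5), the worst case for the loop.
    assert_eq!("a🦀b".slice_up_to(4), "a");
    assert_eq!("a🦀b".slice_up_to(5), "a🦀");
}
```

Because the backward scan is bounded by the maximum UTF-8 sequence length, the cost is constant regardless of string length, unlike a forward scan over `char_indices`.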


Labels: T-libs-api (relevant to the library API team, which will review and decide on the RFC)
