Taking the first N bytes of a `str` that still make up valid UTF-8 #2566

In the past I've done stuff like `&s[..max_len.min(s.len())]` to truncate strings, but it turns out this is subtly broken: it panics whenever `max_len` is less than `s.len()` and falls in the middle of a multibyte UTF-8 sequence (i.e. whenever `!s.is_char_boundary(max_len)`).
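To make the failure concrete, here is a minimal sketch of the naive approach. The helper name `naive_truncate` is made up for illustration, and it uses `str::get` instead of indexing so that the bad case shows up as `None` rather than as a panic; the indexing form `&s[..n]` panics on exactly the inputs where this returns `None`.

```rust
/// Naive byte-length truncation (hypothetical helper, for illustration only).
/// Returns `None` where the equivalent `&s[..n]` indexing would panic.
pub fn naive_truncate(s: &str, max_len: usize) -> Option<&str> {
    // Clamp to the string length, as in the naive one-liner.
    let n = max_len.min(s.len());
    // `get` yields `None` if `n` is not on a char boundary.
    s.get(..n)
}
```

With ASCII input such as `"hello"` every index is a boundary, so the problem never shows up. With `"héllo"`, where `é` occupies bytes 1 and 2, a `max_len` of 2 lands inside the character and the slice is rejected.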

I've made a utility function for this (below), but it would be nice if a method on `str` existed for this case. In particular, I think the fact that the naive solution is broken on non-ASCII text makes it worthwhile, since developers are less likely to test on such text.

I have no opinions on its name (I'm genuinely terrible at names), nor on further extensions / variations or anything like that.

Anyway, below is the source for my version of it, provided mostly to be completely clear about what I'm talking about. In practice I think this would be a method on `str`, so it would have a somewhat different implementation.

```rust
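/// Returns the longest prefix of `s` that is at most `max_len` bytes long
/// and ends on a UTF-8 character boundary.
pub fn slice_up_to(s: &str, max_len: usize) -> &str {
    if max_len >= s.len() {
        return s;
    }
    // Walk backwards to the nearest char boundary. Index 0 is always a
    // boundary, so this loop terminates without underflowing.
    let mut idx = max_len;
    while !s.is_char_boundary(idx) {
        idx -= 1;
    }
    &s[..idx]
}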
```
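As a rough sketch of what the method form could look like from the caller's side, the same logic can be wrapped in an extension trait. The trait name `SliceUpTo` is a placeholder I made up for this example, not a proposed API, and a real method on `str` would presumably live in the standard library instead.

```rust
/// Hypothetical extension trait; the name is a placeholder, not a proposal.
pub trait SliceUpTo {
    /// Longest prefix of at most `max_len` bytes ending on a char boundary.
    fn slice_up_to(&self, max_len: usize) -> &str;
}

impl SliceUpTo for str {
    fn slice_up_to(&self, max_len: usize) -> &str {
        if max_len >= self.len() {
            return self;
        }
        // Back up until we hit a char boundary; index 0 always is one.
        let mut idx = max_len;
        while !self.is_char_boundary(idx) {
            idx -= 1;
        }
        &self[..idx]
    }
}
```

With the trait in scope, `"héllo".slice_up_to(2)` gives `"h"` instead of panicking, and `"日本語".slice_up_to(4)` gives `"日"`, since each of those characters is three bytes long.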
