`strip_string` trims by number of chars instead of bytes

### How do you use Sentry?

Sentry Saas (sentry.io)

### Version

1.39.2

### Issue

The `strip_string` function mixes up bytes and characters when trimming.

[Here](https://github.com/getsentry/sentry-python/blob/61a4621336fd81b69e5259af599e5779d78dce3f/sentry_sdk/utils.py#L1129) we calculate the size of the string in bytes as `length`. But then when we actually determine that the string needs trimming, we trim `length` **characters** from the string instead of `length` **bytes**. We also then potentially report the wrong number in the [metadata](https://github.com/getsentry/sentry-python/blob/61a4621336fd81b69e5259af599e5779d78dce3f/sentry_sdk/utils.py#L1136).

```python
from sentry_sdk.utils import strip_string

strip_string("éê", 2)  # == AnnotatedValue(value="éê", ...)
```
Both `é` and `ê` take two bytes each in UTF-8, making the string `"éê"` 4 bytes long. Yet `strip_string` will not strip it to two bytes.

1. It'll get encoded into bytes [here](https://github.com/getsentry/sentry-python/blob/61a4621336fd81b69e5259af599e5779d78dce3f/sentry_sdk/utils.py#L1129).
2. The size of the encoded version is 4, so `length` will be set to `4`.
3. [This check](https://github.com/getsentry/sentry-python/blob/61a4621336fd81b69e5259af599e5779d78dce3f/sentry_sdk/utils.py#L1131) will be `True`, because `4 > 2`.
4. But when we actually try to trim [here](https://github.com/getsentry/sentry-python/blob/61a4621336fd81b69e5259af599e5779d78dce3f/sentry_sdk/utils.py#L1133), we're slicing the string `"éê"` itself, so the limit is applied to characters (code points) rather than to the encoded bytes representation.
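
The mismatch in the steps above can be reproduced without the SDK. This is only an illustrative sketch: the variable names are made up for the example, and it ignores any suffix or metadata handling that `strip_string` does around the slice.

```python
# "éê" written with escapes so the example does not depend on file encoding
# or on whether the characters are precomposed.
text = "\u00e9\u00ea"
max_length = 2  # the limit, which is meant to be in bytes

encoded = text.encode("utf-8")
char_count = len(text)     # number of code points
byte_count = len(encoded)  # number of UTF-8 bytes

# The size check is done on bytes...
needs_trimming = byte_count > max_length

# ...but slicing the str applies the limit to code points, so nothing is removed.
trimmed_by_chars = text[:max_length]

# Slicing the encoded bytes applies the limit to bytes, as intended.
trimmed_by_bytes = encoded[:max_length].decode("utf-8", errors="ignore")

print(char_count, byte_count, needs_trimming)
print(repr(trimmed_by_chars), repr(trimmed_by_bytes))
```

Here the check fires because the string is 4 bytes long, but the character slice leaves it untouched, while the byte slice keeps only `é`.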

### Solution

Probably something to the effect of:

```python
string.encode("utf-8")[: max_bytes - 3].decode("utf-8", errors="ignore")
```

The `[: max_bytes - 3]` slice might end up cutting a multi-byte code point in two; decoding with `errors="ignore"` drops the resulting incomplete byte sequence instead of raising `UnicodeDecodeError`.
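
As a rough sketch of how that one-liner could be wrapped up, here is a standalone helper. The name `truncate_utf8`, its parameters, and the `"..."` suffix (which is what the `- 3` is assumed to reserve room for) are hypothetical and not part of the SDK; the real fix would also need to report byte-based numbers in the metadata.

```python
def truncate_utf8(value: str, max_bytes: int, suffix: str = "...") -> str:
    """Trim `value` so that its UTF-8 encoding fits in `max_bytes`.

    Hypothetical helper for illustration only, not the SDK's implementation.
    """
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value

    # Reserve room for the suffix; never go below zero bytes of payload.
    # Note: if max_bytes is smaller than the suffix, the result still
    # exceeds max_bytes because the suffix is always appended.
    budget = max(max_bytes - len(suffix.encode("utf-8")), 0)

    # The byte slice may end in the middle of a multi-byte code point;
    # errors="ignore" drops that incomplete trailing sequence.
    return encoded[:budget].decode("utf-8", errors="ignore") + suffix


ascii_result = truncate_utf8("hello world", 8)
accented_result = truncate_utf8("\u00e9" * 5, 8)  # five two-byte characters
print(ascii_result, accented_result)
```

In the second call the 5-byte budget ends halfway through the third `é`, so only two complete characters survive before the suffix is appended.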
