strip_string trims by number of chars instead of bytes #2634

Closed
@sentrivana

Description

How do you use Sentry?

Sentry SaaS (sentry.io)

Version

1.39.2

Issue

The strip_string function measures a string in bytes but trims it by characters.

We calculate the size of the UTF-8 encoded string in bytes as length. But when we determine that the string needs trimming, we slice the string by characters instead of by bytes. We may then also report the wrong number in the metadata.

from sentry_sdk.utils import strip_string

strip_string("éê", 2)  # == AnnotatedValue(value="éê", ...)

Both é and ê take two bytes in UTF-8, making the string "éê" 4 bytes long. Yet strip_string will not strip it to two bytes.

  1. The string gets encoded into bytes.
  2. The size of the encoded version is 4, so length will be set to 4.
  3. The length check will be True, because 4 > 2.
  4. But when we actually trim, we're trimming the string "éê" to two characters (code points), as opposed to trimming the encoded bytes representation.
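
The mismatch in the steps above can be reproduced with plain Python, without importing sentry_sdk. This is only an illustrative sketch: the variable names are made up for the example, and the string is written with escapes so the code points are unambiguous.

```python
# Illustrative only: plain Python, independent of sentry_sdk internals.
text = "\u00e9\u00ea"  # "éê", two code points
max_bytes = 2          # the limit passed in the example above

encoded = text.encode("utf-8")
char_count = len(text)     # number of code points
byte_count = len(encoded)  # number of UTF-8 bytes

# Slicing the str keeps max_bytes code points, which here is the whole string.
sliced_by_chars = text[:max_bytes]

# Slicing the encoded bytes keeps max_bytes bytes, which here is just "é".
sliced_by_bytes = encoded[:max_bytes].decode("utf-8", errors="ignore")

print(char_count, byte_count)
print(repr(sliced_by_chars), repr(sliced_by_bytes))
```

The character slice leaves the string at its full 4 bytes, while the byte slice actually respects the 2-byte limit.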

Solution

Probably something to the effect of:

string.encode("utf-8")[: max_bytes - 3].decode("utf-8", errors="ignore")

The [: max_bytes - 3] slice might end up cutting a code point in two; .decode with errors="ignore" will drop the resulting incomplete byte sequence instead of raising an error.
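
A minimal sketch of that idea as a standalone function follows. The name truncate_utf8 and the suffix parameter are hypothetical and not part of sentry_sdk; the sketch assumes the 3 in max_bytes - 3 is meant to leave room for a trailing "...".

```python
def truncate_utf8(string: str, max_bytes: int, suffix: str = "...") -> str:
    """Hypothetical helper: trim `string` so that the result, including
    `suffix`, fits in `max_bytes` UTF-8 bytes where possible."""
    encoded = string.encode("utf-8")
    if len(encoded) <= max_bytes:
        # Already within the byte limit, nothing to trim.
        return string

    # Reserve room for the suffix; never let the budget go negative,
    # since a negative slice bound would count from the end instead.
    budget = max(max_bytes - len(suffix.encode("utf-8")), 0)

    # Slicing bytes may split a multi-byte code point; errors="ignore"
    # drops the incomplete trailing sequence instead of raising.
    truncated = encoded[:budget].decode("utf-8", errors="ignore")
    return truncated + suffix


print(truncate_utf8("\u00e9\u00ea\u00e9\u00ea", 6))
```

Note that when max_bytes is smaller than the suffix itself, this sketch returns just the suffix, which still exceeds the limit; a real fix would need to decide how to handle that case.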
