Description
How do you use Sentry?
Sentry Saas (sentry.io)
Version
1.39.2
Issue
The `strip_string` function isn't working properly.
Here we calculate the size of the string in bytes as `length`. But then, when we actually determine that the string needs trimming, we trim `length` characters from the string instead of `length` bytes. We also then potentially report the wrong number in the metadata.
from sentry_sdk.utils import strip_string
strip_string("éê", 2) # == AnnotatedValue(value="éê", ...)
Both `é` and `ê` take two bytes each in UTF-8, making the string `"éê"` 4 bytes long. Yet `strip_string` will not strip it to two bytes.
- It'll get encoded into bytes here.
- The size of the encoded version is 4, so `length` will be set to `4`.
- This check will be `True`, because `4 > 2`.
- But when we actually try to trim here, we're trimming the string `"éê"` to two characters (code points), as opposed to trimming its encoded bytes representation.
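The mismatch in the steps above can be reproduced in plain Python, without importing `sentry_sdk`. This is only an illustrative sketch: the variable names are made up for the example and are not taken from the SDK's source.

```python
# Plain-Python illustration of the byte/character mismatch.
# All names here are hypothetical; nothing is imported from sentry_sdk.
value = "\u00e9\u00ea"  # the string "éê"
max_length = 2          # the limit used in the example above

encoded = value.encode("utf-8")
byte_length = len(encoded)  # size of the string in bytes
char_length = len(value)    # size of the string in code points

# The size check compares bytes against the limit.
needs_trimming = byte_length > max_length

# Slicing a str counts code points, not bytes, so both characters survive.
sliced_by_chars = value[:max_length]
sliced_size = len(sliced_by_chars.encode("utf-8"))

print(byte_length, char_length, needs_trimming, sliced_size)
```

The check is made in bytes, but the slice is made in code points, so the "trimmed" string is still 4 bytes long.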
Solution
Probably something to the effect of
string.encode("utf-8")[: max_bytes - 3].decode("utf-8", errors="ignore")
The `[: max_bytes - 3]` part might end up cutting a code point in two; `.decode` with `errors="ignore"` will drop the resulting incomplete bytes instead of raising an error.
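A minimal sketch of that idea as a standalone function follows. The name `truncate_utf8`, its parameters, and the `"..."` suffix handling are assumptions made for illustration; this is not the SDK's actual implementation or a proposed patch.

```python
def truncate_utf8(string: str, max_bytes: int, suffix: str = "...") -> str:
    """Hypothetical helper: cut `string` so that its UTF-8 encoding,
    including `suffix`, fits in `max_bytes` where possible."""
    encoded = string.encode("utf-8")
    if len(encoded) <= max_bytes:
        return string  # already within the byte limit

    # Reserve room for the suffix; never use a negative slice bound.
    budget = max(max_bytes - len(suffix.encode("utf-8")), 0)

    # Slicing bytes may split a multi-byte code point in two;
    # errors="ignore" drops the incomplete trailing bytes.
    head = encoded[:budget].decode("utf-8", errors="ignore")
    return head + suffix


print(truncate_utf8("\u00e9\u00ea\u00e9\u00ea", 6))  # "éêéê" is 8 bytes in UTF-8
print(truncate_utf8("hello world", 8))
```

Note that when `max_bytes` is smaller than the suffix itself (as with the limit of 2 in the example above), this sketch returns only the suffix, which still exceeds the limit. A real fix would have to decide how to handle that case.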