BUG: Series.unique() terminates strings prematurely on null Bytes #53720

@Nadrons

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

outp = pd.Series(['A\x00B', 'A\x00C']).unique()

print(outp)

# prints:
# ['A\x00B']

Issue Description

Series.unique() treats strings that differ only after a null byte as duplicates, so distinct values are missing from the result.

As per this question and this issue, it seems that this is another case of strings being passed to a Cython function and terminating early on null bytes.
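The suspected mechanism can be illustrated without pandas. The sketch below is not pandas' actual code: the helper names `c_string_view`, `unique_via_c_strings`, and `unique_full_length` are hypothetical, and it only assumes that somewhere in the deduplication path the strings are compared as NUL-terminated C strings. It uses `ctypes.c_char_p`, whose `.value` stops at the first null byte, to mimic that view.

```python
import ctypes


def c_string_view(s: str) -> bytes:
    """Return the bytes a NUL-terminated C string reader would see."""
    encoded = s.encode("utf-8")
    # c_char_p.value reads up to (not including) the first NUL byte,
    # which is the same boundary strlen/strcmp would use.
    return ctypes.c_char_p(encoded).value


def unique_via_c_strings(values):
    """Deduplicate using the truncated C-string view as the key (hypothetical)."""
    seen = set()
    out = []
    for v in values:
        key = c_string_view(v)
        if key not in seen:
            seen.add(key)
            out.append(v)
    return out


def unique_full_length(values):
    """Deduplicate on the full Python string, preserving first-seen order."""
    return list(dict.fromkeys(values))


values = ["A\x00B", "A\x00C"]
truncated = unique_via_c_strings(values)  # both keys collapse to b"A"
full = unique_full_length(values)         # both strings are kept
```

Under that assumption, both inputs reduce to the key `b"A"`, so only the first string survives, which matches the output reported above. Comparing on the full-length string keeps both values.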

Expected Behavior

Series.unique() should return both strings: ['A\x00B' 'A\x00C']

Installed Versions

pandas : 2.0.2

Metadata

Labels

  • Algos: Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff

  • Bug

  • Strings: String extension data type and string data
