Closed
Labels
Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Bug, Strings (string extension data type and string data)
Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
outp = pd.Series(['A\x00B', 'A\x00C']).unique()
print(outp)
# prints:
# ['A\x00B']
Issue Description
Series.unique() fails to detect unique strings when null bytes are included.
As with this question and this issue, this appears to be another case of strings being passed to a Cython function and truncated at the first null byte.
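To illustrate the suspected mechanism, here is a minimal pure-Python sketch. It is not pandas' actual code: the helper names (c_string_view, unique_truncating, unique_full) are hypothetical, and the assumption is simply that the deduplication key is the string as a null-terminated C reader would see it.

```python
def c_string_view(s: str) -> str:
    """Return the part of s that a null-terminated C string reader would see.

    Hypothetical helper: it only simulates truncation at the first null byte.
    """
    return s.split("\x00", 1)[0]


def unique_truncating(values):
    """Deduplicate using the truncated view as the key (the suspected bug)."""
    seen = set()
    out = []
    for v in values:
        key = c_string_view(v)
        if key not in seen:
            seen.add(key)
            out.append(v)
    return out


def unique_full(values):
    """Deduplicate on the full Python string, preserving first-seen order."""
    return list(dict.fromkeys(values))


values = ["A\x00B", "A\x00C"]
print(unique_truncating(values))  # both values share the key "A"
print(unique_full(values))        # full strings differ, so both survive
```

Under that assumption, the truncating version keeps only the first value, which matches the output reported above, while comparing full strings keeps both.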
Expected Behavior
Should return ['A\x00B' 'A\x00C']
Installed Versions
pandas : 2.0.2