replace method doesn't work with string type Series #31644
Comments
Looks buggy; investigation is welcome. |
>>> pd.Series(['A','B']).astype('string').replace('.','C',regex=True)
0    A
1    B
dtype: string
Thanks for the answer! |
Do you mean something like this?
>>> pd.Series(['A','B']).astype('string').replace('A', pd.NA)
0    <NA>
1       B
dtype: string |
Oh? I get an error.
>>> pd.Series(['A','B']).astype('string').replace('A', pd.NA)
IndexError: arrays used as indices must be of integer (or boolean) type |
Are you running it on the master branch? |
I have already updated to the latest version, 1.0.0. |
Yes, but there have been some fixes since 1.0.0 that are not released yet, so they can only be tested on master. Please let me know if you still have this issue on the pandas master branch. |
Oh, I haven't done that. |
On the master branch the pd.NA issue is fixed, but the original issue still does not work. @charlesdong1991 |
Thanks for confirming the issue on master. Are you interested in investigating it? @GYHHAHA |
Sorry, I'm not familiar with the pandas source code, but I will keep a close eye on this when the next version is released.
>>> pd.Series(['A',np.nan],dtype='O').replace('A','B')
0      B
1    NaN
dtype: object
>>> pd.Series(['A',np.nan],dtype='string').replace('A','B')
AssertionError: B
The error message is not very clear. |
Thanks for the report, your finding is very helpful! @GYHHAHA I will look into it a bit. |
take |
My rough guess at the reason: unlike np.nan, pd.NA does not stand for a constant value, so when matching against 'A' it is unclear whether pd.NA equals 'A', and the error is raised. |
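The difference described in the comment above can be seen directly in how the two missing-value markers compare against a string. This is only a sketch of that comparison behaviour; it does not confirm that this is the actual cause of the AssertionError, and the helper name `na_is_ambiguous` is made up for illustration.

```python
import numpy as np
import pandas as pd

# np.nan is an ordinary float, so comparing it to a string yields a plain bool.
nan_result = np.nan == "A"

# pd.NA propagates through comparisons: the result is pd.NA, not True or False.
na_result = pd.NA == "A"


def na_is_ambiguous() -> bool:
    """Hypothetical helper: True if pd.NA refuses to be coerced to a bool."""
    try:
        bool(na_result)
    except TypeError:
        return True
    return False


print(nan_result, na_result, na_is_ambiguous())
```

Any matching code that expects a strict True/False from an element-wise comparison therefore has to handle pd.NA explicitly.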
Would it be possible to accept pd.NA as a valid value for the 'repl' parameter of the str.replace method in a later version? That seems more natural. |
This appears to work on master now. It could use a test. |
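A minimal sketch of what such a test might look like, based on the literal-replacement example reported earlier in this thread. The function name `check_string_replace_keeps_na` is hypothetical and this is not the test that was added to pandas; it assumes a pandas version in which the fix is present.

```python
import numpy as np
import pandas as pd


def check_string_replace_keeps_na() -> None:
    """Hypothetical regression check for replace on a 'string' dtype Series."""
    # Same input as the reported failing case: one value plus a missing entry.
    ser = pd.Series(["A", np.nan], dtype="string")
    result = ser.replace("A", "B")

    # The value should be replaced and the missing entry left untouched.
    expected = pd.Series(["B", np.nan], dtype="string")
    pd.testing.assert_series_equal(result, expected)


check_string_replace_keeps_na()
```

The regex cases from this thread could be covered the same way once their expected output is agreed on.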
take |
I think there is another bug:
>>> pd.Series(["a", pd.NA, "a"]).astype("string").replace(["a"], "b", regex=True)  # strange behaviour
0       b
1    <NA>
2       a
dtype: string
>>> pd.Series(["a", pd.NA, "a"]).astype("string").replace("a", "b", regex=True)  # replace works fine
0       b
1    <NA>
2       b
dtype: string |
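For regex substitutions on a string Series, one possible way around the inconsistency is the `.str.replace` accessor, which skips missing entries. This is a sketch of an alternative, not something proposed in the thread, and it does not address the `Series.replace` behaviour itself.

```python
import pandas as pd

# Same data as the example above, built directly with the 'string' dtype.
s = pd.Series(["a", pd.NA, "a"], dtype="string")

# .str.replace applies the pattern element-wise and leaves missing values as they are.
replaced = s.str.replace("a", "b", regex=True)
print(replaced)
```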
Code Sample, a copy-pastable example if possible
Problem description
It seems that replace doesn't work with the string type Series.
Why do these two snippets return different results?