ENH: add default value to str.extract #38001

Open
@erfannariman

Description

Is your feature request related to a problem?

In some cases we can set a default value for non-matches of str.extract by calling Series.fillna or DataFrame.fillna on the result.

However, in some cases the data already contains NaN values before str.extract is applied. Running fillna on the result then fills both kinds of NaN, so we can no longer distinguish the NaN values that were present in the original data from the NaN values that str.extract produced for non-matching rows.
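A minimal reproduction of the problem with the current API (no default argument), using the same data as the example in this issue. It only relies on existing pandas calls (str.extract, fillna):

```python
import numpy as np
import pandas as pd

s = pd.Series(['a84', 'abcd', '99string', np.nan], name='A')

# With a single capture group and expand=False, extract returns a Series.
# Row 1 ('abcd') does not match and row 3 was already NaN: both come back NaN.
extracted = s.str.extract(r'(\d+)', expand=False)

# fillna cannot tell the two cases apart, so both rows become 'missing'.
filled = extracted.fillna('missing')
print(filled.tolist())  # ['84', 'missing', '99', 'missing']
```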

Describe the solution you'd like

Add a default argument to str.extract, which must be a string, that is used for values that do not match the regex pattern, while NaN values already present in the input are left as NaN.
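Until such an argument exists, the proposed behaviour can be approximated in user code. The sketch below is a hypothetical helper (extract_with_default is not a pandas function). It assumes a row counts as a non-match when the original value is present but every capture group came back NaN, which is a simplification: a pattern with only optional groups can match and still yield NaN.

```python
import numpy as np
import pandas as pd


def extract_with_default(s, pat, default, flags=0, expand=True):
    """Hypothetical helper emulating the proposed `default` argument.

    Rows whose value is present but not matched by `pat` get `default`;
    rows that were already NaN before extraction stay NaN.
    """
    if not isinstance(default, str):
        raise TypeError("default must be a string")

    result = s.str.extract(pat, flags=flags, expand=expand).copy()

    # Assumption: all capture groups NaN means "no match".
    if isinstance(result, pd.DataFrame):
        all_missing = result.isna().all(axis=1)
    else:
        all_missing = result.isna()

    # Only rows that had a value before extraction count as non-matches.
    no_match = s.notna() & all_missing

    if isinstance(result, pd.DataFrame):
        result.loc[no_match, :] = default
    else:
        result[no_match] = default
    return result


s = pd.Series(['a84', 'abcd', '99string', np.nan])
print(extract_with_default(s, r'(\d+)', 'missing', expand=False).tolist())
```

Because the original Series is still available at this point, s.notna() can separate pre-existing NaN values from extraction misses, which is exactly the information that is lost once only the extracted result is passed to fillna.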

API breaking implications

None, I think.

from pandas import DataFrame
import numpy as np

df = DataFrame({'A': ['a84', 'abcd', '99string', np.nan]})
# default= is the argument proposed in this issue; str.extract does not accept it yet
result = df['A'].str.extract(r'(\d+)', expand=False, default='missing')
print(df, '\n')
print(result)

          A
0       a84
1      abcd
2  99string
3       NaN

0         84
1    missing    # <--- the value that did not match
2         99
3        NaN    # <--- the NaN already present in the data prior to str.extract
Name: A, dtype: object

Metadata

Labels

API - Consistency, Enhancement, Needs Discussion, Strings
