Is your feature request related to a problem?
In some cases we can set a default value for non-matches of str.extract with Series.fillna or DataFrame.fillna.
But there are cases where the data already contains NaN values before str.extract is applied. Running fillna on the result then fills both kinds of NaN, so we can no longer distinguish the NaN values that were in the original data from the ones str.extract produced for non-matches.
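To make the problem concrete, here is a small sketch using only the existing API. The sample values are the same ones used in the example further down; the variable names are just for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(['a84', 'abcd', '99string', np.nan], name='A')

# Row 1 has no digits (non-match), row 3 was already NaN in the input.
extracted = s.str.extract(r'(\d+)', expand=False)

# fillna cannot tell the two cases apart, so both rows get the same label.
filled = extracted.fillna('missing')
print(filled.tolist())  # ['84', 'missing', '99', 'missing']
```

After fillna, the original NaN in row 3 is indistinguishable from the non-match in row 1.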
Describe the solution you'd like
Allow setting a default value, which has to be a string, that is returned for rows where the regex pattern does not match. NaN values already present in the input stay NaN.
API breaking implications
None, I think.
from pandas import DataFrame
import numpy as np

df = DataFrame({'A': ['a84', 'abcd', '99string', np.nan]})
# `default` is the proposed keyword; str.extract does not accept it today
result = df['A'].str.extract(r'(\d+)', expand=False, default='missing')
print(df, '\n')
print(result)
          A
0       a84
1      abcd
2  99string
3       NaN

0         84
1    missing  # <--- the value which did not match
2         99
3        NaN  # <--- the NaN already present in the data prior to str.extract
Name: A, dtype: object
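For comparison, the desired behaviour can be approximated today with a small helper. This is only a sketch of a workaround, not the proposed implementation: `extract_with_default` is a hypothetical name, and it assumes the pattern has exactly one capture group so that `expand=False` returns a Series.

```python
import numpy as np
import pandas as pd


def extract_with_default(s, pat, default):
    """Hypothetical helper: like s.str.extract(pat, expand=False), but
    non-matches get `default` while NaN in the input stays NaN."""
    extracted = s.str.extract(pat, expand=False)
    # A non-match is a row that had a value going in but has none coming out.
    no_match = extracted.isna() & s.notna()
    return extracted.mask(no_match, default)


df = pd.DataFrame({'A': ['a84', 'abcd', '99string', np.nan]})
result = extract_with_default(df['A'], r'(\d+)', 'missing')
print(result)
```

This works, but it needs the original Series alongside the extracted one to build the mask, which is the boilerplate a `default` argument would remove.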