
BUG: Fix Series.reindex losing values when reindexing to MultiIndex #61969


Open: wants to merge 7 commits into main


@Roline-Stapny commented Jul 26, 2025

## Series.reindex()

### Before

import numpy as np
import pandas as pd

# Create a Series with a named Index
series = pd.Series([26.73, 24.255], index=pd.Index([81, 82], name='a'))

# Create a MultiIndex with level names 'a', 'b', 'c'
target = pd.MultiIndex.from_product(
    [[81, 82], [np.nan], ["2018-06-01", "2018-07-01"]], 
    names=["a", "b", "c"]
)

# Without level=, this incorrectly sets every value to NaN
series.reindex(target)
# a   b    c         
# 81  NaN  2018-06-01   NaN
#          2018-07-01   NaN
# 82  NaN  2018-06-01   NaN
#          2018-07-01   NaN

# Passing level="a" explicitly broadcasts the values correctly
series.reindex(target, level="a")
# a   b    c         
# 81  NaN  2018-06-01    26.73
#          2018-07-01    26.73
# 82  NaN  2018-06-01    24.255
#          2018-07-01    24.255
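To make the difference concrete, here is a small sketch. It assumes the pandas behavior shown above, and the variable names are illustrative only. A plain lookup compares each full target tuple against the flat labels and finds no matches, so `get_indexer` returns -1 everywhere. `level="a"` looks only at the "a" component, which amounts to reindexing to `target.get_level_values("a")` and re-attaching the MultiIndex.

```python
import numpy as np
import pandas as pd

series = pd.Series([26.73, 24.255], index=pd.Index([81, 82], name="a"))
target = pd.MultiIndex.from_product(
    [[81, 82], [np.nan], ["2018-06-01", "2018-07-01"]],
    names=["a", "b", "c"],
)

# Plain lookup: each target label is a whole tuple such as
# (81, nan, "2018-06-01"), which never equals a scalar label like 81,
# so every position comes back as -1 ("not found") and reindex fills NaN.
positions = series.index.get_indexer(target)

# Level-based lookup: use only the "a" component of each target tuple,
# then re-attach the full MultiIndex to the broadcast values.
broadcast = pd.Series(
    series.reindex(target.get_level_values("a")).to_numpy(),
    index=target,
    name=series.name,
)

# Built-in equivalent using the level= argument
via_level = series.reindex(target, level="a")
```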

### After

# Same setup as before
series = pd.Series([26.73, 24.255], index=pd.Index([81, 82], name='a'))
target = pd.MultiIndex.from_product(
    [[81, 82], [np.nan], ["2018-06-01", "2018-07-01"]], 
    names=["a", "b", "c"]
)

# Now both produce the same correct result
series.reindex(target)  # Automatically detects level='a'
# a   b    c         
# 81  NaN  2018-06-01    26.73
#          2018-07-01    26.73
# 82  NaN  2018-06-01    24.255
#          2018-07-01    24.255
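One way the "automatically detects level='a'" step could work is sketched below. This is not the PR's implementation. `infer_reindex_level` and `reindex_with_inferred_level` are hypothetical helpers built on an assumed rule: if the source is a flat Index whose name is one of the target MultiIndex's level names, reindex with that level.

```python
import numpy as np
import pandas as pd


def infer_reindex_level(index, target):
    """Hypothetical helper: pick the level to broadcast over, or return None.

    Applies only when `index` is a flat (non-Multi) Index whose name is one
    of the level names of a MultiIndex `target`.
    """
    if isinstance(index, pd.MultiIndex) or not isinstance(target, pd.MultiIndex):
        return None
    if index.name is not None and index.name in target.names:
        return index.name
    return None


def reindex_with_inferred_level(obj, target):
    """Hypothetical wrapper: reindex rows, passing level= when one is inferred."""
    level = infer_reindex_level(obj.index, target)
    if level is None:
        return obj.reindex(target)
    return obj.reindex(target, level=level)


series = pd.Series([26.73, 24.255], index=pd.Index([81, 82], name="a"))
target = pd.MultiIndex.from_product(
    [[81, 82], [np.nan], ["2018-06-01", "2018-07-01"]],
    names=["a", "b", "c"],
)
result = reindex_with_inferred_level(series, target)
```

Restricting inference to a flat source with a matching name leaves MultiIndex-to-MultiIndex reindexing unchanged. An unnamed or non-matching index falls back to the plain reindex call.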

## DataFrame.reindex()

df = pd.DataFrame({
    'value': [26.73, 24.255],
    'other': ['A', 'B']
}, index=pd.Index([81, 82], name='a'))

target = pd.MultiIndex.from_product(
    [[81, 82], [np.nan], ["2018-06-01", "2018-07-01"]], 
    names=["a", "b", "c"]
)

### Before

df.reindex(index=target)
                   value other
a  b   c
81 NaN 2018-06-01    NaN   NaN
       2018-07-01    NaN   NaN
82 NaN 2018-06-01    NaN   NaN
       2018-07-01    NaN   NaN

### After

df.reindex(index=target)
                    value other
a  b   c
81 NaN 2018-06-01  26.730     A
       2018-07-01  26.730     A
82 NaN 2018-06-01  24.255     B
       2018-07-01  24.255     B
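The DataFrame case could be pinned down with a regression test along these lines. This is a sketch, not the PR's actual test, and the test name is hypothetical. It builds the expected frame by hand and compares it with `pd.testing.assert_frame_equal`. As written it passes `level="a"` explicitly, so it also runs on pandas versions without this change. With the PR applied, the same expected frame should also match `df.reindex(target)`.

```python
import numpy as np
import pandas as pd


def test_reindex_flat_index_to_multiindex_broadcasts():
    # Source frame with a flat index named "a"
    df = pd.DataFrame(
        {"value": [26.73, 24.255], "other": ["A", "B"]},
        index=pd.Index([81, 82], name="a"),
    )
    # Target MultiIndex whose first level is also named "a"
    target = pd.MultiIndex.from_product(
        [[81, 82], [np.nan], ["2018-06-01", "2018-07-01"]],
        names=["a", "b", "c"],
    )
    # Each row of df should be repeated for every (b, c) combination
    expected = pd.DataFrame(
        {"value": [26.73, 26.73, 24.255, 24.255], "other": ["A", "A", "B", "B"]},
        index=target,
    )
    result = df.reindex(target, level="a")
    pd.testing.assert_frame_equal(result, expected)
```

Under pytest the function is collected automatically. It can also be called directly.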

@mroeschke (Member) left a comment


Does DataFrame.reindex also need the same handling?

@mroeschke added the MultiIndex and Index labels on Jul 28, 2025
@Roline-Stapny (Author) commented Jul 29, 2025


> Does DataFrame.reindex also need the same handling?

Yes, a DataFrame with a single (flat) index has the same issue:

df = pd.DataFrame({
    'value': [26.73, 24.255],
    'other': ['A', 'B']
}, index=pd.Index([81, 82], name='a'))

# Create a MultiIndex with level names 'a', 'b', 'c'
target = pd.MultiIndex.from_product(
    [[81, 82], [np.nan], ["2018-06-01", "2018-07-01"]], 
    names=["a", "b", "c"]
)



df.reindex(target)
                   value other
a  b   c
81 NaN 2018-06-01    NaN   NaN
       2018-07-01    NaN   NaN
82 NaN 2018-06-01    NaN   NaN
       2018-07-01    NaN   NaN

df.reindex(target, level="a")
                    value other
a  b   c
81 NaN 2018-06-01  26.730     A
       2018-07-01  26.730     A
82 NaN 2018-06-01  24.255     B
       2018-07-01  24.255     B
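Reindexing by `level="a"` amounts to a left join of the target labels against `df` on the "a" level. The sketch below reproduces the level-based result that way and only assumes public pandas API (`MultiIndex.to_frame`, `DataFrame.join` with `on=`, `set_index`). It can serve as a cross-check. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"value": [26.73, 24.255], "other": ["A", "B"]},
    index=pd.Index([81, 82], name="a"),
)
target = pd.MultiIndex.from_product(
    [[81, 82], [np.nan], ["2018-06-01", "2018-07-01"]],
    names=["a", "b", "c"],
)

# Turn the target labels into ordinary columns a, b, c
labels = target.to_frame(index=False)
# Left-join df on its index (named "a") using the "a" column;
# a left join keeps the target's row order
joined = labels.join(df, on="a")
# Move a, b, c back into the row index
via_join = joined.set_index(["a", "b", "c"])

# Built-in level-based reindex for comparison
via_level = df.reindex(target, level="a")
```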

For the same scenario with a MultiIndex source, reindex only works if all index levels match. In fact, specifying level for a MultiIndex DataFrame raises a TypeError:

source_idx = pd.MultiIndex.from_product(
    [[81, 82], ["2018-06-01"]],
    names=["a", "c"]
)
df = pd.DataFrame(
    {"value": [26.73, 24.255]},
    index=source_idx
)

# Create a target that shares levels "a" and "c" but adds level "b"
target_idx = pd.MultiIndex.from_product(
    [[81, 82], [np.nan], ["2018-06-01", "2018-07-01"]],
    names=["a", "b", "c"]
)

>>> df.reindex(target_idx, level="a")
    raise TypeError("Join on level between two MultiIndex objects is ambiguous")
TypeError: Join on level between two MultiIndex objects is ambiguous

>>> df.reindex(target_idx)  # Reindexing doesn't copy values for matching labels
                   value
a  b   c
81 NaN 2018-06-01    NaN
       2018-07-01    NaN
82 NaN 2018-06-01    NaN
       2018-07-01    NaN

Reindexing a MultiIndex DataFrame only carries values over when all index levels match.

I will leave the MultiIndex DataFrame behavior as is and fix the issue only for single-index DataFrames, as in the example above. Let me know what you think.
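Because the MultiIndex-to-MultiIndex case stays as is, here is a possible user-side workaround for the earlier MultiIndex example. It is a sketch, not part of the PR. It assumes the source levels (a, c) are exactly the target levels minus b, in the same order. The approach drops b from the target, reindexes on the shared levels, and then restores the full target with `set_axis`. The 2018-07-01 rows stay NaN because those labels do not exist in the source.

```python
import numpy as np
import pandas as pd

# Source: MultiIndex with levels (a, c)
source_idx = pd.MultiIndex.from_product(
    [[81, 82], ["2018-06-01"]], names=["a", "c"]
)
df = pd.DataFrame({"value": [26.73, 24.255]}, index=source_idx)

# Target: levels (a, b, c), where "b" does not exist in the source
target_idx = pd.MultiIndex.from_product(
    [[81, 82], [np.nan], ["2018-06-01", "2018-07-01"]],
    names=["a", "b", "c"],
)

# level= between two MultiIndexes is rejected, as reported above
try:
    df.reindex(target_idx, level="a")
    raised = False
except TypeError:
    raised = True

# Workaround (assumes the source levels equal the target levels minus "b",
# in the same order): reindex on the shared (a, c) labels, then put the
# full three-level target back as the row index.
shared = target_idx.droplevel("b")
result = df.reindex(shared).set_axis(target_idx)
```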

@Roline-Stapny requested a review from @mroeschke on July 29, 2025 16:20
@Roline-Stapny requested a review from @mroeschke on July 30, 2025 14:35
@Roline-Stapny (Author) commented:

@mroeschke, could you please review it when you get a chance?
