Unexpected behavior in DataFrame.shift(..., axis=1) with missing data #17441

Closed
mgoldwasser opened this issue Sep 5, 2017 · 3 comments
Labels
Bug
Internals: Related to non-user accessible pandas implementation
Missing-data: np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Multi-Block: Issues caused by the presence of multiple Blocks

Comments

mgoldwasser commented Sep 5, 2017

Code Sample

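import numpy as np
import pandas as pd

def check_shift(t):
    """Compare shift along axis=1 with the equivalent transpose-based shift."""
    print('Original:', t, sep='\n')
    # Shift the columns left by 5 directly along axis=1 (the path under test).
    a = t.shift(-5, axis=1)
    print('Incorrect result:', a, sep='\n')
    # Reference: transpose, shift along the rows, then transpose back.
    b = t.transpose().shift(-5).transpose()
    print('Correct result:', b, sep='\n')
    return a.equals(b)

# create dummy dataframe (randint's upper bound is exclusive, so every value is 0)
df = pd.DataFrame(np.random.randint(0, 1, size=(2, 10)))
# add some missing values (np.nan is the spelling that works across NumPy versions)
df.iloc[0, 7:10] = np.nan
# the check below evaluates to False, but should be True
print('Test passed:', check_shift(df))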

Problem description

DataFrame.shift with axis=1 produces unexpected results when the DataFrame contains missing values: the shifted result contains additional missing values that do not appear when the same shift is done by transposing, shifting along the rows, and transposing back.

Expected Output

check_shift(df) should return True.
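
A note on the comparison itself: `DataFrame.equals` also requires matching dtypes, and the transpose round trip upcasts a mixed int/float frame to float, so a value-only comparison can help separate dtype differences from spurious NaNs. The following is a minimal sketch of that idea, not part of the original report. `make_frame`, `shifted_values`, `shift_matches_reference` and `extra_nan_mask` are hypothetical helper names, and the frame uses distinct values (an assumption made here for readability) so shifted cells are easy to trace.

```python
import numpy as np
import pandas as pd


def make_frame():
    """Build a 2x10 frame: int columns 0-6, float columns 7-9 with NaN in row 0."""
    data = {i: [i, 10 + i] for i in range(7)}
    data.update({i: [np.nan, 10.0 + i] for i in range(7, 10)})
    return pd.DataFrame(data)


def shifted_values(t, periods):
    """Return (axis=1 shift, transpose-based shift) as float arrays."""
    a = t.shift(periods, axis=1).to_numpy(dtype=float)
    b = t.T.shift(periods).T.to_numpy(dtype=float)
    return a, b


def shift_matches_reference(t, periods):
    """True if both shifts hold the same values, treating NaN as equal to NaN."""
    a, b = shifted_values(t, periods)
    return np.array_equal(a, b, equal_nan=True)


def extra_nan_mask(t, periods):
    """Boolean mask of cells that are NaN only in the axis=1 result."""
    a, b = shifted_values(t, periods)
    return np.isnan(a) & ~np.isnan(b)


df = make_frame()
print('Values match:', shift_matches_reference(df, -5))
print('Spurious NaN cells:', int(extra_nan_mask(df, -5).sum()))
```

On a pandas release that includes the fix referenced at the end of this thread, the values are expected to match and the mask to be empty. Whether an affected release reproduces the problem with this frame may depend on how its columns are laid out internally.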

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 32.2.0
Cython: None
numpy: 1.13.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.10
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

mgoldwasser changed the title from "Unexpected behavior DataFrame.shift() on axis=1 when there is missing data" to "Unexpected behavior in DataFrame.shift(..., axis=1) with missing data" on Sep 5, 2017
gfyoung added the Missing-data and Bug labels on Sep 5, 2017

gfyoung (Member) commented Sep 5, 2017

Indeed, this does look buggy with axis=1. Feel free to investigate and put up a PR for it!

jbrockmendel (Member) commented

The example in the OP has 4 blocks, likely the same underlying issue as #10539
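
For context, a block is an internal array that stores one or more columns of the same dtype. The sketch below shows one way to look at the block count. It relies on the private `_mgr` attribute and its `nblocks` property, which are pandas internals rather than public API and may differ or be missing between versions. `count_blocks` is a hypothetical helper that returns None when they are unavailable.

```python
import numpy as np
import pandas as pd


def count_blocks(frame):
    """Return the number of internal blocks, or None if internals are unavailable.

    Assumption: uses the private ``_mgr.nblocks`` attribute, which is not public API.
    """
    mgr = getattr(frame, "_mgr", None)
    return getattr(mgr, "nblocks", None)


# An all-integer frame is typically stored as a single block.
df = pd.DataFrame(np.zeros((2, 10), dtype="int64"))
print("all-int frame:", count_blocks(df))

# Replacing columns with float data (needed to hold NaN) typically adds blocks.
for col in (7, 8, 9):
    df[col] = np.array([np.nan, 0.0])
print("after adding float columns:", count_blocks(df))

# Public view of related information: how many columns have each dtype.
print(df.dtypes.value_counts())
```

The exact count depends on how the frame was built and on whether pandas has consolidated same-dtype blocks, so it need not match the 4 mentioned above.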

jbrockmendel added the Internals and Multi-Block labels on Sep 21, 2020

jbrockmendel (Member) commented

closed by #35578
