BUG: Inconsistent correlation between constant series (varies with number of rows) #37448

Closed
@anders-kiaer

Code Sample, a copy-pastable example

import pandas as pd

for length in [2, 3, 5, 10, 20]:
    print(pd.DataFrame(length*[[0.42, 0.1]], columns=["A", "B"]).corr())

gives

    A   B
A NaN NaN
B NaN NaN
    A    B
A NaN  NaN
B NaN  1.0
     A   B
A  1.0 NaN
B  NaN NaN
     A    B
A  1.0 -1.0
B -1.0  1.0
     A    B
A  1.0  1.0
B  1.0  1.0

Problem description

The output changes inconsistently as the number of rows varies slightly. I would expect the correlation between two series to be NaN whenever at least one of them is constant.
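For reference, here is why NaN is the natural expectation. This assumes the default `method="pearson"` of `DataFrame.corr`; the symbols $a_i$, $b_i$, $\bar{a}$, $\bar{b}$ are notation introduced here for the entries and means of columns A and B.

$$
r_{AB} = \frac{\sum_i (a_i - \bar{a})(b_i - \bar{b})}{\sqrt{\sum_i (a_i - \bar{a})^2}\,\sqrt{\sum_i (b_i - \bar{b})^2}}
$$

If column A is constant, then $a_i = \bar{a}$ for every $i$, so both the numerator and the first factor of the denominator are exactly zero. The ratio is $0/0$, which is undefined, so values such as 1.0 or -1.0 cannot be meaningful results for this input.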

This makes code that relies on dropna() after calling corr(), for example, difficult to write and error-prone, because the behaviour is inconsistent.
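Until the behaviour is consistent, one possible user-side workaround is to mask constant columns explicitly before relying on dropna(). The sketch below is only an illustration: `corr_nan_for_constant` is a hypothetical helper name, not part of pandas, and it treats a column as constant only when all of its non-null values are exactly equal.

```python
import numpy as np
import pandas as pd


def corr_nan_for_constant(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: df.corr() with constant columns forced to NaN."""
    corr = df.corr()
    # A column counts as constant if it has at most one distinct non-null value.
    constant = df[corr.columns].nunique(dropna=True) <= 1
    constant_cols = constant[constant].index
    # Overwrite every entry involving a constant column, including the diagonal.
    corr.loc[constant_cols, :] = np.nan
    corr.loc[:, constant_cols] = np.nan
    return corr


# Same inputs as in the report: two constant columns, varying row counts.
for length in [2, 3, 5, 10, 20]:
    df = pd.DataFrame(length * [[0.42, 0.1]], columns=["A", "B"])
    print(corr_nan_for_constant(df))
```

With this guard, every matrix for the inputs above is all NaN regardless of the number of rows, so a subsequent dropna() behaves the same way each time. Columns that are only approximately constant are not caught by the exact-equality check and would need a tolerance chosen by the caller.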

Expected Output

Either consistent NaN output when calculating correlation with constant data, or a warning in the pandas.DataFrame.corr documentation stating that the returned correlation between constant series can be any of 1.0, -1.0, or NaN.

Labels: Bug; Numeric Operations (Arithmetic, Comparison, and Logical operations)
