Description
- I have checked that this issue has not already been reported (might be another variant of Correlation inconsistencies between Series and DataFrame #20954).
- I have confirmed this bug exists on the latest version of pandas (1.1.3).
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd

# Build frames of identical rows, so both columns are constant, and vary only the row count.
for length in [2, 3, 5, 10, 20]:
    print(pd.DataFrame(length * [[0.42, 0.1]], columns=["A", "B"]).corr())
gives, for lengths 2, 3, 5, 10 and 20 respectively:
    A   B
A NaN NaN
B NaN NaN
    A    B
A NaN  NaN
B NaN  1.0
     A   B
A  1.0 NaN
B  NaN NaN
     A    B
A  1.0 -1.0
B -1.0  1.0
     A    B
A  1.0  1.0
B  1.0  1.0
Problem description
The output is inconsistent when the number of rows varies only slightly. I would expect the correlation between two series, at least one of which is constant, to be NaN.
This makes code that depends on calling dropna() after corr() difficult to write and error-prone, because the behaviour is inconsistent.
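A possible explanation, offered as an assumption and not a confirmed diagnosis of the pandas internals: the Pearson correlation of columns A and B over n rows is

$$
r_{AB} = \frac{\sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b})}{\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2}\,\sqrt{\sum_{i=1}^{n} (b_i - \bar{b})^2}}
$$

For a constant column, every $a_i$ equals $\bar{a}$, so both numerator and denominator are exactly 0 and $r_{AB}$ is the undefined form 0/0, for which NaN is the natural result. In floating-point arithmetic, however, the computed mean of n copies of 0.42 may differ from 0.42 by a rounding error. If every deviation then equals some tiny $\varepsilon_a \neq 0$ (and likewise $\varepsilon_b \neq 0$ for B), the formula collapses to

$$
r_{AB} = \frac{n\,\varepsilon_a \varepsilon_b}{\sqrt{n\,\varepsilon_a^2}\,\sqrt{n\,\varepsilon_b^2}} = \operatorname{sign}(\varepsilon_a \varepsilon_b) = \pm 1
$$

which would match the mix of NaN, 1.0 and -1.0 seen above, depending on how the rounding falls for a given n.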
Expected Output
Either consistent NaN output when calculating a correlation that involves constant data, or a warning in the pandas.DataFrame.corr documentation stating that the returned correlation between constant series can be any of 1.0, -1.0 or NaN.
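Until the behaviour is consistent, a user-side workaround could look like the minimal sketch below. The helper name `corr_nan_for_constant` is hypothetical and not part of pandas. It assumes numeric columns with unique names, and it treats a column as constant when it holds at most one distinct non-null value.

```python
import pandas as pd


def corr_nan_for_constant(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: Pearson correlation with constant columns forced to NaN."""
    corr = df.corr()
    # Assumption: "constant" means at most one distinct non-null value in the column.
    constant = [col for col in corr.columns if df[col].nunique(dropna=True) <= 1]
    if constant:
        # Overwrite whatever corr() returned for the constant series (NaN, 1.0 or -1.0).
        corr.loc[constant, :] = float("nan")
        corr.loc[:, constant] = float("nan")
    return corr


# Same frames as in the code sample above: both columns are constant.
for length in [2, 3, 5, 10, 20]:
    frame = pd.DataFrame(length * [[0.42, 0.1]], columns=["A", "B"])
    print(corr_nan_for_constant(frame))
```

With this wrapper, a later dropna() sees the same NaN pattern regardless of the number of rows, at the cost of one extra pass over each column.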