"wrong" covariance matrix returned in the presence of nans

See the mailing list thread "[pydata] Covariance matrix not positive semi-definite."

Currently, the covariance matrix is computed from pairwise-available observations, i.e., if a row is missing data in some column but both columns of a given pair are present, that row is still used for that pair's covariance. The result of this computation is not a true covariance matrix and may fail to be positive semi-definite.
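
To make this concrete, here is a small constructed example (toy data I made up for illustration, not taken from the mailing list thread). Each pair of columns is observed together on a different set of rows, so the pairwise covariances are mutually incompatible and the resulting matrix has a negative eigenvalue:

```python
import numpy as np
import pandas as pd

nan = np.nan

# Toy data (an assumption for illustration): x/y overlap on rows 0-2,
# y/z on rows 3-5, and x/z on rows 6-8, where z runs opposite to x.
df = pd.DataFrame({
    "x": [1, 2, 3, nan, nan, nan, 1, 2, 3],
    "y": [1, 2, 3, 1, 2, 3, nan, nan, nan],
    "z": [nan, nan, nan, 1, 2, 3, 3, 2, 1],
})

# DataFrame.cov() uses pairwise-complete observations for each entry.
pairwise_cov = df.cov()

# Eigenvalues of the symmetric result; a true covariance matrix has none below zero.
eigenvalues = np.linalg.eigvalsh(pairwise_cov.to_numpy())

print(pairwise_cov)
print(eigenvalues)
```

Here x and y look perfectly positively related, y and z look perfectly positively related, yet x and z look perfectly negatively related, which no single joint distribution can produce.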

What to do in this case? 1) Warn? 2) Raise an error? 3) Only use observations for which all variables are available?

Option 3 is tempting: the resulting matrix will be a true covariance matrix. However, it's an inconsistent estimator of the covariance.
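
For reference, option 3 is already available to the user by dropping incomplete rows first. A minimal sketch, using simulated data with randomly knocked-out entries (the data, seed, and missingness rate are my assumptions). It only demonstrates that the result is positive semi-definite up to rounding and says nothing about the estimator's statistical properties:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated data (an assumption for illustration): 200 rows, 3 columns,
# with roughly 20% of the entries replaced by NaN at random.
data = rng.normal(size=(200, 3))
data[rng.random(size=data.shape) < 0.2] = np.nan
df = pd.DataFrame(data, columns=["x", "y", "z"])

# Keep only rows where every variable is observed, then compute the covariance.
complete = df.dropna()
listwise_cov = complete.cov()

# Smallest eigenvalue; non-negative up to floating-point rounding.
min_eig = np.linalg.eigvalsh(listwise_cov.to_numpy()).min()

print(len(complete), "complete rows out of", len(df))
print(min_eig)
```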

My vote is for 2, so that the user is forced to think about what they actually want to compute. Ideally, the error message would point to estimators that are appropriate for this situation, but these are not available yet (they would come from statsmodels).

https://github.com/statsmodels/statsmodels/pull/631
https://github.com/statsmodels/statsmodels/issues/303
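
Roughly what I have in mind for option 2, written as a standalone wrapper. `strict_cov` is a hypothetical name for illustration, not an existing pandas function, and the error message wording is only a suggestion:

```python
import pandas as pd


def strict_cov(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not pandas API): refuse to compute with missing data."""
    if df.isna().any().any():
        raise ValueError(
            "Input contains NaNs; a pairwise-complete covariance may not be "
            "positive semi-definite. Drop incomplete rows explicitly with "
            "df.dropna() or use an estimator designed for missing data."
        )
    # No missing values, so the ordinary sample covariance is well defined.
    return df.cov()
```

With this, a user who really wants listwise deletion has to write `strict_cov(df.dropna())` and so makes the choice explicitly.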

"wrong" covariance matrix returned in the presence of nans #3513