_libs.hashtable.ismember incorrect with tuples containing nan #41836
That is true. The reason is that … is … . We overwrite the default behavior for floats in pandas/_libs/src/klib/khash_python.h (lines 172 to 177 at commit 059c8ba).
To be consistent, we should do the same for the other built-in types (complex, tuple, frozenset... are there more?). I would not expect much impact on performance. I don't see an easy way to ensure the correct behavior for non-built-in classes like … and would say it is the responsibility of the class's authors to ensure the desired behavior.
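To make the float-only override concrete, here is a small pure-Python model, assuming the override works roughly like this: an identity shortcut and `==` (which is what `PyObject_RichCompareBool` does for `Py_EQ`), followed by a fallback that treats two exact floats that are both NaN as equal. It is a hypothetical sketch, not the C code in khash_python.h, and `float_only_nan_eq` is a made-up name. It shows why tuples slip through: the fallback only looks at the top-level objects.

```python
import math


def float_only_nan_eq(a, b):
    """Hypothetical model of a float-only NaN-aware equality check."""
    # Identity shortcut, as PyObject_RichCompareBool does for Py_EQ.
    if a is b:
        return True
    try:
        if a == b:
            return True
    except Exception:
        # A failing comparison is treated as "not equal".
        return False
    # Still could be two NaNs, but only top-level exact floats are checked.
    return (
        type(a) is float
        and type(b) is float
        and math.isnan(a)
        and math.isnan(b)
    )


nan_a, nan_b = float("nan"), float("nan")
print(float_only_nan_eq(nan_a, nan_b))        # True
print(float_only_nan_eq((nan_a,), (nan_b,)))  # False: the tuple elements are never inspected
```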
@jbrockmendel I have looked into this problem. The fix is relatively simple for complex and tuple (here is a prototype: https://github.com/realead/pandas/tree/fix_deep_nans). However, the situation is quite different for frozenset: … but there are so many implementation details in this function that it would be really unwise to duplicate the code. A sound fix would be to introduce a … So my proposal would be to add special handling of complex and tuple objects and to leave the situation with frozenset as it is.
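Special handling of complex and tuple, with frozenset left alone, could look roughly like the recursive check below. This is a pure-Python sketch, not the code in the prototype branch; it assumes that only exact `float`, `complex` and `tuple` objects (no subclasses) get NaN-aware treatment, and `nan_aware_eq` is a hypothetical name. Everything else falls back to ordinary `==`.

```python
import math


def nan_aware_eq(a, b):
    """Hypothetical NaN-aware equality for exact floats, complex numbers and tuples."""
    if a is b:
        return True
    if type(a) is float and type(b) is float:
        # Two NaNs are considered equal; otherwise use normal float equality.
        return a == b or (math.isnan(a) and math.isnan(b))
    if type(a) is complex and type(b) is complex:
        # Compare real and imaginary parts with the float rule above.
        return nan_aware_eq(a.real, b.real) and nan_aware_eq(a.imag, b.imag)
    if type(a) is tuple and type(b) is tuple:
        # Recurse element-wise so nested tuples and complex values are covered.
        return len(a) == len(b) and all(nan_aware_eq(x, y) for x, y in zip(a, b))
    try:
        # frozenset and all other types: plain equality, no NaN special-casing.
        return bool(a == b)
    except Exception:
        return False


nan_a, nan_b = float("nan"), float("nan")
print(nan_aware_eq((nan_a, 1), (nan_b, 1)))                    # True
print(nan_aware_eq(((complex(nan_a, 2.0),),), ((complex(nan_b, 2.0),),)))  # True
print(nan_aware_eq(frozenset([nan_a]), frozenset([nan_b])))    # False: punted on
```

Because frozenset falls through to `==`, its comparison still depends on the built-in set lookup, which is exactly the case proposed to be left as it is.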
Thanks for looking into this. The only case that is a showstopper for me ATM is the tuple, so I think punting on frozenset is totally reasonable.
Because of python/cpython@a07da09, we need to change the hash function as well; otherwise different NaNs will have different hashes on Python 3.10 and later.
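Equality alone is not enough for a hash table: keys that compare equal must also hash equally, otherwise two NaN-containing tuples land in different buckets and are never compared. A minimal sketch of a matching hash, assuming NaN-aware equality is applied to exact `float`, `complex` and `tuple` objects, and reusing 0, the value CPython used for every float NaN before 3.10. `nan_aware_hash` and `NAN_HASH` are hypothetical names; this is not pandas' actual hash function.

```python
import math

# Hypothetical constant: before Python 3.10, hash(float("nan")) was always 0.
NAN_HASH = 0


def nan_aware_hash(obj):
    """Hypothetical hash that gives every NaN (also inside complex/tuple) the same hash."""
    if type(obj) is float and math.isnan(obj):
        return NAN_HASH
    if type(obj) is complex and (math.isnan(obj.real) or math.isnan(obj.imag)):
        # Only NaN-containing complex numbers need special treatment.
        return hash((nan_aware_hash(obj.real), nan_aware_hash(obj.imag)))
    if type(obj) is tuple:
        # Combine the NaN-aware hashes of the elements, recursively.
        return hash(tuple(nan_aware_hash(x) for x in obj))
    return hash(obj)


print(nan_aware_hash((float("nan"), 1)) == nan_aware_hash((float("nan"), 1)))  # True
```

Non-NaN complex numbers and all non-tuple objects keep their built-in hash, so values such as `1`, `1.0` and `complex(1, 0)`, which compare equal, still hash alike.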
It looks like the equality check being done on tuples is checking NA values for identity, so separately instantiated `float("nan")` objects aren't considered matching. cc @realead
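The reported behavior can be reproduced in plain Python, without pandas: tuple comparison checks each element for identity before falling back to `==`, and a NaN is never `==` to anything, including itself. The snippet below only illustrates that mechanism; it does not call pandas' `ismember`.

```python
nan_a = float("nan")
nan_b = float("nan")  # a second, separately created NaN object

self_equal = nan_a == nan_a                  # False: NaN never compares equal under ==
same_object_tuples = (nan_a,) == (nan_a,)    # True: identity check on the elements
distinct_object_tuples = (nan_a,) == (nan_b,)  # False: not identical, and NaN != NaN
# False: either the hashes differ (Python 3.10+) or the tuple comparison fails.
found_in_set = (nan_b,) in {(nan_a,)}

print(self_equal, same_object_tuples, distinct_object_tuples, found_in_set)
```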