"import pandas" changes NumPy error-handling settings #13109
This came up somewhat recently in #12464, but I don't think anything came out of it.
@mdickinson this is very, very common to do. The entire point of pandas is to treat NaN as missing data, which is why these floating-point warnings are suppressed. It IS possible to turn this off by having practically every operation deferred to numpy wrap the call in a context manager (rather than setting the global). It wouldn't be that invasive (could do it with a decorator), but it adds complexity, may be somewhat non-performant, and it's a fair amount of work. If you want to give this a stab, then I'll reopen.
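A minimal sketch of that context-manager/decorator approach, assuming nothing about pandas' actual internals (`with_errstate` and `nan_safe_divide` are hypothetical names):

```python
import functools
import numpy as np

def with_errstate(**errstate_kwargs):
    """Hypothetical decorator: run the wrapped function under a local
    numpy error state instead of mutating the global one via np.seterr."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with np.errstate(**errstate_kwargs):
                return func(*args, **kwargs)
        return wrapper
    return decorator

@with_errstate(all='ignore')
def nan_safe_divide(a, b):
    # FP warnings are suppressed only for the duration of this call;
    # the caller's global settings are untouched.
    return np.asarray(a, dtype=float) / np.asarray(b, dtype=float)
```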
Sure, and that's fine when you're using Pandas to do data analysis at a command line. It's a long way from fine when you're using Pandas as a library for a couple of small tasks (in this particular case, flexible reading from .csv files) within a larger application. Then you're in the situation where one of your application dependencies has unilaterally and silently turned off NumPy warnings for the entire application, oblivious to the needs of that application or any of its many other libraries. That's just rude. :-) Deciding to turn off NumPy warnings (or any warnings, for that matter) globally is a decision that should be made at the application level, not at the level of one particular library. It's interesting to compare with the
Well, this setting has existed since pandas' inception. As I laid out above, I don't see much gain in changing this. If you would like to make an attempt, by all means.

Your statement is very odd. Virtually the entire user base uses pandas for data analysis. Wanting to drop back to numpy's exception handling without using pandas seems dubious at best.
I agree that this behavior is very annoying. Most of the time, my only use of pandas is to have a (great) implementation of groupby, and I do not want other errors to pass silently. Likewise, because seaborn depends on pandas, it is easy to end up with this change even without explicitly importing pandas. Two possible solutions may be
I do not believe that unilaterally modifying numpy's error-handling is a common practice. If you proposed that over at

Part of the reason that you have seen only a few complaints about this behavior since pandas' inception is that it changes behavior silently. It took a couple of years after the same behavior's inception in

Having an application that has both data analysis (where NaN may mean missing data) and moderately complicated numerical algorithms (where NaNs can appear from perfectly clean inputs due to domain violations and other numerical errors) is not at all dubious. Most nontrivial data analysis tasks have both. Cleaning missing data first before doing the heavy numerical work of data analysis/statistics/machine learning is quite common. That's why numpy has the ability to control FPE handling at a pretty fine grain.
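As an added illustration (not part of the original comment), numpy lets each floating-point error class be handled independently, and only within a local block:

```python
import numpy as np

a = np.array([1.0, 0.0, -1.0])

# Divide-by-zero and invalid-value errors get different treatment here,
# and only inside this block; the process-wide settings are untouched.
with np.errstate(divide='ignore', invalid='warn'):
    np.log(a)  # log(0): divide-by-zero silenced; log(-1): RuntimeWarning
```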
Well @rkern, this has been in place since before 2012 AFAICT. This IS possible, but the performance of the context managers would need to be considered, and, much more importantly, someone needs to do it; are you volunteering?
In progress. In so doing, I found another reason to avoid the fire-and-forget `np.seterr(all='ignore')`: the inverse case, where pandas' own internals implicitly rely on errors being ignored, so user code that sets a stricter error state can break pandas operations.
ok thanks @rkern, that would be awesome! Yeah, I have seen that inverse case as well.
@jreback how do we envision this interacting with the internals refactor? This is kind of a gray zone in the defined behavior right now; I don't think we would want to commit to anything that could change with the refactor.

```python
s = pd.Series([1, 2, 3])

with np.errstate(invalid='raise'):
    s.reindex(range(4)) > 1
```

That casts to float, and the comparison with the introduced NaN raises a `FloatingPointError` under `invalid='raise'`.
No, @TomAugspurger, I don't think it will have much of an effect. If it's done, then computation that is deferred to numpy will use a context manager to wrap the error state; that's it, as is done now. Ideally one would have a function like:
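A minimal sketch of what such a helper might look like (`evaluate` and its signature are hypothetical, not pandas' actual internals):

```python
import numpy as np

def evaluate(op, *args, errstate=None, **kwargs):
    # Hypothetical dispatch helper: run a numpy-deferred computation
    # under a local error state rather than a process-wide np.seterr().
    errstate = errstate if errstate is not None else {'all': 'ignore'}
    with np.errstate(**errstate):
        return op(*args, **kwargs)

# e.g. evaluate(np.divide, a, b), or for a stricter caller:
# evaluate(np.divide, a, b, errstate={'divide': 'raise'})
```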
Something like that. There would have to be some logic in this function to handle some of this, but it's not a big deal. It will mostly be isolated.
Just hit this problem myself; I'm definitely in favor of a fix. In this case, pandas was a dependency of another package I was using (and indeed I wasn't even using the pandas-dependent functionality), but I was bitten by this anyway. Thanks to @rkern for working on this.
This is great. @rkern, thank you for the hard work!
A simple `import pandas` apparently does a `np.seterr(all='ignore')`: https://github.com/pydata/pandas/blob/23eb483d17ce41c0fe41b0bfa72c90df82151598/pandas/compat/numpy/__init__.py#L8-L9

This is a problem when using Pandas as a library, particularly during testing: a test (completely unrelated to Pandas) that should have produced warnings can fail to do so, just because some earlier test happened to import a library that imported a library that imported something from Pandas (for example). Or a test runner that's done a `np.seterr(all='raise')` specifically to catch potential issues can end up catching nothing, because some Pandas import part-way through the test run turned the error handling off again.

I'm working around this right now by wrapping every pandas import in `with np.errstate():`. For example:
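A minimal sketch of that workaround:

```python
import numpy as np

with np.errstate():
    # np.errstate() with no arguments snapshots the current floating-point
    # error settings on entry and restores them on exit, so pandas'
    # import-time np.seterr(all='ignore') does not leak out of this block.
    import pandas
```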
But that's (a) ugly, and (b) awkward if the pandas import is in a third-party library out of your control.
Please consider removing this feature!
Code Sample, a copy-pastable example if possible
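A minimal example consistent with the description above (reconstructed, not the reporter's verbatim snippet):

```python
import numpy as np
import pandas  # importing pandas silently runs np.seterr(all='ignore')

np.array([1.0]) / np.array([0.0])  # division by zero, but no RuntimeWarning
```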
Expected Output

I expected to see a `RuntimeWarning` from the division by zero above, as in:
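The expected warning, with pandas not imported (wording assumed; the exact message varies across numpy versions):

```
RuntimeWarning: divide by zero encountered in true_divide
```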
Output of `pd.show_versions()`