Flox based groupby operations don't support dtype in mean method #6902
Comments
Yeah, I think we need to fix this in flox. Can you come up with a simple test case that checks that the accumulation is done properly?
It's not crashing for me, but the dtype is not the same when switching flox on/off:

import numpy as np
import xarray as xr

ds = xr.tutorial.load_dataset("air_temperature")
assert ds.air.dtype == np.float32

for use_flox in (False, True):
    with xr.set_options(use_flox=use_flox):
        # Request a float64 accumulator for float32 data
        ds_mean = ds.groupby("time.month").mean(dtype="float64").compute()
        actual = ds_mean.air.dtype
        expected = np.float64
        print(f"{use_flox=}, {actual=}, {expected=}")
        assert actual == expected

# use_flox=False, actual=dtype('float64'), expected=<class 'numpy.float64'>
# use_flox=True, actual=dtype('float32'), expected=<class 'numpy.float64'>
INSTALLED VERSIONS
commit: None
xarray: 0.16.3.dev99+gc19467fb
Added a synthetic test case for various configurations in xarray-contrib/flox#131
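For readers who want to see what "checking that the accumulation is done properly" could look like, here is a minimal sketch in plain NumPy. It is not the test added in xarray-contrib/flox#131 and it does not call flox. The helper `grouped_mean` and the input values are hypothetical, chosen so that a float32 accumulator visibly loses increments while a float64 accumulator does not.

```python
import numpy as np


def grouped_mean(values, codes, ngroups, dtype=None):
    """Hypothetical helper: per-group mean with an explicit accumulator dtype."""
    acc_dtype = values.dtype if dtype is None else np.dtype(dtype)
    sums = np.zeros(ngroups, dtype=acc_dtype)
    # np.add.at adds element by element into `sums`, so the running total
    # is held in the accumulator dtype rather than the input dtype.
    np.add.at(sums, codes, values)
    counts = np.bincount(codes, minlength=ngroups)
    return sums / counts.astype(acc_dtype)


# One group: a large value followed by ten small ones. In float32 the
# running total cannot represent 2**24 + 1, so each +1.0 is rounded away.
values = np.array([2**24] + [1] * 10, dtype=np.float32)
codes = np.zeros(values.size, dtype=np.intp)
exact = (2**24 + 10) / 11

mean32 = grouped_mean(values, codes, ngroups=1)
mean64 = grouped_mean(values, codes, ngroups=1, dtype="float64")

print(mean32.dtype, mean32[0])
print(mean64.dtype, mean64[0], exact)
```

A test built this way checks two things at once: that the result dtype follows the `dtype` argument, and that the value matches what a float64 accumulator would produce.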
Discussed in #6901
Originally posted by tasansal August 9, 2022
We have been using the new groupby logic with Flox and numpy_groupies; however, when we run the following, the dtype is not recognized as a valid argument.
This breaks API compatibility for cases where you may not have the acceleration libraries installed.
Not sure if this has to be upstream in
In addition to base Xarray we have the following extras installed:
Flox
numpy_groupies
Bottleneck
We do this because our data is float32 but we want the accumulator in mean to be float64 for accuracy. One solution is to cast the variable to float64 before mean, which may cause a copy and spike in memory usage.
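The trade-off described here can be illustrated with plain NumPy, whose `mean` accepts a `dtype` argument for the accumulator. This is only a sketch on synthetic data (the array shape and values are made up for illustration), not the xarray or flox code path.

```python
import numpy as np

# float32 has a 24-bit significand, so a float32 running total of 2**24
# does not change when 1.0 is added to it.
total32 = np.float32(2**24)
stalled = total32 + np.float32(1.0)
print(stalled == total32)

# Synthetic float32 data, for illustration only.
data = np.full((1000, 4), 0.1, dtype=np.float32)

# Default: accumulator and result are float32.
mean32 = data.mean(axis=0)

# dtype= asks NumPy to accumulate in float64; the input array stays float32.
mean64 = data.mean(axis=0, dtype=np.float64)

# The workaround mentioned above: an explicit cast, which allocates a
# full float64 copy of the data before reducing.
mean_cast = data.astype(np.float64).mean(axis=0)

print(mean32.dtype, mean64.dtype, mean_cast.dtype)
```

The last two results agree, but only the explicit cast has to hold a second, twice-as-large copy of the input in memory.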
When Flox and numpy_groupies are not installed, it works as expected.
We are working with multi-dimensional time-series of weather forecast models.
Here is the end of the traceback and it appears it is on Flox.
What is the best way to handle this, maybe fix it in Flox?