Xarray equivalent of np.place or df.map(mapping)? #2568

Closed
ahuang11 opened this issue Nov 24, 2018 · 11 comments

@ahuang11
Contributor

ahuang11 commented Nov 24, 2018

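import numpy as np
import pandas as pd
import xarray as xr

# numpy version
x = np.array([0, 1])
np.place(x, x == 0, 1)

# pandas version
pd.Series([0, 1]).map({0: 1, 1: 1})

# current workaround (explicit dim 't' so 'test' is a data variable, not an index coordinate)
ds = xr.Dataset({'test': ('t', [0, 1])})
np.place(ds['test'].values, ds['test'].values == 0, 1)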

Problem description

Is there a built in method to map values like 0 to 1?

Expected Output

returns [1, 1]

@shoyer
Member

shoyer commented Nov 24, 2018

The usual way to do this in xarray would be to use where(), e.g., xarray.where(ds == 0, 1, ds) or ds.where(ds != 0, 1) (the method keeps values where the condition is True and replaces the rest).
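
As a rough illustration of the two where() spellings (a sketch only; the names `da`, `via_function`, and `via_method` are made up here): `xarray.where(cond, x, y)` picks `x` where `cond` is True, while the `.where(cond, other)` method keeps the original values where `cond` is True and substitutes `other` elsewhere, so the method form takes the inverted condition.

```python
import xarray as xr

# Small example array; the dimension name "t" is arbitrary.
da = xr.DataArray([0, 1], dims="t")

# Function form: take 1 where the condition holds, otherwise the original value.
via_function = xr.where(da == 0, 1, da)

# Method form: keep values where the condition holds, replace the rest with 1.
via_method = da.where(da != 0, 1)

print(via_function.values, via_method.values)
```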

@ahuang11
Contributor Author

I guess I'm thinking about more complex cases such as changing 0 -> 50, 1 -> 29, 2 -> 10

ds = xr.Dataset({'test': ('t', [0, 1, 2])})
ds.where(ds != 0, 50).where(ds != 1, 29).where(ds != 2, 10)

Thoughts on simplifying this?

@shoyer
Member

shoyer commented Nov 24, 2018

I would divide this into two steps: (1) write a function that does this on NumPy arrays and (2) apply it to xarray objects using apply_ufunc, e.g.,

import numpy as np
import xarray as xr

def remap(array, mapping):
    return np.array([mapping[k] for k in array.ravel()]).reshape(array.shape)

ds = xr.Dataset({'test': ('t', [0, 1, 2])})
xr.apply_ufunc(remap, ds, kwargs=dict(mapping={0: 50, 1: 29, 2: 10}))

outputs:

<xarray.Dataset>
Dimensions:  (t: 3)
Dimensions without coordinates: t
Data variables:
    test     (t) int64 50 29 10
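
For larger arrays, the per-element Python loop in remap can be avoided. The following is a sketch under assumptions, not part of the original suggestion: `remap_vectorized` is a hypothetical name, and it assumes every value in the array appears as a key in the mapping. It translates each distinct value once and then indexes into that lookup table.

```python
import numpy as np
import xarray as xr

def remap_vectorized(array, mapping):
    # Distinct values, plus for each element the index of its distinct value.
    uniques, inverse = np.unique(array, return_inverse=True)
    # Translate each distinct value once; a missing key raises KeyError.
    lookup = np.array([mapping[k] for k in uniques.tolist()])
    # Expand the translated values back to the original shape.
    return lookup[inverse].reshape(array.shape)

ds = xr.Dataset({'test': ('t', [0, 1, 2, 1])})
out = xr.apply_ufunc(remap_vectorized, ds, kwargs=dict(mapping={0: 50, 1: 29, 2: 10}))
print(out['test'].values)
```

The apply_ufunc call is unchanged from the example above; only the NumPy-level function differs.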

@ahuang11
Contributor Author

ahuang11 commented Nov 24, 2018

Thanks for the quick replies! Is there interest in making this a built-in function? If so, I can help contribute a PR.

I'm also wondering about a way to attach conditional logic to that mapping.

For example: below 0, replace with -1; between 0 and 10, replace with 5; above 10, replace with 15. I think that's possible with three np.place statements, but with ds.where() you have to think in backwards logic.
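
A minimal sketch of the three np.place statements mentioned above, on a plain NumPy array. The sample values are invented, and treating "between 0 and 10" as inclusive at both ends is an assumption. The masks are computed up front so that a later replacement cannot be affected by an earlier one.

```python
import numpy as np

x = np.array([-3, 0, 4, 10, 12])

# Build all masks from the original values before modifying anything.
below = x < 0
middle = (x >= 0) & (x <= 10)
above = x > 10

# np.place modifies x in place, one range at a time.
np.place(x, below, [-1])
np.place(x, middle, [5])
np.place(x, above, [15])

print(x)
```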

@shoyer
Member

shoyer commented Nov 25, 2018

I would lean slightly against adding a dedicated method for this (but could be convinced if others are interested). Usually we copy pandas or numpy APIs, but DataFrame.map is not a great name: map means too many other things (e.g., consider the builtin map).

It might make sense to copy the design of numpy.select instead: https://docs.scipy.org/doc/numpy/reference/generated/numpy.select.html

e.g., with a hypothetical xarray.select you could write something like xarray.select([ds < 0, ds < 10], [-1, 5], default=15)
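
Until something like that exists, the numpy.select behaviour (the first matching condition wins) can be approximated with nested xarray.where calls. This is only a sketch: `select_like` is a hypothetical helper, not an xarray function, and the sample data and the replacement values -1, 5, and 15 (taken from the earlier request) are illustrative.

```python
import numpy as np
import xarray as xr

def select_like(condlist, choicelist, default):
    # Start from the default and apply conditions from last to first,
    # so that earlier conditions take precedence, as in numpy.select.
    result = default
    for cond, choice in zip(reversed(condlist), reversed(choicelist)):
        result = xr.where(cond, choice, result)
    return result

da = xr.DataArray([-3, 0, 4, 10, 12], dims="t")
binned = select_like([da < 0, da < 10], [-1, 5], default=15)

# Reference result computed with numpy.select on the raw values.
expected = np.select([da.values < 0, da.values < 10], [-1, 5], default=15)
print(binned.values, expected)
```

Because it is built only from xarray.where, this version keeps dimensions and coordinates on the result.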

@max-sixty
Collaborator

Agree that map is not a good name (and I find the pandas API difficult in this area - each time I'm looking up what relabel / rename / map / select / filter does)

How about match / switch / case?

We would definitely use this. I agree it'd probably be used less in xarray than in pandas; though I'm keen to expand the API, in a deliberate and careful way, to some of the traditional pandas use-cases (but a small vote among many)

@ewineteer

Has any progress been made on adding a builtin function for this? Thanks.

@ahuang11
Contributor Author

No, not from me at least.

@ahuang11
Contributor Author

ahuang11 commented Aug 9, 2020

If I were to make a PR, where would this method reside? Would it be under dataset.py and dataarray.py? Also, would I simply call np.select inside the method, and if so, how would I add support for dask?

My minimal example atm:

import xarray as xr
import numpy as np
import hvplot.xarray

ds = xr.tutorial.open_dataset('air_temperature').isel(time=0)

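above = ds['air'].values >= 273.15
ds['air_cats'] = (
    ('lat', 'lon'),
    # a string default keeps np.select from mixing the str choices with its integer default 0
    np.select([above, ~above], ['above freezing', 'below freezing'], default='unknown')
)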
ds.hvplot('lon', 'lat', hover_cols=['air_cats'])
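
On the dask question, one possible route (a sketch, not a confirmed design for a built-in method) is to wrap the np.select call in xr.apply_ufunc with dask="parallelized", which is intended to apply an elementwise NumPy function block by block on dask-backed data. The helper name `categorize`, the integer codes 0 and 1, and the small sample array are all assumptions made for illustration. The array below is NumPy-backed, so the dask path is not exercised here.

```python
import numpy as np
import xarray as xr

def categorize(values):
    # 0 = below freezing, 1 = at or above freezing (codes chosen for this sketch).
    return np.select([values < 273.15], [0], default=1)

air = xr.DataArray(
    np.array([[270.0, 280.0], [273.15, 260.0]]),
    dims=("lat", "lon"),
)

# output_dtypes is only consulted when the input is dask-backed.
cats = xr.apply_ufunc(
    categorize,
    air,
    dask="parallelized",
    output_dtypes=[np.int64],
)
print(cats.values)
```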

@stale

stale bot commented Apr 17, 2022

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

@stale stale bot added the stale label Apr 17, 2022
@dcherian
Contributor

There's a longer discussion in #6377 so let's close this.
