Skip to content

interpolate/sample array at point #191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
cossatot opened this issue Jul 27, 2014 · 14 comments
Closed

interpolate/sample array at point #191

cossatot opened this issue Jul 27, 2014 · 14 comments

Comments

@cossatot
Copy link

This looks like an excellent project, thanks for all the work.

I am wondering if there is functionality for interpolation or sampling of a Dataset at a point (or sequence of points). The SciPy way of doing this uses scipy.ndimage.map_coordinates and is completely backwards for (geo)spatial data, requiring the point's coordinates to be converted to 'array' coordinates before interpolation.

Does this already have an implementation, given how super common an operation it is? If not, would it be difficult to wrap pandas.Series.interpolate() up? I might write a function to do this, but I don't known the xray internals well enough to know where to stick it.

Thanks,
Richard

@shoyer shoyer added API and removed pandas-like labels Jul 28, 2014
@shoyer
Copy link
Member

shoyer commented Jul 28, 2014

Hi Richard,

Your question is actually very timely. We don't have any routines yet to do interpolation, although one of my colleagues (not sure if he's on GitHub) was looking into 1-dimensional interpolation last week.

A contribution to add interpolation to xray would certainly be very welcome!

I would not recommend wrapping the pandas routines -- they just use their own wrapper over scipy and only interpolate in 1D. They also have somewhat unusual API -- they support interpolation only to fill in missing values (marked with NaN).

Scipy has a wide variety of interpolation options, most of which I have not used. I'm guessing you're most immediately interested in doing something with map_coordinates? What were you thinking for the function signature?

As for where to put it, we have two options:

  1. Add a method DataArray.interpolate.
  2. If we don't think we can come up with a generic enough API to hold all interpolation strategies in a single method (quite likely!), add a new module xray.interpolation.

@cossatot
Copy link
Author

Stephan,

I think that I could contribute some functions to do 'nearest' and linear interpolation in n-dimensions; these should be able to take advantage of the indexing afforded by xray and not have to deal with Scipy's functions, because they only require selecting values in the neighborhood of the points.

As far as I can tell, higher-order interpolation (spline, etc.) requires fitting functions to the entirety of the dataset, which is pretty slow/ram-intensive with large datasets, and many of the fuctions require the data to be on a regular grid (I am not sure what the xray / netcdf requirements are here). For this, it's probably better to use the scipy functions (at least I personally don't have the knowledge to write anything comparable without more study).

For the function signature, I was thinking about something simple, like:

xray.interpolate(point[s], array, field[s], order), or
DataArray.interpolate(point[s], order) where points are the points to interpolate field values from the array, and order is like in map_coordinates (0=nearest, 1=linear, 3=cubic, etc.).

This could return a Series or DataFrame.

But thinking about this a little more, there are kind of two sides to interpolation: What I think of as 'sampling', where we pull values at points from within a grid or structured array (like in map_coordinates), and then the creation of arrays from unstructured (or just smaller) point sets. Both are equally common in geoscience, and probably require pretty different treatment programmatically, so it might be wise to make room for both early.

@shoyer
Copy link
Member

shoyer commented Jul 29, 2014

So I would definitely still wrap the scipy functions for any interpolation routines for xray. It's not worth rewriting any of them twice! Scipy is an optional dependency for xray, but it would be fine to require it for the interpolation routines.

Yes, there do seems to be a few different interpolation routines in scipy. Naming the function you're thinking of something like xray.interpolation.map_coordinates would leave plenty of room for alternative interpolation methods.

In xray, we have DataArray objects (corresponding to a particular variable in a netCDF file along with its coordinates) as well as Datasets (corresponding to entire netCDFs). I think it makes a bit more sense to interpolate individual variables, so I guess you would want this function to have a signature something like the following:

Parameters
------------
source : DataArray
    Gridded source data.
destination : ndarray or dict-like (e.g., xray.Dataset or pandas.DataFrame)
    Array of points of points to sample at (n_obs * n_dim)
    or mapping from dimension names to coordinate values.
order : int, optional
mode : {'constant', 'nearest'}, optional
cval : scalar, optional
   Maybe defaulting to NaN instead of 0?

Returns
---------
interpolated : DataArray
    Data array with interpolated values. Coordinates and
    dimensions are copied from points.

You don't need to necessarily handle all the edge cases of possible input to destination, but it would be nice to handle destinations with labeled dimensions (e.g., in the form of a dictionary or xray.Dataset), and return labeled data in the form of an xray DataArray, with the same coordinate values (if they were supplied).

The DataArray constructor isn't in the currently released version of xray but it will be a major feature of the next release (hopefully to be released in a week or two) and it might be worth using here. It should be clearly documented in the docstring of DataArray.__init__ but let me know if you have any questions.

@shoyer
Copy link
Member

shoyer commented Jul 29, 2014

Actually, I'm thinking now that we could probably (theoretically) fit all interpolation strategies into a single interpolate method, where the specific type of interpolation (e.g., grid -> grid, grid -> unstructured, unstructured -> grid) is selected based on the shape of source and destination.

That said... perhaps its better to stick with a simpler function for now and we can figure out the bigger picture later. Looking at all the scipy functions, there are a lot of hypothetical options and I'm not sure which we'd want to keep in the final method. Also, kitchen sink type functions (e.g., pandas.DataFrame.plot) are not necessarily well-advised.

@nfaggian
Copy link

For what its worth, I wrote this today. Its a long way from being useful but I find it's working well enough to fill gaps in data after a reindex()

from scipy import interpolate, ndimage

def linterp(data, index, interp_index, order=1):
    """
    Parameters
    ----------
        data: nd-array (cube).
        index: index (floats) associated with the cube.
        interp_index: float interpolation poing. 
    Returns
    -------
        interpolated: nd-array
            An interpolated field.
    """

    # Form a cube of the values, which we will imagine is our function f()
    cube = np.array(data, dtype=np.float)

    # Form a relationship, and this can be non-linear, between slice indexes and
    # the "index".
    m = interpolate.interp1d(index, range(len(index)))

    # Form a set of coordinates to sample over - x
    y, x = np.mgrid[0:data[0].shape[0], 0:data[0].shape[1]]
    z = np.ones_like(y) * m(interp_index)

    # Perform the sampling f(x), map coordinates is performing a linear
    # interpolation of the coordinates in the cube.
    return ndimage.map_coordinates(
        cube,
        [z, y, x],
        order=order,
        cval=np.nan)

@den-run-ai
Copy link

+1

@saulomeirelles
Copy link

Hi All,

This is indeed an excellent project with great potential!

I am wondering if there is any progress on the interpolation issue. I am working with an irregular time series which I would pretty much like to upsample using xray.

Thanks for all the effort!

Saulo

@shoyer
Copy link
Member

shoyer commented Oct 24, 2015

@saulomeirelles I don't have any progress to share on this issue. If this issue is important to you, contributions would be gratefully accepted!

@sjpfenninger
Copy link
Contributor

I've written a wrapper around scipy's map_coordinates. I cleaned this up a bit and pushed it here: sjpfenninger@0e0d88d

It's not exactly fully-featured but scratches the itch I had, which is to pass DataArrays through map_coordinates with support for xarray's coordinates.

@shoyer If this seems of use, I could add some tests and perhaps an example, then submit a pull request?

@shoyer
Copy link
Member

shoyer commented Apr 1, 2016

Yes, this like looks useful to me!

On Fri, Apr 1, 2016 at 7:32 AM, Stefan Pfenninger [email protected]
wrote:

I've written a wrapper around scipy's map_coordinates. I cleaned this up
a bit and pushed it here: sjpfenninger/xarray@0e0d88d
sjpfenninger@0e0d88d

It's not exactly fully-featured but scratches the itch I had, which is to
pass DataArrays through map_coordinates with support for xarray's
coordinates.

@shoyer https://github.com/shoyer If this seems of use, I could add
some tests and perhaps an example, then submit a pull request?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#191 (comment)

@sjpfenninger
Copy link
Contributor

Ok, I'll extend this into a pull request one of these days

@jgerardsimcock
Copy link

I have a dataset that looks like the following:

<xarray.Dataset>
Dimensions:  (lat: 720, lon: 1440, time: 365)
Coordinates:
  * time     (time) datetime64[ns] 2006-01-01T12:00:00 2006-01-02T12:00:00 ...
  * lat      (lat) float32 -89.875 -89.625 -89.375 -89.125 -88.875 -88.625 ...
  * lon      (lon) float32 0.125 0.375 0.625 0.875 1.125 1.375 1.625 1.875 ...
Data variables:
    tasmax   (time, lat, lon) float64 272.6 272.6 272.6 272.6 272.6 272.6 ...
Attributes:
    parent_experiment:              historical
    parent_experiment_id:           historical
    parent_experiment_rip:          r1i1p1
    Conventions:                    CF-1.4
    institution:                    NASA Earth Exchange, NASA Ames Research C...
    institute_id:                   NASA-Ames
    realm:                          atmos
    modeling_realm:                 atmos
    version:                        1.0
    downscalingModel:               BCSD
    experiment_id:                  rcp85
    frequency:                      day
    realization:                    1
    initialization_method:          1
    physics_version:                1
    tracking_id:                    fd966a07-baec-44e4-8f69-e3cfb2d70dfa
    driving_data_tracking_ids:      N/A
    driving_model_ensemble_member:  r1i1p1
    driving_experiment_name:        historical
    driving_experiment:             historical
    model_id:                       BCSD
    references:                     BCSD method: Thrasher et al., 2012, Hydro...
    DOI:                            http://dx.doi.org/10.7292/W0MW2F2G
    experiment:                     RCP8.5
    title:                          ACCESS1-0 global downscaled NEX CMIP5 Cli...
    contact:                        Dr. Rama Nemani: [email protected], Dr...
    disclaimer:                     This data is considered provisional and s...
    resolution_id:                  0.25 degree
    project_id:                     NEXGDDP
    table_id:                       Table day (12 November 2010)
    source:                         BCSD 2014
    creation_date:                  2015-01-07T20:16:06Z
    forcing:                        N/A
    product:                        output

I am trying to do a linear interpolation for each day where the temp is nan. Is there a straightforward way to do this in Xarray?

@max-sixty
Copy link
Collaborator

@fujiisoup could we close this given your recent work

@fujiisoup
Copy link
Member

closed via #2104

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants