Skip to content

New feature: interp1d #2079

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
crusaderky opened this issue Apr 24, 2018 · 8 comments · Fixed by #2104
Closed

New feature: interp1d #2079

crusaderky opened this issue Apr 24, 2018 · 8 comments · Fixed by #2104

Comments

@crusaderky
Copy link
Contributor

crusaderky commented Apr 24, 2018

I've written a series of wrappers for the 1-dimensional scipy interpolators.

Prototype code and colourful demo plots:
https://gist.github.com/crusaderky/b0aa6b8fdf6e036cb364f6f40476cc67

Features

  • Interpolate a ND array on any arbitrary dimension
  • Nearest-neighbour, linear, quadratic, cubic, Akima, PCHIP, and custom interpolators are supported
  • dask supported on both on the interpolated array and x_new
  • Supports ND x_new arrays
  • The CPU-heavy interpolator generation (splrep) is executed only once and then can be applied to multiple x_new (splev)
  • Pickleable and distributed-friendly

Design hacks

  • Depends on dask module, even when all inputs are based on plain numpy.
  • Abuses attrs and the ability to invoke a.attrname to get the user experience of a new DataArray method.
  • Abuses the fact that the chunks of a dask.array.Array can contain anything and you won't notice until you compute them.

Limitations

  • Can't dump to netcdf. Not solvable without hacking into the implementation details of scipy.
  • Datasets are not supported. Trivial to fix after solving apply_ufunc(dask='parallelized') output_dtypes for datasets #1699.
  • Chunks are not supported on x_new. Trivial to fix after solving apply_ufunc support for chunks on input_core_dims #1995.
  • Chunks are not supported along the interpolated dimension. This is very complicated to solve. If x and x_new were always monotonic ascending,it would be (not trivially) solvable with dask.array.ghost.ghost. If you make no assumptions about monotonicity, things become way more complicated. A solution would need to go in the dask module, and then be invoked trivially from here with dask='allowed'.
@fujiisoup
Copy link
Member

I think Interpolation (and scipy integration) is one of a good extensions of xarray ecosystem.

We discussed previously about contribute package/module in #1850 #1288 .
It sounds a good idea to me to start from contrib.scipy module.

I notice that different people have developed almost the same package in parallel,
https://github.com/lamorton/SciPyXarray
https://github.com/fujiisoup/xr-scipy
and maybe more.

@shoyer
Copy link
Member

shoyer commented Apr 25, 2018

If it were not for caching the grid setup, then this would make sense as a built-in xarray method interpolate_at (to complement interpolate_na).

With caching, this feels a little bit beyond the standard xarray data model, but well suited for a companion package. It would be great to start listing these on a doc page, so users can easily find them!

@fujiisoup also note @rabernat's xrft package, which has some overlap with your xr-scipy package.

@fujiisoup
Copy link
Member

If it were not for caching the grid setup, then this would make sense as a built-in xarray method interpolate_at (to complement interpolate_na).

This functionality sounds similar to reindex. Is it confusing to add this to reindex with an additional method such as method='linear'?

@fujiisoup also note @rabernat's xrft package, which has some overlap with your xr-scipy package.

This looks nice.
I personally think that it would be nicer if we merge these similar packages and make them more complete.
But yes, we can start listing these packages on a doc page first.

@rabernat
Copy link
Contributor

Regarding interpolation, this is one of the most common needs of users. I would support adding some sort of basic interpolation directly within xarray. I would eventually like to see multilinear n-dimensional interpolation supported.

I personally think that it would be nicer if we merge these similar packages and make them more complete.

xrft is focused quite narrowly on spectral analysis. It is very useful for our research group. But we haven't put much effort into making it widely accessible or comprehensive.

I agree that in principle it would be good to merge efforts. However, we say this often about this type of project, but in practice I'm not sure we have the time to really accomplish it. It would take significant developer effort. xr-scipy seems the most mature and well designed of the three packages, so that should probably be the one we focus efforts on if we go that route.

@crusaderky
Copy link
Contributor Author

For my use case slrep caching is critical, as I need to interpolate 20-something curves roughly 4000 times on different points. Changing the application to gather all points from downstream and do one big interpolation would not be feasible as it would kill off my RAM and be very hostile to a distributed environment.

@fmaussion
Copy link
Member

I am very much in favor of an xarray built-in interpolate_at method. Interpolating/regridding is domain agnostic and would be extremely useful to many downstream packages (salem does some regridding but it all happens on memory and uses scipy's RegularGridInterpolator under the hood). I would love to have an xarray func for this...

@shoyer
Copy link
Member

shoyer commented Apr 26, 2018

This functionality sounds similar to reindex. Is it confusing to add this to reindex with an additional method such as method='linear'?

I think this would be a little confusing because then reindex in xarray would work differently from reindex in pandas.

Also, interpolate is a more descriptive name than reindex, at least for scientists. I would rather read array.interpolate_at(x=points) than array.reindex_like(x=points, method='linear').

@fujiisoup fujiisoup mentioned this issue May 4, 2018
4 tasks
@crusaderky
Copy link
Contributor Author

As I was dissatisfied with the prototype, I scrapped it and rewrote it mocking the splrep/splev API. However my functions don't wrap around scipy.interpolate.splrep/splev, as those don't accept an n-dimensional y, but instead they wrap around scipy.interpolate.make_interp_spline and scipy.interpolate.BSpline (which is what scipy.interpolate.interp1d does too). Compared to the prototype above:

  • lost support for Akima, PCHIP, and the non-spline options of interp1d
  • MUCH more memory-efficient than before, particularly on distributed
  • no more hacks - splrep produces a plain Dataset, which can be stored on NetCDF, sliced, etc. etc.
  • gained ability to have chunks on x_new

I built a production-quality version (inclusive of documentation, unit tests, and all the trimmings) at https://github.com/crusaderky/xarray_extras. Happy to discuss moving it to somebody else's module.

You still can't have a chunked x. It is possible to implement it with dask.array.ghost.ghost, although it would be mutually exclusive with a chunked x_new - contributions are welcome.

Closing this ticket as I agree this is beyond the scope of the core xarray package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants