xarray to and from iris #814


Closed · wants to merge 16 commits

Conversation

@nparley commented Apr 1, 2016

Added a first go at a converter from xarray.DataArray objects to iris.cube.Cube objects. It uses the same template as the cdms2 conversion.

@nparley (Author) commented Apr 1, 2016

References #621 and #37. Uses the template from #236.

@nparley (Author) commented Apr 1, 2016

docker pull nparley/xarray_test provides a Python 2.7 environment with Iris etc. that mirrors the Travis CONDA_ENV=py27-cdat+iris+pynio environment.

It doesn't do everything, but it's a good first go. Here is an example:

>>> import xarray
>>> import iris
>>> nc_file = 'E1_north_america.nc'
>>> cube = iris.load_cube(nc_file)
>>> darray = xarray.open_dataset(nc_file)['air_temperature']
>>> print(darray)
<xarray.DataArray 'air_temperature' (time: 240, latitude: 37, longitude: 49)>
[435120 values with dtype=float32]
Coordinates:
  * time                     (time) datetime64[ns] 1860-06-01 1861-06-01 ...
  * latitude                 (latitude) float32 15.0 16.25 17.5 18.75 20.0 ...
  * longitude                (longitude) float32 225.0 226.875 228.75 ...
    forecast_period          (time) timedelta64[ns] 449 days 18:00:00 ...
    forecast_reference_time  datetime64[ns] 1859-09-01T06:00:00
    height                   float64 1.5
Attributes:
    standard_name: air_temperature
    units: K
    Model scenario: E1
    ukmo__um_stash_source: m01s03i236
    source: Data from Met Office Unified Model 6.05
    cell_methods: time: mean (interval: 6 hour)
>>> print(xarray.DataArray.from_iris(cube))
<xarray.DataArray u'air_temperature' (time: 240, latitude: 37, longitude: 49)>
[435120 values with dtype=float32]
Coordinates:
  * time                     (time) datetime64[ns] 1860-06-01 1861-06-01 ...
  * latitude                 (latitude) float32 15.0 16.25 17.5 18.75 20.0 ...
  * longitude                (longitude) float32 225.0 226.875 228.75 ...
    forecast_reference_time  datetime64[ns] 1859-09-01T06:00:00
    height                   float64 1.5
    forecast_period          (time) timedelta64[ns] 449 days 18:00:00 ...
Attributes:
    Model scenario: E1
    STASH: m01s03i236
    Conventions: CF-1.5
    source: Data from Met Office Unified Model 6.05
    standard_name: air_temperature
    units: K
>>> print(cube)
air_temperature / (K)               (time: 240; latitude: 37; longitude: 49)
     Dimension coordinates:
          time                           x              -              -
          latitude                       -              x              -
          longitude                      -              -              x
     Auxiliary coordinates:
          forecast_period                x              -              -
     Scalar coordinates:
          forecast_reference_time: 1859-09-01 06:00:00
          height: 1.5 m
     Attributes:
          Conventions: CF-1.5
          Model scenario: E1
          STASH: m01s03i236
          source: Data from Met Office Unified Model 6.05
     Cell methods:
          mean: time (6 hour)
>>> print(darray.to_iris())
air_temperature / (K)               (time: 240; latitude: 37; longitude: 49)
     Dimension coordinates:
          time                           x              -              -
          latitude                       -              x              -
          longitude                      -              -              x
     Auxiliary coordinates:
          forecast_period                x              -              -
     Scalar coordinates:
          forecast_reference_time: 1859-09-01 06:00:00
          height: 1.5 m
     Attributes:
          Model scenario: E1
          source: Data from Met Office Unified Model 6.05
          ukmo__um_stash_source: m01s03i236

'Fire Temperature'},
dims=('distance', 'time'))

expected_coords = [Coordinate('distance', [-2, 2]),
Member commented:

Maybe better to just pull these out of original.coords instead?

coord = encode(dataarray.coords[coord_name])
coord_args = get_args(coord.attrs)
coord_args['var_name'] = coord_name
iris_coord = iris.coords.DimCoord(coord.values, **coord_args)
Member commented:

You are probably going to need to use AuxCoord for some coordinates -- maybe anything that doesn't match a dimension name? In principle, DimCoord instances must be 1d, numeric and strictly monotonic. It does seem like this works for your test examples, though, so I'm not sure what's going on.
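
For illustration, a minimal sketch (not the PR's actual code; the coord_args dict is assumed to be built from the coordinate's attributes as in the snippet above) of falling back to AuxCoord when DimCoord's constraints aren't met:

import iris.coords

def make_iris_coord(values, coord_args, matches_dim):
    # DimCoord must be 1-D, numeric and strictly monotonic, and only
    # makes sense for a coordinate that matches a dimension name;
    # everything else (e.g. a 2-D lat/lon field) becomes an AuxCoord.
    if matches_dim and values.ndim == 1:
        try:
            return iris.coords.DimCoord(values, **coord_args)
        except ValueError:
            pass  # e.g. non-numeric or non-monotonic points
    return iris.coords.AuxCoord(values, **coord_args)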

Author commented:

Yes, I had that thought too last night. I started off with the code only working for dimension coordinates and then extended it to auxiliary coordinates later. It's because add_aux_coord can accept either a DimCoord or an AuxCoord, and my test is only 1D. I think it will probably fail if the coord is 2D the way it's currently written. I will check.

@shoyer (Member) commented Apr 2, 2016

This is awesome!

Let's try to add some documentation -- at least a mention in What's New.

CC @pelson

from .conventions import (
    maybe_encode_timedelta, maybe_encode_datetime, decode_cf)

ignored_attrs = set(['name', 'tileIndex'])
cdms2_ignored_attrs = {'name', 'tileIndex'}
Member commented:

Currently we support Python 2.6, so set literals will cause test failures.

I think we can change this for the next major release of xarray though -- I just sent out a mailing list post about this.

Author commented:

I can change these; I wasn't using the new dictionary-comprehension syntax for this reason, but I forgot about sets. It would not surprise me if quite a few scientists were still using Python 2.6. Why did this not get picked up in the Travis py26 tests? Actually, I can answer my own question: I guess it's because no tests hit xarray/convert.py if Iris or cdms2 is not installed.

@pelson commented Apr 2, 2016

Thanks for pinging @shoyer. I'll also ping @rhattersley, and one of us will take a good look in the next few days. 👍

…rd and DimCoord correctly so 2d coords will work. Use dims variable to convert dimension numbers into names. Add 2d coord to test DataArray.
@nparley (Author) commented Apr 2, 2016

There is a problem when it comes to bounds. xarray adds the bounds as another DataArray in the Dataset, i.e.:

Data variables:
    votemper      (dim0, dim1) float64 nan nan nan nan nan nan nan nan nan ...
    deptht_bnds   (bnds) float64 0.0 10.0

I can get the name of this variable from the coord attribute in the DataArray for another variable, i.e. ds['votemper'].coords['deptht'].bounds, but I can't get the bounds array from the DataArray. Is that correct?

@shoyer (Member) commented Apr 2, 2016

There's no good solution for representing bounds in xarray -- we can put these as coordinates on Dataset objects, but they aren't supported on DataArray objects. I had some plans to add an IntervalIndex type to pandas (pandas-dev/pandas#8707) which would let us support bounds, but honestly I'm not sure if/when I'll finish that.
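
For illustration, a small sketch of the asymmetry described above (variable names follow the example; the y/x dimensions are invented):

import numpy as np
import xarray

ds = xarray.Dataset({'votemper': (('y', 'x'), np.zeros((2, 3)))},
                    coords={'deptht': 5.0})
# The bounds can only live as a separate variable with their own
# 'bnds' dimension on the Dataset...
ds['deptht_bnds'] = ('bnds', np.array([0.0, 10.0]))
# ...and they are not carried along on the extracted DataArray:
print('deptht_bnds' in ds['votemper'].coords)  # False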

@nparley (Author) commented Apr 3, 2016

OK, bounds can just be something to add to the converter if/when they're supported.

@nparley (Author) commented Apr 4, 2016

The latest commit fixes a bug when the Iris cube does not have any dimension coordinates. What do people think about keeping or dropping the automatic dimension coordinates that xarray creates? If dropping is the way to go, should all coordinates named dim{x} be dropped? Or are we happy just leaving them there?

For example at the moment:

>>> print(cube)
sea_water_potential_temperature / (degC) (-- : 148; -- : 180)
     Auxiliary coordinates:
          latitude                           x         x
          longitude                          x         x
     Scalar coordinates:
          depth: 4.99994 m, bound=(0.0, 10.0) m
          time:    1-01-01 12:00:00
     Attributes:
          Conventions: CF-1.5
     Cell methods:
          mean: time_counter
>>> print(xarray.DataArray.from_iris(cube).to_iris())
sea_water_potential_temperature / (degC) (dim0: 148; dim1: 180)
     Dimension coordinates:
          dim0                                x          -
          dim1                                -          x
     Auxiliary coordinates:
          latitude                            x          x
          longitude                           x          x
     Scalar coordinates:
          depth: 4.99993801117 m
          time:    1-01-01 12:00:00
     Attributes:
          Conventions: CF-1.5

@rhattersley commented:

> What do people think about keeping or dropping the automatic dimension coordinates that xarray creates?

Is there any way to know for sure whether a coordinate has been automatically created? Or is the name the only clue?

@rhattersley commented:

> Uses the same template as the cdms2 conversion.

If this is the way to do things, does this comment need deleting?

@nparley (Author) commented Apr 4, 2016

The comment was just to mention that I had based my code on the cdms2 code from #236, as suggested by @shoyer. I guess this comment would not be needed in the final merge.

@shoyer (Member) commented Apr 4, 2016

> Is there any way to know for sure whether a coordinate has been automatically created? Or is the name the only clue?

I'm afraid the name is the only clue. My inclination would be not to do anything special here to "un-name" default coordinates, assuming Iris has a straightforward way to get rid of them. Dimensions like dim0 come up quickly in demo code but rarely in practice -- it's mostly a signal to the user that they should pick a meaningful name.
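
For reference, a sketch of what getting rid of them on the Iris side might look like, relying on the name alone (untested; cube here would be the result of to_iris()):

# Drop dimension coordinates that only exist because xarray generated
# default ones (dim0, dim1, ...); the name is the only clue we have.
for coord in cube.coords(dim_coords=True):
    if coord.var_name and coord.var_name.startswith('dim'):
        cube.remove_coord(coord)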

> If this is the way to do things, does this comment need deleting?

Yes, that comment is out of date. I'm currently refactoring that code though, so let me take care of it.

There's a separate argument about whether or not xarray should automatically convert cubes when passed into the xarray.DataArray constructor. We do this for pandas objects, but not any other labelled array types, currently.

@shoyer (Member) commented Apr 4, 2016

One other thing that I noticed from the latest example -- it would be nice to convert cell methods from Cubes into string attributes on DataArrays.
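
A rough sketch of that direction (not the PR's code; it assumes only the public CellMethod attributes method, coord_names, intervals and comments):

def cell_methods_to_str(cube):
    # Turn e.g. CellMethod('mean', coords='time', intervals='6 hour')
    # into the CF-style string "time: mean (interval: 6 hour)".
    parts = []
    for cm in cube.cell_methods:
        names = ''.join('{}: '.format(name) for name in cm.coord_names)
        extras = (['interval: {}'.format(i) for i in cm.intervals] +
                  ['comment: {}'.format(c) for c in cm.comments])
        method = '{}{}'.format(names, cm.method)
        parts.append('{} ({})'.format(method, ' '.join(extras))
                     if extras else method)
    return ' '.join(parts)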

@nparley (Author) commented Apr 8, 2016

OK, the latest commit tries to handle the cell methods:

>>> print(cube)
sea_water_potential_temperature / (degC) (-- : 148; -- : 180)
     Auxiliary coordinates:
          latitude                           x         x
          longitude                          x         x
     Scalar coordinates:
          depth: 4.99994 m, bound=(0.0, 10.0) m
          time:    1-01-01 12:00:00
     Attributes:
          Conventions: CF-1.5
     Cell methods:
          mean: time_counter
>>> print(xarray.DataArray.from_iris(cube))
<xarray.DataArray u'votemper' (dim0: 148, dim1: 180)>
array([[ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       ..., 
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan]], dtype=float32)
Coordinates:
    deptht        float64 5.0
    time_counter  object    1-01-01 12:00:00
    nav_lat       (dim0, dim1) float32 -78.1906 -78.1906 -78.1906 -78.1906 ...
    nav_lon       (dim0, dim1) float32 80.0 82.0 83.9999 85.9999 87.9999 ...
  * dim0          (dim0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
  * dim1          (dim1) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Attributes:
    units: degC
    long_name: Temperature
    standard_name: sea_water_potential_temperature
    cell_methods: time_counter: mean
    Conventions: CF-1.5
>>> print(xarray.DataArray.from_iris(cube).to_iris())
sea_water_potential_temperature / (degC) (dim0: 148; dim1: 180)
     Dimension coordinates:
          dim0                                x          -
          dim1                                -          x
     Auxiliary coordinates:
          latitude                            x          x
          longitude                           x          x
     Scalar coordinates:
          depth: 4.99993801117 m
          time:    1-01-01 12:00:00
     Attributes:
          Conventions: CF-1.5
     Cell methods:
          mean: time_counter

skip = False
continue
# If this word is an axis
if word not in cell_methods_strings | set(['interval', 'comment']):
Member commented:

Can you give an example of what you're trying to catch here? This looks very complicated to me -- maybe there is a more straightforward way to do this.

Author commented:

Yes, there may be, but it is actually a little complicated to parse these strings. The format is:

"coord1: [coord2:] method: [comment: a comment] [interval: a interval]"

I.e. here are some examples:

>>> get_cell_methods('dim1: dim2: mean: comment: A comment: interval: A interval')
[CellMethod(method='mean', coord_names=('dim1', 'dim2'), intervals=('A interval',), comments=('A comment',))]
>>> get_cell_methods('dim1: dim2: mean: dim2: sum:')
[CellMethod(method='mean', coord_names=('dim1', 'dim2'), intervals=(), comments=()), CellMethod(method='sum', coord_names=('dim2',), intervals=(), comments=())]
>>> get_cell_methods('dim1: mean: dim2: dim3: sum: comment: A comment')
[CellMethod(method='mean', coord_names=('dim1',),  intervals=(), comments=()), CellMethod(method='sum', coord_names=('dim2', 'dim3'), intervals=(), comments=('A comment',))]

So that line checks whether the word is a method, "comment", or "interval"; if not, it's a coord, and the code then checks whether it starts a new cell method or is another coord for the current one.

Member commented:

Is there something like iris.coords.CellMethod.from_string which handles this logic? It might be a happier path to expose such a method from Iris (which already is using this sort of logic internally) rather than incorporating it into xarray.

Member commented:

@pelson that's going the other direction: iris coordinate -> string (which I think is easier to get right)

Author commented:

@shoyer I had a look before I wrote that code but could not see one. Is there one in the netCDF library?

@pelson commented:

Looks like the code is horrifically obfuscated in Pyke rules: https://github.com/SciTools/iris/blob/master/lib/iris/fileformats/_pyke_rules/fc_rules_cf.krb#L1556-L1619

I'd be completely happy if this were moved into a more useful place. I think at the moment, it is reachable through iris.fileformats._pyke_rules.compiled_krb.fc_rules_cf_fc, but I haven't tested it, and can't guarantee its stability there.

@nparley (Author) commented Apr 20, 2016

OK, this update uses iris.fileformats._pyke_rules.compiled_krb.fc_rules_cf_fc as @pelson mentioned in the comment above. I also added a cell method to the test.

"""
# Iris not a hard dependency
import iris
import iris.fileformats._pyke_rules.compiled_krb.fc_rules_cf_fc \
Member commented:

Can we add some sort of guard so this doesn't fail if/when Iris cleans up its internals and moves this function? We've been bitten by using private APIs before.

Ideally, @pelson would add something exposing this to the public API for the next release, and we could use something like:

try:
    from iris.fileformats import parse_cell_methods
except ImportError:
    # prior to Iris 2.0 (or whatever)
    from iris.fileformats._pyke_rules.compiled_krb.fc_rules_cf_fc \
        import _parse_cell_methods as parse_cell_methods

Author commented:

Yes, I can add that. @pelson how do you think this might be exposed by Iris?

@pelson commented:

I'll put up a PR for iris.fileformats.netcdf.parse_cell_methods


@nparley (Author) commented Jun 26, 2016

@shoyer I have not forgotten about this PR; I've just been working a lot on a pandas PR updating the CI, and it's been taking up a lot of my spare time.

@shoyer (Member) commented Jun 26, 2016

No worries, no rush here from my end :).

@pelson commented Jun 27, 2016

> @pelson @rhattersley any tips on handling the edge cases of cubes with some, but not all, missing dim_coords?

I might consider something like (untested):

for dim in xrange(cube.ndim):
    try:
        dim_coord = cube.coord(dim_coords=True, dimensions=(dim,))
    except iris.exceptions.CoordinateNotFoundError:
        index_coord = range(cube.shape[dim])

@nparley (Author) commented Aug 9, 2016

This adds the extra code that @pelson suggested, but it would be nice to create a little unit test with mixed dimension and auxiliary coordinates. Is there an easy way to create a simple cube in Iris that could test this?
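
For what it's worth, a minimal sketch (untested, arbitrary names) of building such a cube by hand:

import numpy as np
import iris.coords
import iris.cube

cube = iris.cube.Cube(np.zeros((3, 4)), var_name='tas', units='K')
# One proper dimension coordinate...
cube.add_dim_coord(
    iris.coords.DimCoord(np.arange(3.0), var_name='height', units='m'), 0)
# ...and a second dimension carrying only an auxiliary coordinate.
cube.add_aux_coord(
    iris.coords.AuxCoord(np.arange(4.0), var_name='station'), 1)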

    dims.append(dim_coord.var_name)
except iris.exceptions.CoordinateNotFoundError:
    index_coord = range(cube.shape[dim])
    dims.append("dim{}".format(index_coord))
Member commented:

Use dim_{} to be consistent with our normal rule for default dimensions.

Member commented:

This should actually be something like dims.append('dim_{}'.format(dim)) -- index_coord is currently the default coordinate labels, not the dimension name.
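
Putting the two comments together, the loop could end up looking something like this (sketch; assumes iris and the cube are already in scope):

dims = []
for dim in range(cube.ndim):
    try:
        dim_coord = cube.coord(dim_coords=True, dimensions=(dim,))
        dims.append(dim_coord.var_name)
    except iris.exceptions.CoordinateNotFoundError:
        # No dimension coordinate on this axis: fall back to a default
        # name, matching xarray's dim_{n} convention.
        dims.append('dim_{}'.format(dim))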

coord_attrs = _iris_obj_to_attrs(coord)
coord_dims = [dims[i] for i in cube.coord_dims(coord)]
if coord_dims:
    coords[coord.var_name] = (coord_dims, coord.points, coord_attrs)
Member commented:

I think it's also possible for coord.var_name to be None. We should probably raise an error in that case.

Author commented:

Is there an xarray exception that might make sense for the error? Or just NameError?

Member commented:

We usually just raise ValueError for cases like this where we cannot handle the input.
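
Applied to the snippet above, the guard could be as simple as (sketch; names taken from the diff context):

if coord.var_name is None:
    raise ValueError('Iris coordinate has no var_name; cannot use it '
                     'as an xarray coordinate name')
coords[coord.var_name] = (coord_dims, coord.points, coord_attrs)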


@pelson commented Nov 22, 2016

This is looking pretty comprehensive to me. It could maybe do with a couple of tests, but otherwise 👍.

@shoyer (Member) commented Nov 22, 2016

> This is looking pretty comprehensive to me. It could maybe do with a couple of tests, but otherwise 👍.

Indeed, merging this is really just blocked on tests right now. We need at least something so we can verify that this basically works.

@nparley (Author) commented Nov 22, 2016

This used to have some tests, but they seem to have been lost somewhere along the way. I'll add the old tests back in, which would be a start.

@DWesl (Contributor) commented Jun 19, 2017

If you're still looking for the old tests, it looks like they disappeared in the last merge commit, f48de5.

@shoyer (Member) commented Jun 19, 2017 via email

@nparley (Author) commented Jun 20, 2017

Sorry guys, I don't have time to do anything on this (or any open source stuff) at the moment.

@shoyer (Member) commented Jun 20, 2017 via email

@duncanwp (Contributor) commented:
I'm happy to pick this up - I'm not sure how best to pick up someone else's pull request though. Am I best off forking nparley:master then resubmitting, or creating a new one and merging in the changes from nparley:master?

@nparley (Author) commented Nov 30, 2017

I think I could also make you a contributor or transfer the branch to someone, so that might be an option too.

@duncanwp mentioned this pull request Nov 30, 2017
@shoyer (Member) commented Nov 30, 2017 via email

@pelson commented Jan 3, 2018

Closed by #1750?

@fmaussion (Member) commented:
Indeed. Thanks to all people involved!

@fmaussion closed this Jan 3, 2018