-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
xarray to and from iris #814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…s.cube.Cube objects. Uses the same template as the cdms2 conversion.
It doesn't do everything, but it's a good first go. Here is an example: >>> import xarray
>>> import iris
>>> nc_file = 'E1_north_america.nc'
>>> cube = iris.load_cube(nc_file)
>>> darray = xarray.open_dataset(nc_file)['air_temperature']
>>> print(darray)
<xarray.DataArray 'air_temperature' (time: 240, latitude: 37, longitude: 49)>
[435120 values with dtype=float32]
Coordinates:
* time (time) datetime64[ns] 1860-06-01 1861-06-01 ...
* latitude (latitude) float32 15.0 16.25 17.5 18.75 20.0 ...
* longitude (longitude) float32 225.0 226.875 228.75 ...
forecast_period (time) timedelta64[ns] 449 days 18:00:00 ...
forecast_reference_time datetime64[ns] 1859-09-01T06:00:00
height float64 1.5
Attributes:
standard_name: air_temperature
units: K
Model scenario: E1
ukmo__um_stash_source: m01s03i236
source: Data from Met Office Unified Model 6.05
cell_methods: time: mean (interval: 6 hour)
>>> print(xarray.DataArray.from_iris(cube))
<xarray.DataArray u'air_temperature' (time: 240, latitude: 37, longitude: 49)>
[435120 values with dtype=float32]
Coordinates:
* time (time) datetime64[ns] 1860-06-01 1861-06-01 ...
* latitude (latitude) float32 15.0 16.25 17.5 18.75 20.0 ...
* longitude (longitude) float32 225.0 226.875 228.75 ...
forecast_reference_time datetime64[ns] 1859-09-01T06:00:00
height float64 1.5
forecast_period (time) timedelta64[ns] 449 days 18:00:00 ...
Attributes:
Model scenario: E1
STASH: m01s03i236
Conventions: CF-1.5
source: Data from Met Office Unified Model 6.05
standard_name: air_temperature
units: K
>>> print(cube)
air_temperature / (K) (time: 240; latitude: 37; longitude: 49)
Dimension coordinates:
time x - -
latitude - x -
longitude - - x
Auxiliary coordinates:
forecast_period x - -
Scalar coordinates:
forecast_reference_time: 1859-09-01 06:00:00
height: 1.5 m
Attributes:
Conventions: CF-1.5
Model scenario: E1
STASH: m01s03i236
source: Data from Met Office Unified Model 6.05
Cell methods:
mean: time (6 hour
>>> print(darray.to_iris())
air_temperature / (K) (time: 240; latitude: 37; longitude: 49)
Dimension coordinates:
time x - -
latitude - x -
longitude - - x
Auxiliary coordinates:
forecast_period x - -
Scalar coordinates:
forecast_reference_time: 1859-09-01 06:00:00
height: 1.5 m
Attributes:
Model scenario: E1
source: Data from Met Office Unified Model 6.05
ukmo__um_stash_source: m01s03i236 |
'Fire Temperature'}, | ||
dims=('distance', 'time')) | ||
|
||
expected_coords = [Coordinate('distance', [-2, 2]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe better to just pull these out of original.coords
instead?
coord = encode(dataarray.coords[coord_name]) | ||
coord_args = get_args(coord.attrs) | ||
coord_args['var_name'] = coord_name | ||
iris_coord = iris.coords.DimCoord(coord.values, **coord_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are probably going to need to use AuxCoord
for some coordinates -- maybe anything that doesn't match a dimension name? In principle, DimCoord
instances must be 1d, numeric and strictly monotonic. It does seem like this works for your test examples, though, so I'm not sure what's going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I had that thought too last night. I started off with the code only working for dimensional coordinates and then extended to auxiliary later. It's because add_aux_coord
can accept DimCoord
or AuxCoord
and my test is only 1D. I think it'll probably fail if the coord is 2D with the way it's currently written. I will check.
This is awesome! Let's try to add some documentation -- at least a mention in What's New. CC @pelson |
from .conventions import ( | ||
maybe_encode_timedelta, maybe_encode_datetime, decode_cf) | ||
|
||
ignored_attrs = set(['name', 'tileIndex']) | ||
cdms2_ignored_attrs = {'name', 'tileIndex'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we support Python 2.6, so set literals will cause test failures.
I think we can change this for the next major release of xarray though -- I just sent out a mailing list post about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change these, I wasn't doing dictionary comprehension with the new syntax for this reason but forgot about sets. It would not surprise me if quite a few scientist were still using python 2.6. Why did this not get picked up in the travis py26 tests?. Actually I can answer my own question here, I guess it's because no tests hit xarray/convert.py
if Iris or cdms are not installed.
Thanks for pinging @shoyer. I'll also ping @rhattersley, and one of us will take a good look in the next few days. 👍 |
…rd and DimCoord correctly so 2d coords will work. Use dims variable to convert dimension numbers into names. Add 2d coord to test DataArray.
There is a problem when it comes to bounds. Xarray adds the bounds as another DataArray to the DataSet, i.e.:
I can get the name of this variable from the Coord attribute in the DataArray for another variable, i.e. |
There's no good solution for representing bounds in xarray -- we can put these as coordinates on Dataset objects, but they aren't supported on DataArray objects. I had some plans to add an |
Ok, bounds can just be something to add to the converter if / when it's supported. |
Latest commit fixes bug if the Iris cube does not have any dimension coordinates. What do people think about keeping or dropping the automatic dimension coordinates that xarray creates? And if dropping should be done should all coordinates named For example at the moment: >>> print(cube)
sea_water_potential_temperature / (degC) (-- : 148; -- : 180)
Auxiliary coordinates:
latitude x x
longitude x x
Scalar coordinates:
depth: 4.99994 m, bound=(0.0, 10.0) m
time: 1-01-01 12:00:00
Attributes:
Conventions: CF-1.5
Cell methods:
mean: time_counter
>>> print(xarray.DataArray.from_iris(cube).to_iris())
sea_water_potential_temperature / (degC) (dim0: 148; dim1: 180)
Dimension coordinates:
dim0 x -
dim1 - x
Auxiliary coordinates:
latitude x x
longitude x x
Scalar coordinates:
depth: 4.99993801117 m
time: 1-01-01 12:00:00
Attributes:
Conventions: CF-1.5 |
Is there any way to know for sure whether a coordinate has been automatically created? Or is the name the only clue? |
If this is the way to do things then does this comment need deleting? |
I'm afraid the name is the only clue. My inclination would be not to do anything special here to "un-name" default coordinates, assuming Iris has a straightforward way to get rid of them. Dimensions like
Yes, that comment is out of date. I'm currently refactoring that code though, so let me take care care of it. There's a separate argument about whether or not xarray should automatically convert cubes when passed into the |
One other thing that I noticed from the latest example -- it would be nice to covert cell methods from Cubes into string attributes on DataArrays. |
OK the latest commit tries to do the cell methods: >>> print(cube)
sea_water_potential_temperature / (degC) (-- : 148; -- : 180)
Auxiliary coordinates:
latitude x x
longitude x x
Scalar coordinates:
depth: 4.99994 m, bound=(0.0, 10.0) m
time: 1-01-01 12:00:00
Attributes:
Conventions: CF-1.5
Cell methods:
mean: time_counter
>>> print(xarray.DataArray.from_iris(cube))
<xarray.DataArray u'votemper' (dim0: 148, dim1: 180)>
array([[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
...,
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
deptht float64 5.0
time_counter object 1-01-01 12:00:00
nav_lat (dim0, dim1) float32 -78.1906 -78.1906 -78.1906 -78.1906 ...
nav_lon (dim0, dim1) float32 80.0 82.0 83.9999 85.9999 87.9999 ...
* dim0 (dim0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
* dim1 (dim1) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Attributes:
units: degC
long_name: Temperature
standard_name: sea_water_potential_temperature
cell_methods: time_counter: mean
Conventions: CF-1.5
>>> print(xarray.DataArray.from_iris(cube).to_iris())
sea_water_potential_temperature / (degC) (dim0: 148; dim1: 180)
Dimension coordinates:
dim0 x -
dim1 - x
Auxiliary coordinates:
latitude x x
longitude x x
Scalar coordinates:
depth: 4.99993801117 m
time: 1-01-01 12:00:00
Attributes:
Conventions: CF-1.5
Cell methods:
mean: time_counter |
skip = False | ||
continue | ||
# If this word is an axis | ||
if word not in cell_methods_strings | set(['interval', 'comment']): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you give an example of what you're trying to catch here? this looks very complicated to me -- maybe there is an more straightforward way to do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes there may, but it is actually a little complicated to parse these strings. The format is:
"coord1: [coord2:] method: [comment: a comment] [interval: a interval]"
I.e. here are some examples:
>>> get_cell_methods('dim1: dim2: mean: comment: A comment: interval: A interval')
[CellMethod(method='mean', coord_names=('dim1', 'dim2'), intervals=('A interval',), comments=('A comment',))]
>>> get_cell_methods('dim1: dim2: mean: dim2: sum:')
[CellMethod(method='mean', coord_names=('dim1', 'dim2'), intervals=(), comments=()), CellMethod(method='sum', coord_names=('dim2',), intervals=(), comments=())]
>>> get_cell_methods('dim1: mean: dim2: dim3: sum: comment: A comment')
[CellMethod(method='mean', coord_names=('dim1',), intervals=(), comments=()), CellMethod(method='sum', coord_names=('dim2', 'dim3'), intervals=(), comments=('A comment',))]
So that line is looking to see if the word is a method or "comment" or "interval", if not it's a coord and so checks to see if it's a new cell method or another coord for the current one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there something like iris.coords.CellMethod.from_string
which handles this logic? It might be a happier path to expose such a method from Iris (which already is using this sort of logic internally) rather than incorporating it into xarray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked closely enough, but maybe https://github.com/SciTools/iris/blob/master/lib/iris/fileformats/netcdf.py#L1422
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pelson that's going the other direction: iris coordinate -> string (which I think is easier to get right)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shoyer I had a look before I wrote that code but could not see one. Is there one in the netcdf library?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the code is horrifically obfuscated in Pyke rules: https://github.com/SciTools/iris/blob/master/lib/iris/fileformats/_pyke_rules/fc_rules_cf.krb#L1556-L1619
I'd be completely happy if this were moved into a more useful place. I think at the moment, it is reachable through iris.fileformats._pyke_rules.compiled_krb.fc_rules_cf_fc
, but I haven't tested it, and can't guarantee its stability there.
OK this update uses |
""" | ||
# Iris not a hard dependency | ||
import iris | ||
import iris.fileformats._pyke_rules.compiled_krb.fc_rules_cf_fc \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some sort of guard so this doesn't fail if/when Iris cleans up it's internals and moves this function? We've been bitten by using private APIs before.
Ideally, @pelson would add something exposing this to the public API for the next release, and we could use something like:
try:
from iris.fileformats import parse_cell_methods
except ImportError:
# prior to Iris 2.0 (or whatever)
from iris.fileformats._pyke_rules.compile_krb.fc_rules_cf_fc \
import _parse_cell_methods as parse_cell_methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I can add that. @pelson how do you think this might be exposed by Iris?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll put up a PR for iris.fileformats.netcdf.parse_cell_methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shoyer I have not forgotten about this PR, I just been working a lot on a pandas PR updating the ci and it's been taking up a lot of my spare time. |
No worries, no rush here from my end :).
|
I might consider something like (untested):
|
# Conflicts: # xarray/test/test_dataarray.py
This adds the extra code that @pelson suggested but it would be nice to create a little unit test with a mixed Dimension and Auxiliary coordinates. Is there an easy way to create a simple cube in Iris that could test this? |
dims.append(dim_coord.var_name) | ||
except iris.exceptions.CoordinateNotFoundError: | ||
index_coord = range(cube.shape[dim]) | ||
dims.append("dim{}".format(index_coord)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use dim_{}
to be consistency with our normal rule for default dimensions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should actually be something like dims.append('dim_{}'.format(dim))
-- index_coord
is currently the default coordinate labels, not the dimension name.
coord_attrs = _iris_obj_to_attrs(coord) | ||
coord_dims = [dims[i] for i in cube.coord_dims(coord)] | ||
if coord_dims: | ||
coords[coord.var_name] = (coord_dims, coord.points, coord_attrs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's also possible to coord.var_name
be None. We should probably raise an error in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an xarray exception that might make sense for the error? Or just nameerror
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually just raise ValueError for cases like this where we cannot handle
the input.
On Wed, Aug 10, 2016 at 2:01 PM, Neil Parley [email protected]
wrote:
In xarray/convert.py
#814 (comment):
try:
dim_coord = cube.coord(dim_coords=True, dimensions=(i,))
dims.append(dim_coord.var_name)
except iris.exceptions.CoordinateNotFoundError:
dims.append("dim_{}".format(i))
dims = [dim.var_name for dim in cube.dim_coords]
if not dims:
dims = ["dim{}".format(i) for i in range(cube.data.ndim)]
- coords = OrderedDict()
- for coord in cube.coords():
coord_attrs = _iris_obj_to_attrs(coord)
coord_dims = [dims[i] for i in cube.coord_dims(coord)]
if coord_dims:
coords[coord.var_name] = (coord_dims, coord.points, coord_attrs)
Is there an xarray exception that might make sense for the error? Or just
nameerror?—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/pydata/xarray/pull/814/files/338ef6b7242c06d5107d2224c1df4cdb2a4b951c#r74329887,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABKS1vRRJWIawQDcDJYZxXjQEcbQJIjkks5qejw5gaJpZM4H9jGH
.
This is looking pretty comprehensive to me. Could maybe do with a couple of tests perhaps, but otherwise 👍 . |
Indeed, merging this is really just blocked on tests right now. We need at least something so we can verify that this basically works. |
This did used to have some tests, but they seem to have got lost somewhere along the way. I'll add the old tests back in, which would be a start. |
If you're still looking for the old tests, it looks like they disappeared in the last merge commit, f48de5. |
Yes, we would still love to get this in, though it might need a few other
changes to work now. Could you open a new PR with the tests added back in?
(Assuming Neil is not interested in picking this back up)
…On Mon, Jun 19, 2017 at 8:58 AM, DWesl ***@***.***> wrote:
If you're still looking for the old tests, it looks like they disappeared
in the last merge commit, f48de5
<f48de5a#diff-48ab4ba033ad06981f566a2e2f561f5aL1819>
.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#814 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABKS1gi9-YLVTLayYcMiFTqVSUy1v729ks5sFpqkgaJpZM4H9jGH>
.
|
Sorry guys I don't have time to do anything on this (or any open source stuff) at the moment. |
No worries! The great thing about GitHub is that it's pretty easy for
someone else to pick up your work :)
…On Tue, Jun 20, 2017 at 12:25 AM, Neil Parley ***@***.***> wrote:
Sorry guys I don't have time to do anything on this (or any open source
stuff) at the moment.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#814 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABKS1jC8Vrt_NKYbc4Z3-sgDpRAfaDqsks5sF3PXgaJpZM4H9jGH>
.
|
I'm happy to pick this up - I'm not sure how best to pick up someone else's pull request though. Am I best off forking nparley:master then resubmitting, or creating a new one and merging in the changes from nparley:master? |
I think also I could make you a contributor or transfer the branch to someone. So that might be an option too. |
I think either way works. Probably easiest to fork this branch and then
merge in the latest from master.
…On Thu, Nov 30, 2017 at 3:38 PM Neil Parley ***@***.***> wrote:
I think also I could make you a contributor or transfer the branch to
someone. So that might be an option too.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#814 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABKS1oT1qWWy-o5CJe6Y45Y-DsGhjTucks5s7swSgaJpZM4H9jGH>
.
|
Closed by #1750? |
Indeed. Thanks to all people involved! |
Added a first go at a converter from xarray.DataArrays objects to Iris.cube.Cube objects. Uses the same template as the cdms2 conversion.