Added convenience method for saving DataArray to netCDF file #990

Merged: 10 commits, Sep 6, 2016

Conversation

@robintw (Contributor) commented Aug 28, 2016

Added a simple function to DataArray that creates a dataset with one variable
called 'data' and then saves it to a netCDF file. All parameters are passed through
to to_netcdf().

Added an equivalent function called open_dataarray to be used to load from these files.

Fixes #915.

@shoyer (Member) commented Aug 28, 2016

This is great, but it needs tests and documentation :).

@shoyer (Member) commented Aug 28, 2016

I'm also a little concerned about the design here, because it always saves/restores the DataArray with the name 'data', which could override the original name. Ideally this method should save/restore a DataArray as faithfully as possible.

Instead, maybe we should only override the original name if it's None, or the same as one of its coordinates' names (which would otherwise result in an invalid netCDF file).

Then we could restore by looking for a Dataset with a single data variable, regardless of the name, e.g.,

if len(dataset.data_vars) != 1:
    raise ValueError
data_array, = dataset.data_vars.values()
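The save-side rule described above can be sketched in plain Python. The fallback name below is a hypothetical placeholder for illustration, not necessarily what the PR ended up using:

```python
# Sketch of the naming rule: keep the DataArray's own name unless keeping
# it would produce an invalid netCDF file (no name at all, or a name that
# collides with one of its coordinates).

FALLBACK_NAME = '__xarray_dataarray_variable__'  # hypothetical placeholder


def name_for_saving(name, coord_names):
    """Choose the data-variable name to use when wrapping a DataArray
    in a Dataset for netCDF output."""
    if name is None or name in coord_names:
        return FALLBACK_NAME
    return name
```

On restore, the single-data-variable check above then recovers the array whatever name was actually used on disk.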

@robintw (Contributor, author) commented Aug 28, 2016

Thanks @shoyer - I wasn't actually aware that DataArrays can have a name before they're put into a Dataset (that's one of the things I love about contributing to OSS: you always learn things about the software that you never knew before!). I'll deal with the design issues ASAP.

How would you suggest testing this? I wasn't sure where to put the tests or what exactly to test. Presumably something about the equivalence between this and saving after manually converting to a Dataset, but I wasn't really sure how best to go about it.

@shoyer (Member) commented Aug 29, 2016

I would add these tests to test_backends, probably in a new test class. The best test is probably a round-trip test where we save and load a DataArray and verify that the result is identical to what we started with.
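Such a round-trip test might look like the following sketch (assuming xarray with a netCDF backend is installed; the test name and sample data are illustrative):

```python
import os
import tempfile

import numpy as np
import xarray as xr


def test_dataarray_roundtrip():
    # build a small named DataArray, save it, reload it, and compare
    original = xr.DataArray(np.arange(6.0).reshape(2, 3),
                            dims=('x', 'y'),
                            coords={'x': [10.0, 20.0]},
                            name='temperature')
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'da.nc')
        original.to_netcdf(path)
        restored = xr.open_dataarray(path)
        try:
            # identical() also compares the name and attributes
            assert restored.identical(original)
        finally:
            restored.close()
```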

@robintw (Contributor, author) commented Aug 31, 2016

I think I've dealt with the comments/issues now, but let me know if more needs doing.


Parameters
----------
All parameters are passed directly to `xarray.open_dataset`.
Review comment (Member):

We should mention the parameters again (at least path). The fact that this uses open_dataset internally is an implementation detail.
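A docstring in the direction this review suggests might read as follows (a sketch of the idea, not the merged wording; the signature is assumed from the thread):

```python
def open_dataarray(filename_or_obj, **kwargs):
    """Open a DataArray from a netCDF file containing a single data variable.

    Parameters
    ----------
    filename_or_obj : str or file-like
        Path of the netCDF file to open, or an open file object.
    **kwargs
        All remaining keyword arguments are the same as for
        ``xarray.open_dataset``.
    """
    raise NotImplementedError('docstring sketch only')
```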

@robintw (Contributor, author) commented Aug 31, 2016

Thanks for the useful comments.

I've added some bits to the Sphinx docs in the places that I think they should go - but let me know if I need to add any more.

@robintw (Contributor, author) commented Sep 1, 2016

Also, it seems like there might be an issue with not closing the Dataset - but only on Windows (see the AppVeyor build failure at https://ci.appveyor.com/project/jhamman/xarray-injyf/build/1.0.268/job/kr8ss684ijxg55be).

Any ideas how this could be sorted without having to implement .close() for DataArray? (Or, alternatively, how to go about implementing that, given that I don't think a DataArray contains a reference to the Dataset it may have come from.)

@shoyer (Member) commented Sep 1, 2016

The easiest fix would be to add a decorator to skip tests on Windows, like the existing decorators that we use:
https://github.com/pydata/xarray/blob/master/xarray/test/__init__.py#L73

I'm not sure offhand how to detect whether we're running on Windows, but it can't be hard.

To add a close method, I would copy these lines from dataset.py (or, better yet, move them to the common base class BaseDataObject in common.py):
https://github.com/pydata/xarray/blob/master/xarray/core/dataset.py#L251-L262

Then, you just need to define _file_obj = None in DataArray.__init__ and set it when appropriate.
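Both suggestions can be sketched together as follows. The names `_file_obj`, `close`, and `BaseDataObject` come from this thread; the rest is illustrative, not the merged code:

```python
import sys

# Detecting Windows, e.g. to drive a skip-this-test decorator:
ON_WINDOWS = sys.platform.startswith('win')


class BaseDataObject:
    """Shared base class so Dataset and DataArray handle files the same way."""

    _file_obj = None  # set by open_dataset/open_dataarray for file-backed data

    def close(self):
        """Close any files linked to this object."""
        if self._file_obj is not None:
            self._file_obj.close()
        self._file_obj = None
```

A DataArray loaded from disk would then share the source Dataset's `_file_obj`, so closing either one releases the underlying file.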

@robintw (Contributor, author) commented Sep 1, 2016

Thanks - I've got that working with the close() method, and I did it 'properly' using the BaseDataObject class.

The only failure on here now is the same as #991 - a conda error creating one of the environments.

@@ -578,6 +578,19 @@ def where(self, cond, other=None, drop=False):

return outobj._where(outcond)

def close(self):
"""Close any files linked to this dataset
Review comment (Member):
dataset -> object

Review comment (Member):

and here

@shoyer (Member) commented Sep 1, 2016

I have a few final nits, but looking very nice!

@robintw (Contributor, author) commented Sep 2, 2016

Great, thanks. I think I've made these changes. (Frustratingly, I can't see a way to use `*args`/`**kwargs` after using the full argument list in the function definition, so I've had to write it out. Let me know if there is a better way, but I suspect there isn't.)

chunks, lock, drop_variables)

if len(dataset.data_vars) != 1:
raise ValueError('Given file dataset contains more than one variable. '
Review comment (Member):
variable -> data variable

@shoyer (Member) commented Sep 2, 2016

I have a couple inline comments that I think you missed?

@robintw (Contributor, author) commented Sep 2, 2016

Oops yes, sorry!

I'm having a bit of an issue with changing the strings to constants (which, by the way, I completely agree is how it should be done). I've defined the constants in backends/api.py and am then trying to import them in core/dataarray.py.

I am doing the import as:

from ..backends import DATAARRAY_NAME, DATAARRAY_VARIABLE

But this gives me the following error:

../../anaconda3/lib/python3.5/site-packages/py/_path/local.py:650: in pyimport
    __import__(modname)
xarray/__init__.py:3: in <module>
    from .core.extensions import (register_dataarray_accessor,
xarray/core/extensions.py:3: in <module>
    from .dataarray import DataArray
xarray/core/dataarray.py:9: in <module>
    from ..backends import DATAARRAY_NAME, DATAARRAY_VARIABLE
xarray/backends/__init__.py:8: in <module>
    from .netCDF4_ import NetCDF4DataStore
xarray/backends/netCDF4_.py:6: in <module>
    from .. import Variable
E   ImportError: cannot import name 'Variable'

It seems like my relative import is somehow mucking up another relative import inside the netCDF4 stuff. Do you have any ideas how I can solve this? Am I just doing my imports wrong?

@shoyer (Member) commented Sep 3, 2016

We have a bit of a circular imports mess :). For now, just import them at the top of the to_netcdf method.
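The mechanics behind both the error and the fix can be reproduced with two toy modules (a self-contained illustration, not xarray's actual code): during a circular import, the first module is only partially initialized, so a top-level `from ... import name` in the second module fails, while the same import deferred into a function body runs after everything has loaded.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Module 'b' imports back from 'a', mirroring xarray's backends importing
# from the package root. With a top-level import in 'a', loading fails;
# with the import deferred into a function, it succeeds.

FAILING_A = textwrap.dedent("""
    from b import CONSTANT   # runs while module 'a' is still half-built

    def helper():
        return 'helper'
""")

WORKING_A = textwrap.dedent("""
    def helper():
        return 'helper'

    def use_constant():
        from b import CONSTANT   # deferred: 'b' loads after 'a' is complete
        return CONSTANT
""")

MODULE_B = textwrap.dedent("""
    from a import helper     # the circular import back into 'a'
    CONSTANT = 42
""")


def try_import(a_source):
    """Write a.py and b.py to a temp dir, import 'a', and report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, src in [('a', a_source), ('b', MODULE_B)]:
            with open(os.path.join(tmp, name + '.py'), 'w') as fh:
                fh.write(src)
        sys.path.insert(0, tmp)
        sys.modules.pop('a', None)
        sys.modules.pop('b', None)
        try:
            module_a = importlib.import_module('a')
            if hasattr(module_a, 'use_constant'):
                return module_a.use_constant()
            return 'loaded'
        except ImportError:
            return 'ImportError'
        finally:
            sys.path.pop(0)
            sys.modules.pop('a', None)
            sys.modules.pop('b', None)
```

In the traceback above, `xarray/__init__.py` plays the role of the half-built module, which is why moving the import inside `to_netcdf` sidesteps the error.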

@robintw (Contributor, author) commented Sep 3, 2016

Ah, OK - that works (not entirely sure why, but it works!).

Commit messages:

Added a simple function to DataArray that creates a dataset with one variable called 'data' and then saves it to a netCDF file. All parameters are passed through to to_netcdf(). Added an equivalent function called `open_dataarray` to be used to load from these files.

Changed the name that is used if the DataArray has no name; stored the old name in attrs so it can be restored. Updated tests and docs.

When loading a DataArray, the _file_obj attribute is set to the same object as the source Dataset's _file_obj. This can then be used to close the file from the DataArray. Tests have been updated to use this.
@robintw (Contributor, author) commented Sep 5, 2016

I realised that the reason that Travis and AppVeyor weren't running properly was because there were conflicts between this PR and master. I think I've managed to rebase and push properly - but let me know if I've done anything wrong.

@shoyer merged commit 69ac511 into pydata:master on Sep 6, 2016
@shoyer (Member) commented Sep 6, 2016

Thanks!
