Add a --drop-variables flag to xray.open_dataset to exclude certain variables #532
… variables from being decoded. It passes it to xray.decode_cf.
@@ -871,6 +874,9 @@ def decode_cf(obj, concat_characters=True, mask_and_scale=True,
    vars, attrs, coord_names = decode_cf_variables(
        vars, attrs, concat_characters, mask_and_scale, decode_times,
        decode_coords)
    if drop_variables is not None:
Could you move this into decode_cf_variables instead? You could probably put this logic in the loop that constructs new_vars.
Looks pretty good to me. However, this does need tests to verify that it works -- see here for some examples: https://github.com/xray/xray/blob/6ed84a04334338533c6773ce4b37d2179130df18/xray/test/test_conventions.py#L460-L493
Also, this needs a note in "What's New" in the docs.
@@ -114,6 +114,9 @@ def open_dataset(filename_or_obj, group=None, decode_cf=True,
        used when reading data from netCDF files with the netcdf4 and h5netcdf
        engines to avoid issues with concurrent access when using dask's
        multithreaded backend.
    drop_variables: iterable, optional
Maybe it would also be useful to support providing a single variable as a string? Someone is bound to try that. You can detect that with isinstance(drop_variables, basestring).
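The reason a bare string needs special handling can be sketched as follows: a Python string is itself iterable, so treating it as a collection of names silently splits it into single characters. This is a minimal illustration written for Python 3, where `str` stands in for the Python 2 `basestring` named in the comment above. The helper name `as_name_list` is hypothetical and is not part of xray.

```python
def as_name_list(drop_variables):
    """Wrap a single name in a list; copy any other iterable into a list."""
    # Python 3 spelling of the check; Python 2 code would test basestring.
    if isinstance(drop_variables, str):
        return [drop_variables]
    return list(drop_variables)


# Without the check, a string is split into its characters.
naive = list("temp")
# With the check, the variable name stays intact.
wrapped = as_name_list("temp")
# Ordinary iterables pass through unchanged.
both = as_name_list(("temp", "pressure"))

print(naive)    # ['t', 'e', 'm', 'p']
print(wrapped)  # ['temp']
print(both)     # ['temp', 'pressure']
```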
…ingle strings and it is embedded in the loop of decode_cf_variables. Added a test too.
I updated it following your advice: the logic is now in the loop of decode_cf_variables, and it supports single strings. I added the test and the documentation too. I ran nosetests successfully.
I am not able to reproduce the error found by Travis in a virtualenv with the same Python version and packages, so I don't know why that specific run failed :(
I'll take a look -- something probably changed in one of our upstream dependencies.
@@ -807,6 +807,10 @@ def stackable(dim):

    new_vars = OrderedDict()
    for k, v in iteritems(variables):
        if isinstance(drop_variables, basestring):
let's do this outside the loop
Let's also convert drop_variables to a set. That will keep things performant, even if the dataset has a very large number of variables.
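The point about sets can be sketched with plain dictionaries: the set is built once before the loop, so each membership test inside the loop takes constant time on average rather than scanning a list. The function name `filter_variables` and the sample data are assumptions made for illustration; the real loop in decode_cf_variables also decodes each variable, which is omitted here.

```python
from collections import OrderedDict


def filter_variables(variables, drop_variables):
    """Return a new mapping without the names listed in drop_variables."""
    # Build the set once, outside the loop, so lookups stay cheap
    # even when there are many variables.
    drop = set(drop_variables)
    kept = OrderedDict()
    for name, var in variables.items():
        if name in drop:
            continue  # skip variables the caller asked to drop
        kept[name] = var
    return kept


variables = OrderedDict([("t", [0, 1, 2]), ("x", [9, 8, 7]), ("y", [5, 10, 15])])
kept = filter_variables(variables, ["x"])
print(list(kept))  # ['t', 'y']
```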
            'y': ('t', [5, 10, np.nan])
        })
        actual = conventions.decode_cf(original, drop_variables=("x",))
        self.assertDatasetIdentical(expected, actual)
let's also test providing a single argument, drop_variables='x'
Move the "if string" check outside the loop. Add a new test to check this string case.
@@ -804,9 +804,13 @@ def stackable(dim):
        return True

    coord_names = set()
    if isinstance(drop_variables, basestring):
        drop_variables = set([drop_variables,])
Actually -- we should always convert drop_variables to a set. I was thinking something like:

    if isinstance(drop_variables, basestring):
        drop_variables = [drop_variables]
    elif drop_variables is None:
        drop_variables = []
    drop_variables = set(drop_variables)
This would even let us skip the drop_variables is None check below.
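The suggestion above can be sketched end to end as follows. This is an adaptation for Python 3 (`str` instead of `basestring`), and both function names are hypothetical stand-ins rather than xray's actual code. `decode_variables` only filters names, whereas the real loop also decodes each variable.

```python
def normalize_drop_variables(drop_variables):
    """Always return a set, whatever form the argument arrived in."""
    if isinstance(drop_variables, str):  # Python 2 code would test basestring
        drop_variables = [drop_variables]
    elif drop_variables is None:
        drop_variables = []
    return set(drop_variables)


def decode_variables(variables, drop_variables=None):
    """Hypothetical stand-in for the decoding loop; it only filters names."""
    drop = normalize_drop_variables(drop_variables)
    new_vars = {}
    for k, v in variables.items():
        # No `is None` test is needed here: an empty set matches nothing.
        if k in drop:
            continue
        new_vars[k] = v
    return new_vars


data = {"t": 1, "x": 2, "y": 3}
print(sorted(decode_variables(data)))                  # ['t', 'x', 'y']
print(sorted(decode_variables(data, "x")))             # ['t', 'y']
print(sorted(decode_variables(data, ("x", "y"))))      # ['t']
```

Because the argument is normalized once up front, the None, single-string, and iterable cases all take the same path through the loop.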
Pulled the "if ... is None" out of the loop and always convert drop_variables to a set.
OK, merging. Thanks @markelg!
Related to issue #457. I implemented this flag following the instructions given by @shoyer in the issue thread. I have a decent amount of experience with Python, but this is the first pull request I have set up on GitHub, and I am a beginner with git (I am more used to svn). I was careful, but please check that I did not mess something up ;)