Add a --drop-variables flag to xray.open_dataset to exclude certain variables #532

Merged: 8 commits, Aug 19, 2015

Contributor

@markelg markelg commented Aug 14, 2015

Related to issue #457. I implemented this flag following the instructions given by @shoyer in the issue thread. I have a decent amount of experience with Python, but this is the first pull request I have set up on GitHub, and I am a beginner with git (more used to svn). I was careful, but please check that I did not mess something up ;)

… variables from being decoded. It passes it to xray.decode_cf.
@@ -871,6 +874,9 @@ def decode_cf(obj, concat_characters=True, mask_and_scale=True,
     vars, attrs, coord_names = decode_cf_variables(
         vars, attrs, concat_characters, mask_and_scale, decode_times,
         decode_coords)
     if drop_variables is not None:
Member

Could you move this into decode_cf_variables instead? It could probably go in the loop that constructs new_vars.
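
A minimal sketch of what that placement might look like, assuming a much simplified loop. `decode_variables_sketch` and its `decode` callback are hypothetical stand-ins for illustration, not xray's actual `decode_cf_variables`:

```python
from collections import OrderedDict


def decode_variables_sketch(variables, decode, drop_variables=None):
    """Hypothetical stand-in for the loop that builds new_vars."""
    new_vars = OrderedDict()
    for name, var in variables.items():
        # Skip dropped variables before any decoding work is done on them.
        if drop_variables is not None and name in drop_variables:
            continue
        new_vars[name] = decode(var)
    return new_vars


raw = OrderedDict([("t", [0, 1, 2]), ("x", [9, 8, 7]), ("y", [5, 10, 15])])
decoded = decode_variables_sketch(raw, decode=list, drop_variables=["x"])
print(list(decoded))  # ['t', 'y']
```

Filtering inside the loop means a dropped variable is never decoded at all, which is the point of the flag when a variable cannot be decoded cleanly.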

@shoyer
Member

shoyer commented Aug 14, 2015

Looks pretty good to me. However, this does need tests to verify that it works -- see here for some examples: https://github.com/xray/xray/blob/6ed84a04334338533c6773ce4b37d2179130df18/xray/test/test_conventions.py#L460-L493

Also, this needs a note in "What's New" in the docs.

@@ -114,6 +114,9 @@ def open_dataset(filename_or_obj, group=None, decode_cf=True,
         used when reading data from netCDF files with the netcdf4 and h5netcdf
         engines to avoid issues with concurrent access when using dask's
         multithreaded backend.
     drop_variables: iterable, optional
Member

Maybe it would also be useful to support providing a single variable as a string? Someone is bound to try that. You can detect that with isinstance(drop_variables, basestring).
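
To illustrate why the string case needs its own check: a string is itself iterable, so code expecting an iterable of names would split it into characters. The sketch below assumes Python 3, where `str` takes the place of Python 2's `basestring`; `as_name_list` is a hypothetical helper, not part of xray:

```python
def as_name_list(drop_variables):
    """Hypothetical helper: wrap a lone string so it is treated as one name."""
    if isinstance(drop_variables, str):  # basestring on Python 2
        return [drop_variables]
    return list(drop_variables)


# Without the check, a string is iterated character by character.
print(sorted(set("temp")))            # ['e', 'm', 'p', 't']
print(as_name_list("temp"))           # ['temp']
print(as_name_list(("temp", "lat")))  # ['temp', 'lat']
```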

markelg and others added 2 commits August 17, 2015 15:02
@markelg
Contributor Author

markelg commented Aug 17, 2015

I updated it following your advice: the logic is now in the loop of decode_cf_variables, and it supports single strings. I added the test and the documentation too. nosetests runs successfully.

@markelg
Contributor Author

markelg commented Aug 17, 2015

I am not able to reproduce the error found by Travis in a virtualenv with the same Python version and packages, so I don't know why that specific run failed :(

@jhamman
Member

jhamman commented Aug 17, 2015

@markelg - The failing test you have is failing for the same reason as #529. I don't think it is related to your PR. @shoyer - Thoughts on what's going on here?

@shoyer
Member

shoyer commented Aug 17, 2015

I'll take a look -- something probably changed in one of our upstream dependencies.

@@ -807,6 +807,10 @@ def stackable(dim):

     new_vars = OrderedDict()
     for k, v in iteritems(variables):
         if isinstance(drop_variables, basestring):
Member

let's do this outside the loop

Member

Let's also convert drop_variables to a set. That will keep things performant, even if the dataset has a very large number of variables.
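
A rough way to see the difference, assuming a synthetic dataset with a few thousand variable names (the names and sizes here are made up for illustration). Membership in a list is a linear scan, while membership in a set is a hash lookup, so the loop's cost stops depending on how many names are dropped:

```python
import timeit

names = ["var%d" % i for i in range(2000)]
drop_list = names[::2]      # every other variable, as a list
drop_set = set(drop_list)   # the same names, as a set


def keep_with(container):
    # One membership test per variable, as in the decoding loop.
    return [n for n in names if n not in container]


# Both containers select the same variables; only the lookup cost differs.
assert keep_with(drop_list) == keep_with(drop_set)

list_time = timeit.timeit(lambda: keep_with(drop_list), number=1)
set_time = timeit.timeit(lambda: keep_with(drop_set), number=1)
print("list: %.4fs  set: %.4fs" % (list_time, set_time))
```

The exact timings depend on the machine, so none are quoted here.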

@shoyer
Member

shoyer commented Aug 17, 2015

@markelg the test failure was unrelated (see #535 for details). If you merge or rebase on the latest master your tests should be passing.

            'y': ('t', [5, 10, np.nan])
        })
        actual = conventions.decode_cf(original, drop_variables=("x",))
        self.assertDatasetIdentical(expected, actual)
Member

let's also test providing a single argument, drop_variables='x'

markelg and others added 2 commits August 18, 2015 10:12
Move the "if string" check outside the loop. Add a new test to
check this string case.
@@ -804,9 +804,13 @@ def stackable(dim):
         return True

     coord_names = set()
     if isinstance(drop_variables, basestring):
         drop_variables = set([drop_variables,])
Member

actually -- we should always convert drop_variables to a set. I was thinking something like:

if isinstance(drop_variables, basestring):
    drop_variables = [drop_variables]
elif drop_variables is None:
    drop_variables = []
drop_variables = set(drop_variables)

Member

this would even let us skip drop_variables is None below.
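
The normalization suggested above can be sketched as a standalone function to show why the later `is None` check becomes unnecessary. This assumes Python 3 (`str` rather than `basestring`), and `normalize_drop_variables` is a hypothetical name used only for this illustration:

```python
def normalize_drop_variables(drop_variables):
    """Hypothetical helper: always return a set of variable names."""
    if isinstance(drop_variables, str):  # basestring on Python 2
        drop_variables = [drop_variables]
    elif drop_variables is None:
        drop_variables = []
    return set(drop_variables)


variables = {"t": 1, "x": 2, "y": 3}
for arg in (None, "x", ("x",), ["x", "y"]):
    drop = normalize_drop_variables(arg)
    # No `is None` guard is needed here: an empty set drops nothing.
    kept = [name for name in variables if name not in drop]
    print(repr(arg), "->", kept)
```

Because every input ends up as a set, the loop body needs only a single `in` test regardless of how the caller spelled the argument.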

Pulled the "if ... is None" out of the loop and always convert drop_variables to a set.
@shoyer
Member

shoyer commented Aug 19, 2015

OK, merging. Thanks @markelg !

shoyer added a commit that referenced this pull request Aug 19, 2015
Add a --drop-variables flag to xray.open_dataset to exclude certain variables
@shoyer shoyer merged commit 32365b3 into pydata:master Aug 19, 2015