Skip to content

Fix loading hdf5 files with new versions of pandas #514

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 11, 2019

Conversation

nickponvert
Copy link
Contributor

@nickponvert nickponvert commented Apr 11, 2019

This PR addresses issue #513 by changing all of the calls to pandas.read_hdf() to no longer pass 'format' as a kwarg. We used to do this, and the passed kwarg was ignored by HDFStore.open because it wasn't a defined keyword arg for that method, but no alarms were raised.

See: pandas-dev/pandas#13291

In newer versions of pandas there is an exception raised if you try to pass 'format' to the read_hdf() method. Removing this kwarg:

  1. Won't change the output of read_hdf, as the kwarg was being ignored before, and
  2. Allows me to run Marina's old notebooks with VBA analysis code and pandas==0.24.2

TODO:

  • Is feature/production_analysis the right target branch for this PR?

Nick Ponvert and others added 2 commits April 10, 2019 13:32
I think I have found all of the places where we pass the format kwarg
to pandas.read_hdf(), which is not allowed anymore by new versions
of pandas. This also removes commented-out lines that I had left
in an earlier commit.
@matchings
Copy link
Collaborator

@nickponvert do both of the notebooks run with the newer version of pandas? specifically, 1) the notebook that loads the dataset object, and 2) the notebook that reads the multi session dataframes, such as the one here: "\allen\programs\braintv\workgroups\nc-ophys\visual_behavior\visual_behavior_production_analysis\multi_session_summary_dfs\mean_flashes_image_name_engaged_df.h5"

@nickponvert
Copy link
Contributor Author

nickponvert commented Apr 11, 2019

Both of the notebooks Yazan sent us work after this fix. One of them uses visual_behavior_ophys_dataset.VisualBehaviorOphysDataset to load an experiment, and the other loads a summary df called 'mean_trials_change_image_name_trial_type_df.h5'. I can try with the one you linked.

Edit: I had to change a few lines in the second notebook that were calling pandas.read_hdf() with the format kwarg

@matchings
Copy link
Collaborator

Ok great, as long as those work, I am good with this change.

@jeromelecoq jeromelecoq marked this pull request as ready for review April 11, 2019 16:43
@nickponvert
Copy link
Contributor Author

Circle CI tests are failing. test-27 and test-36 are suffering from #515, and the linting errors related to the two files changed in this PR aren't related to these changes:

visual_behavior/ophys/response_analysis/response_analysis.py:45:40: E231 missing whitespace after ','
visual_behavior/ophys/response_analysis/response_analysis.py:103:22: E114 indentation is not a multiple of four (comment)
visual_behavior/ophys/response_analysis/response_analysis.py:103:22: E116 unexpected indentation (comment)
visual_behavior/ophys/response_analysis/response_analysis.py:103:22: E265 block comment should start with '# '
visual_behavior/ophys/response_analysis/response_analysis.py:106:9: E303 too many blank lines (2)
visual_behavior/ophys/response_analysis/response_analysis.py:108:21: E116 unexpected indentation (comment)
visual_behavior/ophys/response_analysis/response_analysis.py:108:21: E265 block comment should start with '# '
visual_behavior/ophys/response_analysis/response_analysis.py:126:13: F401 'h5py' imported but unused
visual_behavior/ophys/response_analysis/response_analysis.py:241:13: F401 'h5py' imported but unused
visual_behavior/ophys/response_analysis/response_analysis.py:271:63: E225 missing whitespace around operator
visual_behavior/ophys/response_analysis/response_analysis.py:328:13: F401 'h5py' imported but unused
visual_behavior/ophys/response_analysis/response_analysis.py:348:5: E303 too many blank lines (2)

visual_behavior/ophys/response_analysis/response_analysis.py:45:40: E231 missing whitespace after ','
visual_behavior/ophys/response_analysis/response_analysis.py:103:22: E114 indentation is not a multiple of four (comment)
visual_behavior/ophys/response_analysis/response_analysis.py:103:22: E116 unexpected indentation (comment)
visual_behavior/ophys/response_analysis/response_analysis.py:103:22: E265 block comment should start with '# '
visual_behavior/ophys/response_analysis/response_analysis.py:106:9: E303 too many blank lines (2)
visual_behavior/ophys/response_analysis/response_analysis.py:108:21: E116 unexpected indentation (comment)
visual_behavior/ophys/response_analysis/response_analysis.py:108:21: E265 block comment should start with '# '
visual_behavior/ophys/response_analysis/response_analysis.py:126:13: F401 'h5py' imported but unused
visual_behavior/ophys/response_analysis/response_analysis.py:241:13: F401 'h5py' imported but unused
visual_behavior/ophys/response_analysis/response_analysis.py:271:63: E225 missing whitespace around operator
visual_behavior/ophys/response_analysis/response_analysis.py:328:13: F401 'h5py' imported but unused
visual_behavior/ophys/response_analysis/response_analysis.py:348:5: E303 too many blank lines (2)

@matchings
Copy link
Collaborator

Those linting issues are probably my fault, I will try to fix them now.

@nickponvert nickponvert merged commit 3df0ed0 into feature/production_analysis Apr 11, 2019
nickponvert added a commit that referenced this pull request Apr 11, 2019
This addresses issue #513 and makes PR #514 unnecessary. For future reference, here's the explanation for the #513 fix: 

This PR addresses issue #513 by changing all of the calls to pandas.read_hdf() to no longer pass 'format' as a kwarg. We used to do this, and the passed kwarg was ignored by HDFStore.open because it wasn't a defined keyword arg for that method, but no alarms were raised.

See: pandas-dev/pandas#13291

In newer versions of pandas there is an exception raised if you try to pass 'format' to the read_hdf() method. Removing this kwarg:

- Won't change the output of read_hdf, as the kwarg was being ignored before, and
- Allows me to run Marina's old notebooks with VBA analysis code and pandas==0.24.2
@nickponvert nickponvert mentioned this pull request Apr 11, 2019
@nickponvert nickponvert deleted the fix/hdfloading branch April 11, 2019 22:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants