Skip to content

Flash loader updated #118

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 38 commits into from
Jun 16, 2023
Merged

Flash loader updated #118

merged 38 commits into from
Jun 16, 2023

Conversation

zain-sohail
Copy link
Member

No description provided.

@zain-sohail zain-sohail requested review from steinnymir and rettigl May 1, 2023 23:10
@zain-sohail
Copy link
Member Author

I was suggested to not include too many files hence I didn't change the poetry, requirements and conda files. The loader depends on the joblib library so please install it for reviewing (this is why the check fails)

@rettigl
Copy link
Member

rettigl commented May 4, 2023

Certainly all dependencies should be added. Linting and tests still fail right now (see logs for details)

@zain-sohail
Copy link
Member Author

Certainly all dependencies should be added. Linting and tests still fail right now (see logs for details)

Yes. There are some differences in base loader and flash. These are not caught locally by pylint as we are having multiple linters on actions. I will address them.

@zain-sohail
Copy link
Member Author

Certainly all dependencies should be added. Linting and tests still fail right now (see logs for details)

The current tests for loaders fail because flash loader throws errors when the config file is not proper. These are caught in the general loader tests since it doesn't pass a config and hence config is just an empty dict.

I was trying to account for that by either adding the config file for flash or ignoring the flash loader from tests. Should be not make the tests so they are better generalized? Because I can't seem to have a workaround this.

I also find that the pre commit hooks specified only fix/catch some linting issues compared to the github action we have since the action uses multiple linters. I don't see that as feasible considering you can't easily check those lint errors on your local machine. Should we stick to one?

@rettigl
Copy link
Member

rettigl commented May 5, 2023

You can run pylint and mypy locally by executing the commands in the pipeline script. Make sure to have to newest versions of the packages installed, however. Maybe we can also include them in precommit, but i don't know how.
I can have a look at the thing with config files etc.

@rettigl
Copy link
Member

rettigl commented May 5, 2023

I cannot get your loader to work. Loading the file always fails with:
H5ParsingError: /FL1 Caused by: /FL1/Beamlines Caused by: /FL1/Beamlines/PG Caused by: /FL1/Beamlines/PG/Monochromator Caused by: /FL1/Beamlines/PG/Monochromator/monochromator photon energy Caused by: /FL1/Beamlines/PG/Monochromator/monochromator photon energy/index Caused by: 'Dataset' object has no attribute 'keys'

Your iteration in parse_h5_keys() does not seem to have a proper breaking condition, and tries to read off keys of the final branch (dataset), which fails. This is whith h5py 3.7 and 3.8.

I do, however, have solutions for the tests, I think.

Copy link
Member

@rettigl rettigl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could only review the very skeleton, and no functionality yet, because the loader does not work for me (see comments).

@rettigl
Copy link
Member

rettigl commented May 5, 2023

flash tests still fail, because the flash loader does not work on files and folders, and because I did not include the test data yet.

@rettigl
Copy link
Member

rettigl commented May 13, 2023

This needs merging or rebasing of main now, which will require some manual tweeking, I believe, as I have changed some details on the loader interface when adding the metadata support in the leaders. I can take care of this, once the other things are addressed (I would not do it now to not outdate my review).

- using generic exception handling now
- utils moved to flash module
- attributes removed
- moved initliaze paths to utils (new name)
- added option for files and runs
- metadata support (basic)
- other instruments supported like wespe
- paths like online-1 now supported
- fix kwds in processor
@rettigl
Copy link
Member

rettigl commented May 29, 2023

Great, now I can run the code. I will follow up with some more in-depth review at some later stage.

Copy link
Member

@rettigl rettigl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a closer look now and tried to understand the code. In general I still don't quite understand all the multi index magic and stuff, I think a very detailed documentation seems necessary here. Also, the outcome seems to me an overly detailed dataframe with a lot of unnecessary information, but that might be up to the user to see it differently. Not sure there is a practical use case for having extractor current per electron... Anyways not as default.
In general, lot of places still miss argument description in doc strings, and typing is not consistently defined. Also for me really slow (2 min for 30 MByte parquet conversion).
Apart from that, looks to be on a good track.

@zain-sohail
Copy link
Member Author

I had a closer look now and tried to understand the code. In general I still don't quite understand all the multi index magic and stuff, I think a very detailed documentation seems necessary here. Also, the outcome seems to me an overly detailed dataframe with a lot of unnecessary information, but that might be up to the user to see it differently. Not sure there is a practical use case for having extractor current per electron... Anyways not as default. In general, lot of places still miss argument description in doc strings, and typing is not consistently defined. Also for me really slow (2 min for 30 MByte parquet conversion). Apart from that, looks to be on a good track.

The config file I provided is not default. There is actually no default and is up to the user. You can easily remove the channels you find undesirable. Of course I can discuss with dima to see if we come up with a default later.
Regarding multiindex stuff, flash operates with a train id and pulse id and for pes, we want electron count so multiindex just was the best way to do it. It's possible to provide a detailed documentation other than the comments in the code but I feel it is pointless from user perspective as at the end we just provide a dataframe with such columns and no multiindex, and also pointless for developers unless they really want to develop a similar framework.
docstrings and typing, I will add. There's just a lot to cover in this PR so I missed some. I'd ideally like to merge it and then make further improvements in a seperate PR.

@rettigl
Copy link
Member

rettigl commented Jun 1, 2023

I'd ideally like to merge it and then make further improvements in a seperate PR.

I completely appreciate that. Then let's finalize linting, docstrings and tests and merge soon.

Copy link
Member

@steinnymir steinnymir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is overall not pretty with these exploded lines forced by the linting, but i guess that is not a problem of this PR.

I added a few comments, and suggest to merge soon, improve later for the performance issues.

Copy link
Member

@rettigl rettigl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, LGTM now. A few things are still in open discussion threads, but can also be future issues, or are of minor importance. Importantly, the thing should be tested with various datasets at flash.

@zain-sohail
Copy link
Member Author

There are plenty of stuff to be done but that can be done in another PR, so I will merge this now.

@coveralls
Copy link
Collaborator

coveralls commented Jun 16, 2023

Pull Request Test Coverage Report for Build 5288984075

  • 250 of 337 (74.18%) changed or added relevant lines in 9 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.2%) to 75.436%

Changes Missing Coverage Covered Lines Changed/Added Lines %
sed/loader/generic/loader.py 4 5 80.0%
sed/loader/mpes/loader.py 2 3 66.67%
sed/loader/utils.py 10 12 83.33%
sed/core/processor.py 4 7 57.14%
sed/loader/base/loader.py 17 25 68.0%
sed/loader/flash/metadata.py 9 32 28.13%
sed/loader/flash/loader.py 166 215 77.21%
Files with Coverage Reduction New Missed Lines %
sed/core/processor.py 1 39.94%
Totals Coverage Status
Change from base Build 5090414043: -0.2%
Covered Lines: 2899
Relevant Lines: 3843

💛 - Coveralls

@zain-sohail zain-sohail merged commit 9b4e506 into main Jun 16, 2023
@zain-sohail zain-sohail deleted the flash-new branch June 16, 2023 10:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants