Skip to content
This repository was archived by the owner on Sep 11, 2023. It is now read-only.

Conversation

peterdudfield
Copy link
Contributor

@peterdudfield peterdudfield commented Sep 22, 2021

Pull Request

Description

Main:

  • Add validation class which iterates over all the batches and validates them, and also find the unique days in the entire dataset. Also added test for this.
  • Add FakeDataset used for testing
  • Add validations script

Extras:

  • add 2 methods and validation to config class
  • update nwp test data
  • update test configs to represent test data
  • extra functions for manipulating examples/batch

Fixes issue #138

How Has This Been Tested?

Unittests + extra unittest
ran the validation script for a few batches - but need to run this on remote server/gcp

  • No
  • Yes

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@peterdudfield peterdudfield self-assigned this Sep 24, 2021
# Conflicts:
#	nowcasting_dataset/config/model.py
#	nowcasting_dataset/consts.py
#	nowcasting_dataset/dataset/datasets.py
Copy link
Member

@jacobbieker jacobbieker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I like all the simplification done, and refactoring, looks good!

"WV_073",
)

DEFAULT_REQUIRED_KEYS = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@JackKelly JackKelly linked an issue Sep 25, 2021 that may be closed by this pull request
4 tasks
@peterdudfield
Copy link
Contributor Author

@JackKelly gona merge this, so that this PR can use some of this stuff. Please do add comments

@peterdudfield peterdudfield merged commit fff8dc4 into main Sep 27, 2021
@peterdudfield peterdudfield deleted the issue/138-validate-prepared-ml-data branch September 27, 2021 12:08
Copy link
Member

@JackKelly JackKelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! Awesome work! (And sorry for taking so long to review this!)

Just one tiny comment (and it's not important!)

assert data[OBJECT_AT_CENTER] == "gsp"

if not batch:
# add an extract dimension so that its similar to batch data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two tiny typos (sorry for being such a pedant... and sorry if I've misunderstood!)

  • extract -> extra
  • its -> it's

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are definitely right! my bad

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dataset Validation script
3 participants