Tweak the way DataSources are represented in the Configuration model #217

JackKelly · 2021-10-11T15:43:36Z

Detailed Description

We maybe want to make a few small changes to the Configuration model:

include a list of enabled_data_sources
maybe split the config up so each data source has its own section, so that entire section can be passed into the DataSource object. e.g. there would be an NWP section, with fields zarr_path, image_size, and channels.

Context

Splitting up the Configuration so each data source has its own section should mean that we don't have to write custom code to translate between a Configuration object and the constructor for each DataSource. Instead we just do something like SatDataSource(**config.sat_data)

The text was updated successfully, but these errors were encountered:

peterdudfield · 2021-10-18T08:24:15Z

perhaps the layout could be
config:

data_sources
-- satellite
-- nwp
-- e.t.c
other subfields like Git

JackKelly · 2021-10-18T15:28:27Z

Yeah, exactly! To flesh that out one more level deep:

data_sources:
- default_forecast_minutes
- default_history_minutes
- satellite
  - zarr_path
  - channels
  - forecast_minutes # Optionally override default_forecast_minutes for Satellite data
  - history_minutes # Optionally override default_history_minutes for Satellite data
  - image_size_pixels
- nwp
  - zarr_path
  - channels
  - forecast_minutes # Optionally override default_forecast_minutes for NWP data
  - history_minutes # Optionally override default_history_minutes for NWP data
  - image_size_pixels
- etc.
git
process
- seed
- batch_size
- local_temp_path
output_data
etc.

I'm not entirely sure about the default_forecast_minutes thing. Maybe it's conceptually simpler to just enter forecast_minutes and history_minutes separately for each data_source. The main aim is to allow us to have different lengths for different data sources (e.g. 4 hours of historical NWPs; and only 1 hour of historical satellite)

JackKelly · 2021-10-20T09:21:50Z

Also, it would be great if the field names in the config file exactly match the constructor arguments for each DataSource so, to create, say, a SatelliteDataSource, we can just do SatelliteDataSource(**config.data_sources.satellite)

Should we keep the field name zarr_path and change the DataSources to take a zarr_path argument? Or should we always use filename for all the DataSources? (Although PVDataSource requires two filenames: one for the PV timeseries data; the other for metadata.)

peterdudfield · 2021-10-20T09:31:29Z

Yea I like the lay out above, I'm not quite sure how to do the 'default_forecast_minutes' from a model higher up, but we could have a play.

In the idea of being verbose, I think keep it as zarr_path, and for PVDataSource, put both the filenames in>

We can add a validator to make sure the filename or zarr_path has the right suffix

JackKelly · 2021-10-20T09:36:49Z

I'm not quite sure how to do the 'default_forecast_minutes' from a model higher up

Yeah, I wonder if we should keep things simple and users just have to put the forecast_minutes into each DataSource?

BTW, are you happy to have a shot at this issue? Absolutely no worries either way - I'm happy to if you've got too much other stuff on this week!

peterdudfield · 2021-10-20T09:38:07Z

Yea - i'll give it a go

JackKelly · 2021-10-20T09:46:14Z

Cheers, thank you!

To give a bit more context... and don't necessarily worry about implementing this yet... but I guess the ultimate goal is that, in prepare_ml_data.py, we can have a single line like this:

data_sources = get_data_sources(config.data_sources)

And, get_data_sources() could look something like this:

def get_data_sources(config_for_all_data_sources) -> list[DataSource]:
    data_source_name_to_class = {'satellite': SatelliteDataSource, 'nwp': NWPDataSource, ...}
    data_sources = []
    for data_source_name, data_source_class in data_source_name_to_class.items():
        config_for_data_source = getattr(config_for_all_data_sources, data_source_name)
        data_source = data_source_class(**config_for_data_source)
        data_sources.append(data_source)
    return data_sources

peterdudfield · 2021-10-20T09:48:03Z

That would be very neat indeed

JackKelly added enhancement New feature or request refactoring labels Oct 11, 2021

JackKelly mentioned this issue Oct 11, 2021

"Big new design" for nowcasting_dataset #213

Closed

38 tasks

flowirtz added this to Nowcasting Oct 15, 2021

flowirtz moved this to Todo in Nowcasting Oct 15, 2021

peterdudfield self-assigned this Oct 20, 2021

JackKelly mentioned this issue Oct 20, 2021

Load .yaml from nowcasting_dataset openclimatefix/nowcasting_dataloader#18

Closed

This was referenced Oct 20, 2021

Issue/217 data source configs #255

Merged

Remove time_30 and time_5 #230

Closed

peterdudfield closed this as completed in #255 Oct 22, 2021

Repository owner moved this from Todo to Done in Nowcasting Oct 22, 2021

JackKelly mentioned this issue Oct 26, 2021

Align DataSource constructor args with config YAML field names #285

Merged

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tweak the way DataSources are represented in the Configuration model #217

Tweak the way DataSources are represented in the Configuration model #217

JackKelly commented Oct 11, 2021

peterdudfield commented Oct 18, 2021

JackKelly commented Oct 18, 2021 •

edited

Loading

JackKelly commented Oct 20, 2021

peterdudfield commented Oct 20, 2021

JackKelly commented Oct 20, 2021

peterdudfield commented Oct 20, 2021

JackKelly commented Oct 20, 2021

peterdudfield commented Oct 20, 2021

Tweak the way DataSources are represented in the Configuration model #217

Tweak the way DataSources are represented in the Configuration model #217

Comments

JackKelly commented Oct 11, 2021

Detailed Description

Context

peterdudfield commented Oct 18, 2021

JackKelly commented Oct 18, 2021 • edited Loading

JackKelly commented Oct 20, 2021

peterdudfield commented Oct 20, 2021

JackKelly commented Oct 20, 2021

peterdudfield commented Oct 20, 2021

JackKelly commented Oct 20, 2021

peterdudfield commented Oct 20, 2021

JackKelly commented Oct 18, 2021 •

edited

Loading