This repository was archived by the owner on Sep 11, 2023. It is now read-only.

Experiment with allowing xr.open_mfdataset to use dask for NWPs and Satellite to speed up loading #456

Closed
Tracked by #341
JackKelly opened this issue Nov 18, 2021 · 4 comments · Fixed by #459

Comments

@JackKelly
Member

No description provided.

@JackKelly JackKelly added the enhancement New feature or request label Nov 18, 2021
@JackKelly JackKelly self-assigned this Nov 18, 2021
@JackKelly JackKelly moved this to Todo in Nowcasting Nov 18, 2021
@JackKelly
Member Author

JackKelly commented Nov 18, 2021

At the moment, prepare_ml_data.py takes about 65 seconds per HRVSatellite batch, which is way too slow...

The code currently calls xr.open_mfdataset with chunks=None, which I think disables dask. I'll try other options...

I'll try using Dask now...

@JackKelly
Member Author

JackKelly commented Nov 18, 2021

chunks={} doesn't help (from the xr docs: "chunks={} loads the dataset with dask using engine preferred chunks if exposed by the backend, otherwise with a single chunk for all arrays")

@JackKelly
Member Author

JackKelly commented Nov 18, 2021

Woo! chunks='auto' is much faster! Now about 7 seconds per HRVSatellite batch (down from about 65 seconds)!
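
For anyone wanting to repeat this comparison, here is a minimal sketch of how the three `chunks` settings tried in this thread (None, {} and 'auto') could be timed side by side. It is not the code from prepare_ml_data.py: `time_chunk_options`, `load_one_batch`, `paths` and `select` are hypothetical names introduced for illustration, and `combine="by_coords"` is an assumption about how the files fit together.

```python
import time
from typing import Any, Callable, Dict, Iterable


def time_chunk_options(
    open_fn: Callable[[Any], Any],
    chunk_options: Iterable[Any],
) -> Dict[str, float]:
    """Call open_fn(chunks) once per option and return elapsed seconds,
    keyed by repr(option)."""
    timings: Dict[str, float] = {}
    for chunks in chunk_options:
        start = time.perf_counter()
        open_fn(chunks)
        timings[repr(chunks)] = time.perf_counter() - start
    return timings


def load_one_batch(paths, chunks, select):
    """Hypothetical stand-in for loading one batch.

    `select` is an assumed callable that cuts the opened dataset down to
    the part needed for a single batch.
    """
    # Imported lazily so the timing helper above has no hard dependency.
    import xarray as xr

    dataset = xr.open_mfdataset(paths, chunks=chunks, combine="by_coords")
    # Force the read, so that lazily evaluated dask arrays are included
    # in the measured time rather than deferred.
    return select(dataset).load()


# Example usage (paths and select must be supplied by the caller):
# timings = time_chunk_options(
#     lambda chunks: load_one_batch(paths, chunks, select),
#     [None, {}, "auto"],
# )
```

Timing a single call per option is noisy (disk caches in particular favour later runs), so it is probably worth repeating each measurement a few times before drawing conclusions.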

@JackKelly
Member Author

For reference, I had previously (way back in issue #23) decided to use chunks=None, but maybe dask has improved since then?

Repository owner moved this from Todo to Done in Nowcasting Nov 18, 2021