This repository was archived by the owner on Sep 7, 2023. It is now read-only.

Speed up data loading #24

Closed · 2 tasks done
JackKelly opened this issue May 14, 2021 · 11 comments
Labels: data (Data processing, loading, or analysis), enhancement (New feature or request)

Comments

@JackKelly commented May 14, 2021

JackKelly added the enhancement (New feature or request) and data (Data processing, loading, or analysis) labels on May 14, 2021
@JackKelly commented May 14, 2021

The latest code (as of the evening of Fri 2021-05-14) can get ~50 it/s and only 24 GB of RAM usage without NWP loading (with 4 workers).

@JackKelly commented

NWPDataInMem.get_sample takes about 70 ms per sample, so with 8 samples per batch it takes over half a second per batch. That's probably the issue.

The interpolation (even linear) takes a while. Replacing the linear interpolation with ffill decreases the runtime of NWPDataInMem.get_sample from 70 ms to 15 ms, and increases the training speed from about 5 it/s to 15 it/s.
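For illustration, a minimal runnable sketch of the interpolation-vs-ffill trade-off, assuming xarray-style data with a hypothetical 'target_time' coordinate (not the repo's exact code):

```python
# Toy hourly NWP values; 'target_time' is an assumed coordinate name.
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2021-05-14", periods=6, freq="h")
nwp_hourly = xr.DataArray(
    np.arange(6.0), coords={"target_time": times}, dims="target_time")

# Linear interpolation: smooth, but slow when run for every sample.
linear = nwp_hourly.resample(target_time="5min").interpolate("linear")

# Forward-fill: each 5-minute slot re-uses the last hourly value. Much faster.
ffilled = nwp_hourly.resample(target_time="5min").ffill()
```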

@JackKelly commented May 14, 2021

Hmmm, maybe the issue is that get_nwp_example resamples the entire NWP field (a big image!). Some options to speed it up:

  1. Resample to 5-minutely in NWPDataLoader.load_single_chunk().
  2. Only resample the data we need. e.g. NWPDataInMem.get_sample() would return hourly data, from start.floor('h') to end.ceil('h'), and then it'd be up to the Transform to resample after selecting what we need (a sketch follows this list). I like this idea.
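A hedged sketch of what option 2 might look like; all names here (get_sample, transform, target_time, x/y dims) are assumptions for illustration, not the repo's API:

```python
import pandas as pd

def get_sample(nwp, start, end):
    """Return the *hourly* NWP data covering [start, end] (hypothetical helper)."""
    return nwp.sel(target_time=slice(pd.Timestamp(start).floor("h"),
                                     pd.Timestamp(end).ceil("h")))

def transform(hourly_sample, x_slice, y_slice):
    """Crop spatially first, then resample only the small cutout to 5-minutely."""
    cropped = hourly_sample.isel(x=x_slice, y=y_slice)
    return cropped.resample(target_time="5min").ffill()
```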

JackKelly added a commit that referenced this issue May 14, 2021
…Ps are still a little slow though (#24).  About to try resampling 'step' in load_single_chunk()
@JackKelly commented

I've implemented option 2, and it's helped a lot! NWPDataInMem.get_sample() now takes only 7.26 ms, and the system trains at 25 it/s, with GPU usage hovering around 15%.

@JackKelly commented May 14, 2021

More things to try:

  • Limit the spatial extent of the satellite imagery. DONE: Reduces the size of nwp_in_mem to 14 MB (from 37 MB), and reduces the runtime of get_sample to 5.78 ms (from 7.26 ms). Doesn't seem to speed up training much, or reduce memory during training much (with 5 workers, uses 53 GB RAM, and does about 20 it/s).
  • Run get_sample() from the 3 AsyncDataLoaders in parallel. Try both threads and processes. Thoughts: can't spawn child processes from daemonic worker processes, and not sure multiple threads will help because get_sample() is CPU-bound.
  • A VM with more RAM, and then add more workers. (10 workers, 8-bit NWP, 32-bit PV uses 78 GB, and gets about 30 it/s, GPU usage of max 22%. 12 workers = 33 it/s, 99.5 GB RAM.)
  • Use a minimal data type for NWP (uint8 for temperature; a quantisation sketch follows this list). DONE: reduces the size of nwp_in_mem to 3.6 MB (from 14 MB) and reduces the runtime of nwp_in_mem.get_sample() to 4.6 ms, down from 5.78 ms. Uses about 44 GB RAM during training with 5 workers.
  • Try again without NWP data, to see the memory usage, the training speed (it/s), and GPU usage. DONE: without NWP, and with 12 workers, uses 71 GB RAM. Achieves 71 it/s and max GPU utilisation of 40%.
  • Is the PV data using lots of memory? If so, use a minimal data type for PV? Share data between processes?!
  • Try loading a complete batch at once.
  • Try using different processes for each data source: can't spawn child processes from daemonic worker processes!
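For the "minimal data type" bullet, a hedged sketch of linear uint8 quantisation for temperature; the Kelvin range below is an assumed placeholder, not a value from the repo:

```python
import numpy as np

T_MIN, T_MAX = 250.0, 320.0  # assumed plausible temperature range, in Kelvin

def to_uint8(temperature_k: np.ndarray) -> np.ndarray:
    """Map [T_MIN, T_MAX] onto [0, 255] and cast to uint8 (lossy)."""
    scaled = (temperature_k - T_MIN) / (T_MAX - T_MIN)
    return np.clip(scaled * 255, 0, 255).astype(np.uint8)

def from_uint8(quantised: np.ndarray) -> np.ndarray:
    """Approximate inverse: recover float32 Kelvin from the uint8 codes."""
    return quantised.astype(np.float32) / 255 * (T_MAX - T_MIN) + T_MIN
```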

@JackKelly commented

So we know that including NWP data slows training down by a factor of more than 2x.

get_sample takes 4.3 ms for NWP and 1.13 ms for satellite data. So maybe we need to speed up the NWP get_sample?

@JackKelly commented

Profiling each line in get_nwp_example:

0.179 ms: date_range
1.686 ms: nwp.sel(init_time=target_times_hourly, method='ffill')
0.157 ms: init_time_future
0.043 ms: init_times[target_times_hourly > t0_hourly]
0.216 ms: steps = target_times_hourly - init_times
0.360 ms: init_time_indexer
0.103 ms: step_indexer
1.526 ms: nwp.sel(init_time=init_time_indexer, step=step_indexer)
CPU times: user 7.57 ms, sys: 0 ns, total: 7.57 ms
Wall time: 6.46 ms
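For context, a runnable toy version of the kind of selection being profiled above: pairing each target time with the most recent init_time and the matching forecast step via xarray's pointwise (vectorised) indexing. All shapes and names are illustrative assumptions:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy NWP array: forecasts initialised every 3 hours, with hourly steps.
init_time = pd.date_range("2021-05-14", periods=4, freq="3h")
step = pd.timedelta_range("0h", "6h", freq="h")
nwp = xr.DataArray(
    np.random.rand(len(init_time), len(step)),
    coords={"init_time": init_time, "step": step},
    dims=("init_time", "step"),
)

# For each target time, take the most recent init_time (ffill) and the
# step that lands on that target time.
target_times_hourly = pd.date_range("2021-05-14 06:00", periods=4, freq="h")
init_times = nwp.sel(init_time=target_times_hourly, method="ffill").init_time.values
steps = target_times_hourly.values - init_times

# Indexers sharing a dim give pointwise selection: one value per target time.
init_time_indexer = xr.DataArray(init_times, dims="target_time")
step_indexer = xr.DataArray(steps, dims="target_time")
selected = nwp.sel(init_time=init_time_indexer, step=step_indexer)
```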

@JackKelly commented May 17, 2021

Oooh... it looks like it's possible to significantly speed up the selection based on 'step' by first transposing so that 'step' is the first dimension. This gets the runtime down to 1.73 ms if always using the first init_time. Need to see if this speed-up holds when using multiple init times based on t0.
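Continuing the toy arrays from the sketch above, the transpose experiment might look like this (a sketch, not the repo's code; the memory-layout explanation is an assumption about why it's faster):

```python
# Make 'step' the leading dimension so step-based selection reads
# contiguous memory.
nwp_t = nwp.transpose("step", ...)

# The fast case reported above: fix init_time to the first one,
# then select by step only.
selected = nwp_t.isel(init_time=0).sel(step=step_indexer)
```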

@JackKelly commented May 17, 2021

Nope, doesn't look like transposing gives us the same performance increase when selecting multiple init times.

But, better news: I noticed that, when using NWPs, the code is almost constantly loading from disk when min_n_samples_per_disk_load = 1000 and max_n_samples_per_disk_load = 2000. Increasing these to 4,000 and 8,000, respectively, gets us up to 50 it/s after 30,000 iterations (yay!) with NWPs and 12 workers.

To really speed things up, I think we perhaps need to re-create the NWP Zarr, so the data is stored more efficiently on disk (#26).
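A hedged sketch of what re-creating the Zarr could look like; the paths and chunk sizes are assumptions, and #26 tracks the real plan:

```python
import xarray as xr

nwp = xr.open_zarr("nwp.zarr")                 # hypothetical path
nwp = nwp.chunk({"init_time": 1, "step": -1})  # -1 = one chunk for that dim
for name in nwp.variables:
    nwp[name].encoding.pop("chunks", None)     # drop stale chunk encodings
nwp.to_zarr("nwp_rechunked.zarr", mode="w")
```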

@JackKelly commented

Swapping back to the 'old', more thorough way of getting NWPs gives us 47.8 it/s.

@JackKelly commented

Can't launch sub-processes from the worker processes: daemonic child processes aren't allowed to have children :)
