This repository was archived by the owner on Sep 11, 2023. It is now read-only.
Detailed Description
After the "big new design", prepare_ml_data.py takes about 12 seconds per satellite batch. That's too slow. But, fear not, there are plenty of ways to speed it up!
After prepare_ml_data.py finishes with sun, topographic, and GSP (which all zoom past), leonardo really isn't being pushed very hard when it's only doing satellite and NWP.
Possible Implementation
Write pre-prepared batches to leonardo's new 4 TB SSD
Check that create batch is still using ThreadPoolExecutor to load examples in parallel
Use smaller dtypes for NWP and Satellite Zarrs (see Use smaller dtypes for saved data #61). The satellite data is already 10 bits per channel, though, so reducing to 8-bit won't speed things up that much.
If need be, bring back the idea of creating a batch by loading, say, 16 time slices off disk, and sampling 2 geographical regions of interest from each time slice to produce 32 examples per batch (i.e. halving the amount of data that needs to be loaded from disk per batch). This definitely speeds up loading but reduces randomness. This is how the code did it before "the big new redesign" (here's the commit where it was mostly removed: f896a5e#L103 in nwp_data_source.get_batch()).
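To make the trade-off in the time-slice idea concrete, here is a minimal sketch, not the code from before the redesign. The names `sample_batch`, `load_time_slice` and `fake_load` are hypothetical, and the "disk read" is a stand-in function that just counts how often it is called. Each time slice is read once and cropped twice, so 32 examples cost 16 reads.

```python
import random


def sample_batch(load_time_slice, n_time_steps, height, width,
                 n_slices=16, regions_per_slice=2, roi_size=4, seed=None):
    """Build a batch by reading each time slice once and cropping several regions from it."""
    rng = random.Random(seed)
    # Distinct time indices, so every slice is read from disk exactly once.
    time_indices = rng.sample(range(n_time_steps), n_slices)
    examples = []
    for t in time_indices:
        image = load_time_slice(t)  # the expensive step in the real pipeline
        for _ in range(regions_per_slice):
            # Random top-left corner such that the crop stays inside the image.
            top = rng.randrange(height - roi_size + 1)
            left = rng.randrange(width - roi_size + 1)
            crop = [row[left:left + roi_size] for row in image[top:top + roi_size]]
            examples.append({"t": t, "top": top, "left": left, "data": crop})
    return examples


# Hypothetical stand-in for a disk read: returns a small 2D grid and records the call.
reads = []


def fake_load(t, height=8, width=8):
    reads.append(t)
    return [[t * 1000 + y * width + x for x in range(width)] for y in range(height)]


batch = sample_batch(fake_load, n_time_steps=100, height=8, width=8, seed=0)
print(len(batch), "examples from", len(reads), "disk reads")
```

The loss of randomness is visible here too: pairs of examples in the batch always share the same `t`.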
Manager: use multiprocessing.Pool, not ProcessPoolExecutor #325
Call dataset.load() _after_ joining examples into a batch #475
Speed up GSPDataSource.get_locations() #305
Speed up Manager.sample_spatial_and_temporal_locations_for_examples() by splitting shuffled_t0_datetimes across multiple processes #304 (but only bother with this if #305 is not sufficient)
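Splitting shuffled_t0_datetimes across processes might look roughly like the sketch below. It is an illustration only: `split_into_chunks`, `sample_locations_for_chunk` and `sample_locations_in_parallel` are hypothetical names, not the repository's Manager API, and the worker returns placeholder coordinates rather than real spatial sampling. It uses multiprocessing.Pool and relies on `Pool.map` returning results in input order.

```python
import multiprocessing
import random
from datetime import datetime, timedelta


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous chunks of nearly equal length."""
    chunk_size, remainder = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `remainder` chunks get one extra item each.
        end = start + chunk_size + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sample_locations_for_chunk(args):
    """Hypothetical worker: pair each t0 datetime with a placeholder (x, y) location."""
    seed, t0_datetimes = args
    rng = random.Random(seed)  # per-chunk seed keeps the result reproducible
    return [(t0, rng.random(), rng.random()) for t0 in t0_datetimes]


def sample_locations_in_parallel(shuffled_t0_datetimes, n_processes=4):
    """Sample locations for all datetimes, one chunk per worker process."""
    chunks = split_into_chunks(shuffled_t0_datetimes, n_processes)
    with multiprocessing.Pool(processes=n_processes) as pool:
        # The chunk index doubles as the worker's random seed.
        results = pool.map(sample_locations_for_chunk, list(enumerate(chunks)))
    # Pool.map preserves input order, so flattening keeps the shuffled order.
    return [location for chunk_result in results for location in chunk_result]


if __name__ == "__main__":
    start = datetime(2021, 1, 1)
    t0_datetimes = [start + timedelta(minutes=5 * i) for i in range(1000)]
    random.Random(0).shuffle(t0_datetimes)
    locations = sample_locations_in_parallel(t0_datetimes)
    print(len(locations), "locations sampled")
```

Whether this pays off depends on how expensive the per-datetime work is compared with the cost of pickling the chunks and results between processes, which fits the note above about only bothering if #305 is not sufficient.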