
Stochastic Reconstruction of Gappy Lagrangian Turbulent Signals by Conditional Generative Diffusion Models


SmartTURB/C-DM-lagr


C-DM-lagr

This is the codebase for Stochastic Reconstruction of Gappy Lagrangian Turbulent Signals by Conditional Generative Diffusion Models.

This repository is based on SmartTURB/diffusion-lagr and adds the ability to reconstruct gappy Lagrangian turbulent signals conditioned on the measurements outside the gap. Specifically, three additional modules have been implemented:

  • continuous_diffusion: Enables diffusion models to condition on a continuous noise level rather than discrete timesteps. See WaveGrad for details.

  • palette_diffusion: Enables conditional diffusion models (C-DM) for image-to-image translation tasks. See Palette for details.

  • tfg_diffusion: Enables reconstruction with an unconditional diffusion model using diffusion posterior sampling (DPS), a special case of training-free guidance (TFG).
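As an illustration of the continuous noise level idea used by continuous_diffusion, the following is a minimal sketch of WaveGrad-style sampling: pick a discrete step, then draw the noise level uniformly between the two neighbouring discrete levels. The function names and the linear beta schedule are assumptions made for this example and are not taken from this repository, which uses the tanh6,1 schedule in its flags.

```python
import math
import random


def sqrt_alpha_bar_levels(betas):
    """Return [l_0, ..., l_T] with l_0 = 1 and l_t = sqrt(prod_{i<=t} (1 - beta_i))."""
    levels = [1.0]
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - beta
        levels.append(math.sqrt(prod))
    return levels


def sample_continuous_level(levels, rng):
    """Pick a discrete step t uniformly, then draw a noise level uniformly
    between the neighbouring discrete levels l_t and l_{t-1}."""
    t = rng.randint(1, len(levels) - 1)  # randint is inclusive on both ends
    return t, rng.uniform(levels[t], levels[t - 1])


# Illustrative linear beta schedule (an assumption, not the repository's schedule).
T = 800
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
levels = sqrt_alpha_bar_levels(betas)
t, level = sample_continuous_level(levels, random.Random(0))
```

During training, the network would then be conditioned on `level` instead of the integer `t`, which is what allows a different number of steps or a different schedule to be used at inference time.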

Installation

This codebase runs in an environment similar to the one described in Development Environment. See env_setup.txt for installation details, including required packages and dependencies.

Data Preparation

Dataset: 3D HIT tracers

Please refer to Preparing Data for download and usage details of the file Lagr_u3c_diffusion.h5. Use the two scripts in datasets/lagr/ to split the original dataset into 90% for training and 10% for testing for both the 1c and 3c cases.
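For readers who want to reproduce a 90%/10% split on their own data, the following is a minimal sketch of how the indices could be partitioned. It is not the code in datasets/lagr/. The function name, the shuffling, and the seed are assumptions, and the actual scripts may split differently (for example, without shuffling).

```python
import random


def split_indices(n_samples, train_fraction=0.9, seed=0):
    """Shuffle sample indices reproducibly and split them into train/test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(n_samples * train_fraction))
    # Sorted index lists are convenient because h5py fancy indexing
    # expects indices in increasing order.
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, test_idx = split_indices(1000)
```

The resulting lists can be used to select rows of the original dataset and write them to separate train and test datasets in a new h5 file.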

Dataset: 2D Ocean Drifters

One can access the hourly drifter data from the NOAA Global Drifter Program here. We used version 2.01 and selected the file gdp-v2.01.nc.

To preprocess the data, including (1) dividing trajectories into non-overlapping 60-day segments and (2) removing segments with spurious points of high velocity or acceleration, run the script datasets/gdp1h/create-gdp1h_60d-datasets/create-gdp1h_60d-datasets.sh, which requires the clouddrift package (clouddrift.org). This will output two files: gdp1h_60d-diffusion.h5, containing the processed velocity segments, and gdp1h_60d-pos0.h5, containing the initial positions of these segments. Both files are available on the INFN Open Access Repository at this link, and can be loaded as follows:

import h5py
import numpy as np

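with h5py.File('gdp1h_60d-diffusion.h5', 'r') as h5f:
    # Bounds used for the min-max normalization
    rx0 = np.array(h5f.get('min'))
    rx1 = np.array(h5f.get('max'))
    # Undo the normalization: map stored values from [-1, 1] back to [rx0, rx1]
    v2c = (np.array(h5f.get('train'))+1)*(rx1-rx0)/2 + rx0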

with h5py.File('gdp1h_60d-pos0.h5', 'r') as h5f:
    lon0 = np.array(h5f.get('lon0'))
    lat0 = np.array(h5f.get('lat0'))

The v2c variable has a shape of (115450, 1440, 2), representing 115,450 segments, each with 1,440 hourly time instants and 2 velocity components. In the h5 file, the velocities are stored in the dataset train, min-max normalized to [-1, 1] with rx0=-3 and rx1=3; the snippet above maps them back to physical values. lon0 and lat0 are the initial longitude and latitude of the segments, each with shape (115450,).
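The min-max mapping can be written as a pair of inverse functions. This is a small sketch whose function names are chosen for illustration. The denormalize expression is the same one used in the loading snippet above, and normalize is its algebraic inverse.

```python
def normalize(v, lo, hi):
    """Map a physical value in [lo, hi] to the range [-1, 1]."""
    return 2.0 * (v - lo) / (hi - lo) - 1.0


def denormalize(x, lo, hi):
    """Map a normalized value in [-1, 1] back to [lo, hi]."""
    return (x + 1.0) * (hi - lo) / 2.0 + lo


rx0, rx1 = -3.0, 3.0
x = normalize(1.5, rx0, rx1)      # 0.5
v = denormalize(x, rx0, rx1)      # 1.5
```

Both functions also work elementwise on NumPy arrays, so reconstructed samples produced in normalized units can be converted back with denormalize before computing physical quantities.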

To obtain the positions for a specific segment (e.g., idx), one can use the position_from_velocity function from the clouddrift package as follows:

from clouddrift.kinematics import position_from_velocity

ve_idx, vn_idx = v2c[idx, :, 0], v2c[idx, :, 1]
time = np.arange(1440) * 3600  # units: seconds
lon_idx, lat_idx = position_from_velocity(ve_idx, vn_idx, time, lon0[idx], lat0[idx])
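To make the velocity-to-position step concrete, here is a simplified forward-Euler integration on a sphere. This is only a sketch and is not clouddrift's implementation. It assumes velocities in m/s, a fixed mean Earth radius, and no handling of the poles or of longitude wrap-around. The function name integrate_positions is hypothetical.

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean Earth radius in metres (assumed constant)


def integrate_positions(ve, vn, dt, lon0, lat0):
    """Forward-Euler integration of eastward/northward velocity (m/s)
    into longitude/latitude (degrees). Returns lists as long as the input."""
    lons, lats = [lon0], [lat0]
    for u, v in zip(ve[:-1], vn[:-1]):
        lat_rad = math.radians(lats[-1])
        dlat = v * dt / EARTH_RADIUS_M                        # radians
        dlon = u * dt / (EARTH_RADIUS_M * math.cos(lat_rad))  # radians
        lats.append(lats[-1] + math.degrees(dlat))
        lons.append(lons[-1] + math.degrees(dlon))
    return lons, lats


# Three hourly samples of a purely eastward 0.1 m/s drift starting at 45 degrees north.
lons, lats = integrate_positions([0.1, 0.1, 0.1], [0.0, 0.0, 0.0], 3600.0, 0.0, 45.0)
```

For actual analysis, position_from_velocity remains the better choice, since it handles the spherical geometry properly.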

C-DM Training

Please refer to the parent repository’s Training section for detailed information, including hyperparameter configuration. The most important additional flag in this case is --mask_mode, which has the following options:

  • center1d<lg>: Specifies a central gap of size <lg>.
  • right1d<lg>: Specifies a right-end gap of size <lg>.
  • interp1d<scale_factor>: Specifies a sample point every <scale_factor> points for interpolation cases.

See the function get_mask in palette_diffusion/palette_datasets.py for customizing the reconstruction scenario.
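The three mask modes can be pictured with a small stand-alone sketch. This is not the repository's get_mask. The function name make_mask, the convention that 1 marks an observed point and 0 a point to reconstruct, and the exact placement of the central gap are assumptions made for illustration.

```python
import re


def make_mask(mask_mode, length):
    """Return a list of 0/1 flags: 1 = observed point, 0 = point to reconstruct."""
    match = re.fullmatch(r"(center1d|right1d|interp1d)(\d+)", mask_mode)
    if match is None:
        raise ValueError(f"unsupported mask_mode: {mask_mode}")
    kind, value = match.group(1), int(match.group(2))
    if kind == "center1d":
        # Gap of `value` points placed in the middle of the sequence.
        start = (length - value) // 2
        return [0 if start <= i < start + value else 1 for i in range(length)]
    if kind == "right1d":
        # Gap covering the last `value` points.
        return [1 if i < length - value else 0 for i in range(length)]
    # interp1d: keep one sample every `value` points, starting at index 0.
    return [1 if i % value == 0 else 0 for i in range(length)]


mask = make_mask("center1d500", 2000)
```

With this convention, center1d500 on a trajectory of length 2000 leaves 1,500 observed points, 750 on each side of the gap.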

For Lagrangian turbulence reconstruction with a central gap of size $50\tau_\eta$, use the following flags:

DATA_FLAGS="--mask_mode center1d500 --dataset_path datasets/lagr/Lagr_u3c_diffusion_splits.h5 --dataset_name train"
MODEL_FLAGS="--dims 1 --image_size 2000 --in_channels 3 --num_channels 128 --num_res_blocks 3 --attention_resolutions 250,125 --channel_mult 1,1,2,3,4"
DIFFUSION_FLAGS="--diffusion_steps 800 --noise_schedule tanh6,1"
TRAIN_FLAGS="--lr 1e-4 --batch_size 64 --total_steps 250000"

For ocean drifter observation reconstruction with a central gap of 360 hours, use the following flags:

DATA_FLAGS="--mask_mode center1d360 --dataset_path datasets/gdp1h/gdp1h_v2c_diffusion_splits.h5 --dataset_name train"
MODEL_FLAGS="--dims 1 --image_size 1440 --in_channels 2 --num_channels 128 --num_res_blocks 3 --attention_resolutions 180,90 --channel_mult 1,1,2,3,4"
DIFFUSION_FLAGS="--diffusion_steps 800 --noise_schedule tanh6,1"
TRAIN_FLAGS="--lr 1e-4 --batch_size 64 --total_steps 250000"
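The --attention_resolutions values in both flag sets can be related to the U-Net depth with a short calculation. This sketch assumes the convention of OpenAI's guided-diffusion, where each listed resolution is converted to a downsampling rate via image_size // resolution and each entry of --channel_mult adds one level that halves the sequence length. Whether this repository follows that convention exactly is an assumption.

```python
def attention_downsample_rates(image_size, attention_resolutions):
    """Convert attention resolutions into the downsampling rates they refer to."""
    return [image_size // int(res) for res in attention_resolutions.split(",")]


def level_downsample_rates(channel_mult):
    """Downsampling rate at each U-Net level, one level per channel_mult entry."""
    n_levels = len(channel_mult.split(","))
    return [2 ** level for level in range(n_levels)]


lagr_rates = attention_downsample_rates(2000, "250,125")   # [8, 16]
gdp_rates = attention_downsample_rates(1440, "180,90")     # [8, 16]
levels = level_downsample_rates("1,1,2,3,4")               # [1, 2, 4, 8, 16]
```

Under this reading, both configurations apply attention only at the two deepest levels, where the sequences have lengths 250 and 125 (turbulence) or 180 and 90 (drifters).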

Use scripts/palette_train.py to train the conditional diffusion model:

mpiexec -n $NUM_GPUS python scripts/palette_train.py $DATA_FLAGS $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS

For most gap configurations (specified by --mask_mode) in this work, training ran on 4 A100 GPUs for 250,000 iterations (set by --total_steps) and typically completed in about 24 hours.
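As a rough guide to the training budget, the sketch below computes the global batch size and the number of samples processed. It assumes that --batch_size is the per-process batch size and that --total_steps counts optimizer steps. Neither is confirmed here, although the BS256 tag in the checkpoint name used in the DPS section is consistent with 4 processes of 64.

```python
def training_budget(num_gpus, per_process_batch, total_steps):
    """Return (global batch size, total samples processed) under the stated assumptions."""
    global_batch = num_gpus * per_process_batch
    return global_batch, global_batch * total_steps


global_batch, samples_seen = training_budget(4, 64, 250_000)
```

With the flags above this gives a global batch of 256 and 64 million samples processed over the run.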

Reconstruction with C-DM

Please refer to the parent repository’s Sampling section for detailed information. The only additional option here is --seed, which sets the random seed for reconstruction.

For Lagrangian turbulence reconstruction with a central gap of size $50\tau_\eta$, use the following flags:

DATA_FLAGS="--mask_mode center1d500 --dataset_path datasets/lagr/Lagr_u3c_diffusion_splits.h5 --dataset_name test"
MODEL_FLAGS="--dims 1 --image_size 2000 --in_channels 3 --num_channels 128 --num_res_blocks 3 --attention_resolutions 250,125 --channel_mult 1,1,2,3,4"
DIFFUSION_FLAGS="--diffusion_steps 800 --noise_schedule tanh6,1"
SAMPLE_FLAGS="--num_samples 32768 --batch_size 64 --model_path /path/to/model.pt --seed 0"

For ocean drifter observation reconstruction with a central gap of 360 hours, use the following flags:

DATA_FLAGS="--mask_mode center1d360 --dataset_path datasets/gdp1h/gdp1h_v2c_diffusion_splits.h5 --dataset_name test"
MODEL_FLAGS="--dims 1 --image_size 1440 --in_channels 2 --num_channels 128 --num_res_blocks 3 --attention_resolutions 180,90 --channel_mult 1,1,2,3,4"
DIFFUSION_FLAGS="--diffusion_steps 800 --noise_schedule tanh6,1"
SAMPLE_FLAGS="--num_samples 11545 --batch_size 64 --model_path /path/to/model.pt --seed 0"

Use scripts/palette_sample.py to reconstruct the test data:

python scripts/palette_sample.py $DATA_FLAGS $MODEL_FLAGS $DIFFUSION_FLAGS $SAMPLE_FLAGS
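When launching reconstructions from a Python driver rather than a shell, the same flag groups can be assembled into an argument list. This is a small convenience sketch using only the standard library. The helper name build_command is hypothetical, and the flag values are copied from the turbulence example above.

```python
import shlex

DATA_FLAGS = "--mask_mode center1d500 --dataset_path datasets/lagr/Lagr_u3c_diffusion_splits.h5 --dataset_name test"
MODEL_FLAGS = "--dims 1 --image_size 2000 --in_channels 3 --num_channels 128 --num_res_blocks 3 --attention_resolutions 250,125 --channel_mult 1,1,2,3,4"
DIFFUSION_FLAGS = "--diffusion_steps 800 --noise_schedule tanh6,1"
SAMPLE_FLAGS = "--num_samples 32768 --batch_size 64 --model_path /path/to/model.pt --seed 0"


def build_command(script, *flag_groups):
    """Split each flag string like a shell would and build a single argv list."""
    argv = ["python", script]
    for group in flag_groups:
        argv.extend(shlex.split(group))
    return argv


command = build_command("scripts/palette_sample.py",
                        DATA_FLAGS, MODEL_FLAGS, DIFFUSION_FLAGS, SAMPLE_FLAGS)
# To launch: import subprocess; subprocess.run(command, check=True)
```

Changing the value after --seed in SAMPLE_FLAGS and rebuilding the command is one way to draw several independent reconstructions of the same test data.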

Reconstruction with DPS

Please refer to the parent repository for detailed instructions on training an unconditional diffusion model. To ensure compatibility with DPS reconstruction, --use_continuous_diffusion=True must be set during training.

For 3D HIT tracers, we use the following settings to train an unconditional diffusion model:

DATA_FLAGS="--dataset_path datasets/lagr/Lagr_u3c_diffusion_splits.h5 --dataset_name train"
MODEL_FLAGS="--dims 1 --image_size 2000 --in_channels 3 --num_channels 128 --num_res_blocks 3 --attention_resolutions 250,125 --channel_mult 1,1,2,3,4"
DIFFUSION_FLAGS="--use_continuous_diffusion True --diffusion_steps 800 --noise_schedule tanh6,1"
TRAIN_FLAGS="--lr 1e-4 --batch_size 64 --total_steps 250000"

mpiexec -n 4 python scripts/turb_train.py $DATA_FLAGS $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS

The trained checkpoint is available for download here.

The following settings can be used to perform DPS reconstruction:

DATA_FLAGS="--mask_mode center1d500 --dataset_path /mnt/petaStor/li/Job/TFG-DM-lagr/datasets/lagr/Lagr_u3c_diffusion_splits_noisy_scale1e-4.h5 --dataset_name test"
MODEL_FLAGS="--dims 1 --image_size 2000 --in_channels 3 --num_channels 128 --num_res_blocks 3 --attention_resolutions 250,125 --channel_mult 1,1,2,3,4"
DIFFUSION_FLAGS="--diffusion_steps 800 --noise_schedule tanh6,1"
SAMPLE_FLAGS="--num_samples 32768 --batch_size 64 --model_path /mnt/petaStor/li/Job/TFG-DM-lagr/experiments/lagr_u3c_tfg-IS2000-NC128-NRB3-DS800-NStanh6_1-LR1e-4-BS256-train/ema_0.9999_250000.pt --seed 0"
GUIDANCE_FLAGS="--guidance_name dps --guidance_strength 64.0"

python scripts/tfg_sample.py $DATA_FLAGS $MODEL_FLAGS $DIFFUSION_FLAGS $SAMPLE_FLAGS $GUIDANCE_FLAGS

Compared to Reconstruction with C-DM, the only additional flags required for DPS are --guidance_name and --guidance_strength, which specify the TFG strategy name (dps) and the guidance strength, respectively.
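For orientation, the generic DPS update can be written as follows. This is the standard formulation from the DPS literature specialized to a masking operator, not a transcription of tfg_sample.py. Here $M$ denotes the mask selecting the observed points, $y$ the measurements outside the gap, $x'_{t-1}$ the unconditional reverse-diffusion sample, and $\zeta_t$ a step size. How --guidance_strength maps onto $\zeta_t$ in this codebase is not specified here, so it is left as an assumption.

```latex
\begin{aligned}
\hat{x}_0(x_t) &= \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}, \\
x_{t-1} &= x'_{t-1} - \zeta_t \, \nabla_{x_t} \left\| y - M \odot \hat{x}_0(x_t) \right\|_2^2 .
\end{aligned}
```

The gradient term pulls each reverse step toward samples whose denoised estimate agrees with the observed part of the trajectory, which is why no conditional training is needed for this route.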
