C-DM-lagr

This is the codebase for Stochastic Reconstruction of Gappy Lagrangian Turbulent Signals by Conditional Generative Diffusion Models.

This repository is based on SmartTURB/diffusion-lagr and adds the ability to reconstruct gappy Lagrangian turbulent signals, conditioned on the measurements outside the gap. Three additional modules have been implemented:

continuous_diffusion: Enables diffusion models to condition on a continuous noise level rather than discrete timesteps. See WaveGrad for details.
palette_diffusion: Enables conditional diffusion models (C-DM) for image-to-image translation tasks. See Palette for details.
tfg_diffusion: Enables reconstruction with an unconditional diffusion model using a special case of training-free guidance (TFG), diffusion posterior sampling (DPS).
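
To illustrate what conditioning on a continuous noise level means, here is a minimal NumPy sketch of WaveGrad-style noise-level sampling. It is not the code in continuous_diffusion: the function name and the linear beta schedule are assumptions made for illustration (the flags in this README use the tanh6,1 schedule instead).

```python
import numpy as np

def sample_continuous_noise_level(betas, batch_size, rng):
    """Draw one continuous noise level sqrt(alpha_bar) per sample (hypothetical helper)."""
    # Discrete levels l_0 = 1 > l_1 > ... > l_T, with l_s = sqrt(prod_{i<=s} (1 - beta_i)).
    levels = np.concatenate([[1.0], np.sqrt(np.cumprod(1.0 - betas))])
    # Pick a segment s in {1, ..., T}, then sample uniformly between l_s and l_{s-1}.
    s = rng.integers(1, len(levels), size=batch_size)
    return rng.uniform(levels[s], levels[s - 1])

# Assumed linear schedule with 800 steps, for illustration only.
betas = np.linspace(1e-4, 2e-2, 800)
rng = np.random.default_rng(0)
noise_level = sample_continuous_noise_level(betas, batch_size=4, rng=rng)

# Noise a batch of clean signals; the network would receive noise_level instead of a timestep.
x0 = rng.standard_normal((4, 2000, 3))
eps = rng.standard_normal(x0.shape)
a = noise_level[:, None, None]
x_noisy = a * x0 + np.sqrt(1.0 - a**2) * eps
```

Because the model is trained on the noise level itself rather than on a timestep index, the number of sampling steps can in principle differ from the number used in training.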

Installation

This codebase runs in an environment similar to the one described under Development Environment. See env_setup.txt for the required packages, dependencies, and installation steps.

Data Preparation

Dataset: 3D HIT tracers

Please refer to Preparing Data for download and usage details of the file Lagr_u3c_diffusion.h5. Use the two scripts in datasets/lagr/ to split the original dataset into 90% for training and 10% for testing for both the 1c and 3c cases.
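
As a rough illustration of a 90%/10% split, the sketch below shuffles the trajectory indices and separates them into two disjoint sets. The function name, the shuffling, and the seed are assumptions; the scripts in datasets/lagr/ are the reference and may split differently.

```python
import numpy as np

def split_train_test(data, test_fraction=0.1, seed=0):
    """Randomly split the first axis of `data` into train and test parts (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(data))
    n_test = int(round(test_fraction * len(data)))
    # The first n_test shuffled indices form the test set, the rest the training set.
    return data[perm[n_test:]], data[perm[:n_test]]

# Toy stand-in for an array of shape (trajectories, time, components).
trajectories = np.arange(20 * 5 * 3, dtype=np.float32).reshape(20, 5, 3)
train, test = split_train_test(trajectories)
```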

Dataset: 2D Ocean Drifters

One can access the hourly drifter data from the NOAA Global Drifter Program here. We used version 2.01 and selected the file gdp-v2.01.nc.

To preprocess the data, including (1) dividing trajectories into non-overlapping 60-day segments and (2) removing segments with spurious points of high velocity or acceleration, run the script datasets/gdp1h/create-gdp1h_60d-datasets/create-gdp1h_60d-datasets.sh, which requires the clouddrift package (clouddrift.org). This will output two files: gdp1h_60d-diffusion.h5, containing the processed velocity segments, and gdp1h_60d-pos0.h5, containing the initial positions of these segments. Both files are available on the INFN Open Access Repository at this link, and can be loaded as follows:

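import h5py
import numpy as np

# Velocity segments: stored normalized, so rescale them to physical values.
with h5py.File('gdp1h_60d-diffusion.h5', 'r') as h5f:
    rx0 = np.array(h5f.get('min'))  # lower bound used for normalization
    rx1 = np.array(h5f.get('max'))  # upper bound used for normalization
    # Invert the min-max normalization of the 'train' dataset.
    v2c = (np.array(h5f.get('train'))+1)*(rx1-rx0)/2 + rx0

# Initial positions of the same segments.
with h5py.File('gdp1h_60d-pos0.h5', 'r') as h5f:
    lon0 = np.array(h5f.get('lon0'))
    lat0 = np.array(h5f.get('lat0'))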
The v2c variable has shape (115450, 1440, 2): 115,450 segments, each with 1,440 hourly time instants and 2 velocity components. In the h5 file, the velocities are stored in the dataset train after min-max normalization to [-1, 1] with rx0=-3 and rx1=3, which the snippet above inverts. lon0 and lat0 are the initial longitude and latitude of each segment, each with shape (115450,).
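
The normalization and its inverse can be checked on a few values. The sketch below uses the same inverse formula as the loading snippet; the function names are illustrative and not part of the repository.

```python
import numpy as np

def minmax_normalize(v, vmin, vmax):
    """Map values in [vmin, vmax] linearly to [-1, 1]."""
    return 2.0 * (v - vmin) / (vmax - vmin) - 1.0

def minmax_denormalize(x, vmin, vmax):
    """Inverse map, identical to the formula used when loading v2c."""
    return (x + 1.0) * (vmax - vmin) / 2.0 + vmin

rx0, rx1 = -3.0, 3.0
v = np.array([-3.0, 0.0, 1.5, 3.0])
x = minmax_normalize(v, rx0, rx1)        # normalized values in [-1, 1]
v_back = minmax_denormalize(x, rx0, rx1)  # recovers the original values
```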

To obtain the positions along a specific segment with index idx, one can use the position_from_velocity function from the clouddrift package as follows:

from clouddrift.kinematics import position_from_velocity

ve_idx, vn_idx = v2c[idx, :, 0], v2c[idx, :, 1]
time = np.arange(1440) * 3600  # units: seconds
lon_idx, lat_idx = position_from_velocity(ve_idx, vn_idx, time, lon0[idx], lat0[idx])

C-DM Training

Please refer to the parent repository’s Training section for detailed information, including hyperparameter configuration. The most important additional flag in this case is --mask_mode, which has the following options:

center1d<lg>: Specifies a central gap of size <lg>.
right1d<lg>: Specifies a right-end gap of size <lg>.
interp1d<scale_factor>: Keeps one sample point every <scale_factor> points, so that the points in between are reconstructed (interpolation case).

See the function get_mask in palette_diffusion/palette_datasets.py for customizing the reconstruction scenario.
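
To make the three options concrete, here is a minimal sketch of how such masks could be built for a 1D signal. It is not the get_mask implementation: the function name, the convention that 1 marks a measured point and 0 a point to reconstruct, the exact gap placement, and the starting index of the interp1d samples are all assumptions.

```python
import re
import numpy as np

def make_mask(mask_mode, length):
    """Return a 0/1 array of shape (length,): 1 = measured, 0 = to reconstruct (assumed convention)."""
    match = re.fullmatch(r"(center1d|right1d|interp1d)(\d+)", mask_mode)
    if match is None:
        raise ValueError(f"unsupported mask_mode: {mask_mode}")
    kind, value = match.group(1), int(match.group(2))
    mask = np.ones(length, dtype=np.float32)
    if kind == "center1d":
        # Gap of `value` points centered in the signal.
        start = (length - value) // 2
        mask[start:start + value] = 0.0
    elif kind == "right1d":
        # Gap covering the last `value` points.
        mask[length - value:] = 0.0
    else:
        # interp1d: keep one measured point every `value` points.
        mask[:] = 0.0
        mask[::value] = 1.0
    return mask

mask = make_mask("center1d500", 2000)
```

With image_size 2000, center1d500 leaves 750 measured points on each side of the gap under these assumptions.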

For Lagrangian turbulence reconstruction with a central gap of size $50\tau_\eta$, use the following flags:

DATA_FLAGS="--mask_mode center1d500 --dataset_path datasets/lagr/Lagr_u3c_diffusion_splits.h5 --dataset_name train"
MODEL_FLAGS="--dims 1 --image_size 2000 --in_channels 3 --num_channels 128 --num_res_blocks 3 --attention_resolutions 250,125 --channel_mult 1,1,2,3,4"
DIFFUSION_FLAGS="--diffusion_steps 800 --noise_schedule tanh6,1"
TRAIN_FLAGS="--lr 1e-4 --batch_size 64 --total_steps 250000"

For ocean drifter observation reconstruction with a central gap of 360 hours, use the following flags:

DATA_FLAGS="--mask_mode center1d360 --dataset_path datasets/gdp1h/gdp1h_v2c_diffusion_splits.h5 --dataset_name train"
MODEL_FLAGS="--dims 1 --image_size 1440 --in_channels 2 --num_channels 128 --num_res_blocks 3 --attention_resolutions 180,90 --channel_mult 1,1,2,3,4"
DIFFUSION_FLAGS="--diffusion_steps 800 --noise_schedule tanh6,1"
TRAIN_FLAGS="--lr 1e-4 --batch_size 64 --total_steps 250000"

Use scripts/palette_train.py to train the conditional diffusion model:

mpiexec -n $NUM_GPUS python scripts/palette_train.py $DATA_FLAGS $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS

For most gap configurations (specified by --mask_mode) in this work, training was run on 4 A100 GPUs for 250,000 iterations, as set by the --total_steps flag, typically completing within approximately 24 hours.

Reconstruction with C-DM

Please refer to the parent repository’s Sampling section for detailed information. The only additional option here is --seed, which sets the random seed for reconstruction.

For Lagrangian turbulence reconstruction with a central gap of size $50\tau_\eta$, use the following flags:

DATA_FLAGS="--mask_mode center1d500 --dataset_path datasets/lagr/Lagr_u3c_diffusion_splits.h5 --dataset_name test"
MODEL_FLAGS="--dims 1 --image_size 2000 --in_channels 3 --num_channels 128 --num_res_blocks 3 --attention_resolutions 250,125 --channel_mult 1,1,2,3,4"
DIFFUSION_FLAGS="--diffusion_steps 800 --noise_schedule tanh6,1"
SAMPLE_FLAGS="--num_samples 32768 --batch_size 64 --model_path /path/to/model.pt --seed 0"

For ocean drifter observation reconstruction with a central gap of 360 hours, use the following flags:

DATA_FLAGS="--mask_mode center1d360 --dataset_path datasets/gdp1h/gdp1h_v2c_diffusion_splits.h5 --dataset_name test"
MODEL_FLAGS="--dims 1 --image_size 1440 --in_channels 2 --num_channels 128 --num_res_blocks 3 --attention_resolutions 180,90 --channel_mult 1,1,2,3,4"
DIFFUSION_FLAGS="--diffusion_steps 800 --noise_schedule tanh6,1"
SAMPLE_FLAGS="--num_samples 11545 --batch_size 64 --model_path /path/to/model.pt --seed 0"

Use scripts/palette_sample.py to reconstruct the test data:

python scripts/palette_sample.py $DATA_FLAGS $MODEL_FLAGS $DIFFUSION_FLAGS $SAMPLE_FLAGS

Reconstruction with DPS

Please refer to the parent repository for detailed instructions on training an unconditional diffusion model. To ensure compatibility with DPS reconstruction, --use_continuous_diffusion=True must be set during training.

For 3D HIT tracers, we use the following settings to train an unconditional diffusion model:

DATA_FLAGS="--dataset_path datasets/lagr/Lagr_u3c_diffusion_splits.h5 --dataset_name train"
MODEL_FLAGS="--dims 1 --image_size 2000 --in_channels 3 --num_channels 128 --num_res_blocks 3 --attention_resolutions 250,125 --channel_mult 1,1,2,3,4"
DIFFUSION_FLAGS="--use_continuous_diffusion True --diffusion_steps 800 --noise_schedule tanh6,1"
TRAIN_FLAGS="--lr 1e-4 --batch_size 64 --total_steps 250000"

mpiexec -n 4 python scripts/turb_train.py $DATA_FLAGS $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS

The trained checkpoint is available for download here.

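The following settings can be used to perform DPS reconstruction. The absolute paths given to --dataset_path and --model_path are machine-specific and should be replaced with the locations of your own dataset and checkpoint: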

DATA_FLAGS="--mask_mode center1d500 --dataset_path /mnt/petaStor/li/Job/TFG-DM-lagr/datasets/lagr/Lagr_u3c_diffusion_splits_noisy_scale1e-4.h5 --dataset_name test"
MODEL_FLAGS="--dims 1 --image_size 2000 --in_channels 3 --num_channels 128 --num_res_blocks 3 --attention_resolutions 250,125 --channel_mult 1,1,2,3,4"
DIFFUSION_FLAGS="--diffusion_steps 800 --noise_schedule tanh6,1"
SAMPLE_FLAGS="--num_samples 32768 --batch_size 64 --model_path /mnt/petaStor/li/Job/TFG-DM-lagr/experiments/lagr_u3c_tfg-IS2000-NC128-NRB3-DS800-NStanh6_1-LR1e-4-BS256-train/ema_0.9999_250000.pt --seed 0"
GUIDANCE_FLAGS="--guidance_name dps --guidance_strength 64.0"

python scripts/tfg_sample.py $DATA_FLAGS $MODEL_FLAGS $DIFFUSION_FLAGS $SAMPLE_FLAGS $GUIDANCE_FLAGS

Compared to Reconstruction with C-DM, the only additional flags required for DPS are --guidance_name and --guidance_strength, which specify the TFG strategy name (dps) and the guidance strength, respectively.
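
For orientation, the DPS update can be written in its generic form as follows. This is a sketch of the standard formulation, not a transcription of tfg_diffusion: the symbols $M$ (mask equal to 1 on measured points), $y$ (measurements), and $x'_{t-1}$ (the unconditional sampling step) are notation introduced here, and identifying $\zeta$ with --guidance_strength, possibly up to a normalization, is an assumption.

$$
\hat{x}_0(x_t) = \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}},
\qquad
x_{t-1} = x'_{t-1} - \zeta\,\nabla_{x_t}\,\bigl\lVert M \odot \bigl(y - \hat{x}_0(x_t)\bigr) \bigr\rVert_2^2
$$

Each step first takes the usual unconditional denoising step, then moves the sample along the gradient that reduces the mismatch between the predicted clean signal and the measurements outside the gap.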
