This repository contains a pipeline for aggregating ERA5 environmental exposure data to a 0.1-degree grid. The pipeline is designed to run on FASRC. We developed this pipeline using nbdev, which lets us generate modules and scripts from notebooks.
As a result, all of the documentation on how the pipeline was developed and validated is available in `notes/index.ipynb` and the associated notebooks.
To review a PR on this repository, follow these steps:
- Obtain an API key for the ERA5 datastore (the Copernicus Climate Data Store), and ask Tinashe for access to the Golden Lab `googledriver` API key.
- Clone this repository to your workspace on FASRC.
- Create a conda environment with `conda create -n era5_sandbox python=3.10`, activate it with `conda activate era5_sandbox`, and install the package and all of its dependencies by running `pip install -e .` from the repository root.
- Run the `core` module to test your API key and set up the data directory structure: `python src/era5_sandbox/core.py`
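- Symlink your local data directory to the original data directory on the lab share. `ln -s` takes the existing target first and the new link second: `ln -s /n/dominici_lab/lab/data_processing/csph-era5_sandbox/data [YOUR WORKING DIRECTORY]/data`. If `[YOUR WORKING DIRECTORY]/data` already exists as a directory, move it aside first; otherwise `ln -s` creates the link inside it.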
- Do a dry run: remove one output file from `data/`, then run `snakemake --dry-run` and check that Snakemake plans to regenerate it.
- Run the full pipeline by submitting the Slurm batch script: `sbatch snakemake.sbatch`
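For the API-key step, a quick preflight check can catch a missing or incomplete key before `core.py` runs. This is a minimal sketch that assumes the pipeline downloads through the `cdsapi` Python client, which reads its credentials from a `~/.cdsapirc` file containing a `url:` line and a `key:` line. `check_cdsapirc` is a hypothetical helper, not part of this repository.

```bash
# Hypothetical preflight check (not part of era5_sandbox).
# ASSUMPTION: credentials live in a cdsapi-style config file with
# "url:" and "key:" lines, as read by the cdsapi client.
check_cdsapirc() {
  local rc="${1:-$HOME/.cdsapirc}"   # defaults to the cdsapi location
  if [ ! -f "$rc" ]; then
    echo "missing: $rc" >&2
    return 1
  fi
  # Require a url: line with a non-empty value.
  if ! grep -Eq '^url:[[:space:]]*[^[:space:]]+' "$rc"; then
    echo "no url: line in $rc" >&2
    return 1
  fi
  # Require a key: line with a non-empty value.
  if ! grep -Eq '^key:[[:space:]]*[^[:space:]]+' "$rc"; then
    echo "no key: line in $rc" >&2
    return 1
  fi
  echo "ok: $rc"
}
```

Source the function in your shell and run `check_cdsapirc` (or pass another path) before `python src/era5_sandbox/core.py`. It only checks that the file is well-formed; `core.py` remains the real test of whether the key works.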
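The symlink step is easy to get backwards, and `ln -s` silently places the link inside the destination when that destination is already a directory. The sketch below guards against both, assuming the intent is for your local `data/` to point at the shared lab copy. `link_shared_data` is a hypothetical helper, not part of this repository.

```bash
# Hypothetical helper (not part of era5_sandbox).
# ASSUMPTION: the local data directory should become a symlink that
# points at the shared directory. An existing local directory is
# moved to "<link>.bak" rather than deleted.
link_shared_data() {
  local shared="$1"   # existing shared directory (the link target)
  local link="$2"     # path of the symlink to create, e.g. ./data
  if [ ! -d "$shared" ]; then
    echo "shared data directory not found: $shared" >&2
    return 1
  fi
  if [ -L "$link" ]; then
    echo "already a symlink: $link -> $(readlink "$link")"
    return 0
  fi
  if [ -e "$link" ]; then
    if [ -e "$link.bak" ]; then
      echo "refusing to overwrite $link.bak" >&2
      return 1
    fi
    mv "$link" "$link.bak"   # keep the old directory instead of deleting it
  fi
  ln -s "$shared" "$link"    # target first, new link name second
}
```

For example: `link_shared_data /n/dominici_lab/lab/data_processing/csph-era5_sandbox/data "$PWD/data"`, run from your working directory.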
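For the dry-run step, one variant is to move the output file aside instead of deleting it, then restore it once Snakemake has reported its plan. This matters if `data/` is a symlink to shared lab storage, where a deleted file would be gone for everyone. This is a sketch under that assumption; `dry_run_without` and the `.dryrun-stash` suffix are hypothetical, not part of this repository.

```bash
# Hypothetical helper (not part of era5_sandbox): temporarily move one
# output file aside, run a command (by default `snakemake --dry-run`),
# then move the file back whether or not the command succeeded.
dry_run_without() {
  local target="$1"
  shift
  local stash="$target.dryrun-stash"
  if [ ! -e "$target" ]; then
    echo "no such file: $target" >&2
    return 1
  fi
  mv "$target" "$stash"
  # With no extra arguments, fall back to a Snakemake dry run.
  if [ "$#" -eq 0 ]; then
    set -- snakemake --dry-run
  fi
  local status=0
  "$@" || status=$?
  mv "$stash" "$target"      # restore the original file
  return "$status"
}
```

Run it as `dry_run_without data/<some output file>` from the repository root. Because `mv` within one filesystem keeps the file's modification time, restoring the file should not by itself make Snakemake consider downstream outputs out of date.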