
NSAPH-Data-Processing/era5_sandbox

 
 

ERA5 Exposure Aggregation Pipeline

This repository contains a pipeline for aggregating ERA5 environmental exposure data to a 0.1 degree grid. The pipeline is designed to run on FASRC. We developed it with nbdev, which generates modules and scripts from notebooks, so the documentation of how the pipeline was developed and validated lives in notes/index.ipynb and the associated notebooks.

How to Review a PR

To review a PR on this repository, follow these steps:

  1. Obtain an API key for the ERA5 datastore from here, and ask Tinashe for access to the Golden Lab Google Drive API key

  2. Clone this repository to your workspace on FASRC

  3. Create a conda environment with conda create -n era5_sandbox python=3.10, activate it with conda activate era5_sandbox, and install all of the necessary dependencies for the package with pip install -e .

  4. Run the core module to test your API key and set up the data directory structure

python src/era5_sandbox/core.py

  5. Symlink your local data directory to the original work: ln -s [YOUR WORKING DIRECTORY]/data /n/dominici_lab/lab/data_processing/csph-era5_sandbox/data

  6. Do a dry run: remove a file from data, then run snakemake --dry-run

  7. Run the pipeline: sbatch snakemake.sbatch
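Before the dry run, it can help to confirm that the required tools are on your PATH and that the data symlink resolves. The following is a minimal pre-flight sketch, not part of the repository: the helper names require_tools and check_data_link are hypothetical, and the script assumes a POSIX-compatible shell such as bash.

```shell
#!/usr/bin/env bash
# Pre-flight sketch. Both helpers are hypothetical and are NOT part of
# the era5_sandbox repository.

# Print each named command that is missing from PATH.
# Returns 1 if at least one command is missing, 0 otherwise.
require_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Report whether the given path is a symlink and whether its target exists.
# Returns 0 only for a symlink that resolves.
check_data_link() {
    local link_path="$1"
    if [ ! -L "$link_path" ]; then
        echo "not a symlink: $link_path"
        return 1
    fi
    if [ -e "$link_path" ]; then
        echo "ok: $link_path -> $(readlink "$link_path")"
    else
        echo "broken: $link_path -> $(readlink "$link_path")"
        return 1
    fi
}
```

For example, after sourcing the script, require_tools conda snakemake sbatch lists anything missing from the environment, and check_data_link called with the path where you created the symlink reports where it points. If both succeed, snakemake --dry-run is a reasonable next step.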
