NVIDIA/bionemo-framework

BioNeMo Framework

NVIDIA BioNeMo Framework is a comprehensive suite of programming tools, libraries, and models designed for computational drug discovery. It accelerates the most time-consuming and costly stages of building and adapting biomolecular AI models by providing domain-specific, optimized models and tooling that are easily integrated into GPU-based computational resources with state-of-the-art performance.

Note

A core use-case of the BioNeMo Framework is to help digital biology scientists accelerate and scale their model training onto a compute cluster. This repository contains 3 categories of modules for this use-case:

1. Models using fully-sharded-data-parallel (FSDP), which is possible with a number of different implementations including PyTorch’s FSDP2/FSDP1 and NVIDIA megatron-FSDP. Sharding a model with FSDP typically requires only a few lines of code changes. You can find models and ready-to-run recipes parallelized with megatron-FSDP and accelerated with NVIDIA TransformerEngine (TE) in bionemo-recipes.

bionemo-recipes support matrix
Directory Description Support Status 5D Parallel Megatron-FSDP TE Sequence Packing FP8 Context Parallelism
models/amplify TE accelerated protein BERT, pushed to HuggingFace ✅ Active 🚧 WIP 🚧 WIP
models/esm2 TE accelerated protein BERT, pushed to HuggingFace ✅ Active 🚧 WIP
models/geneformer TE accelerated single-cell BERT 🚧 WIP 🚧 WIP 🚧 WIP 🚧 WIP 🚧 WIP
recipes/amplify_accelerate_te_fp8 Recipe for Amplify TE + HF Accelerate ☠️ EOL[1] 🚧 WIP
recipes/esm2_accelerate_te Recipe for ESM2 TE + HF Accelerate ✅ Active 🚧 WIP 🚧 WIP
recipes/esm2_native_te Recipe for ESM2 TE + native PyTorch ✅ Active 🚧 WIP
recipes/esm2_native_te_mfsdp_thd Recipe for ESM2 TE + megatron-FSDP + seq packing ☠️ EOL[1] 🚧 WIP
recipes/geneformer_native_te_mfsdp_fp8 Recipe for Geneformer HF model 🚧 WIP 🚧 WIP
recipes/vit Recipe for Vision Transformer 🚧 WIP 🚧 WIP

[1]: End-of-life; to be merged with esm2_native_te recipe.

2. Models using explicit 5D parallelism (tensor parallel, pipeline parallel, context parallel, etc.), for which NVIDIA provides accelerated support with NeMo and Megatron-Core. 5D parallelism requires explicit modification of the model code to make it shardable along different dimensions. The models for this style of acceleration and parallelism can be found in the sub-packages directory. While it is possible to pip install the models, we strongly suggest using our Docker image that comes with NeMo and Megatron-Core pre-installed.

sub-packages models support matrix
Directory Description Support 5D Parallel Megatron-FSDP TE Sequence Packing FP8 Context Parallel
bionemo-amplify 5D parallel model 🔧 Maintenance
bionemo-core Model Config/test data utils ✅ Active N/A N/A N/A
bionemo-esm2 5D parallel model ✅ Active
bionemo-evo2 5D parallel model ✅ Active
bionemo-example_model Example 5D parallel model 🔧 Maintenance
bionemo-fw Meta package to pull other packages ✅ Active N/A N/A N/A
bionemo-geneformer 5D parallel model 🔧 Maintenance
bionemo-llm 5D parallel base model (BioBert) ✅ Active
bionemo-testing Testing Utilities ✅ Active N/A N/A N/A N/A N/A

3. Tools for data loading and in-the-training-loop processing, which are lightweight and individually pip-installable. These are also in the sub-packages directory, adjacent to the 5D parallel models.

sub-packages tooling support matrix
Directory Description Support 5D Parallel Megatron-FSDP TE Sequence Packing FP8 Context Parallel
bionemo-moco Molecular Co-design tools ✅ Active N/A N/A N/A N/A N/A
bionemo-noodles Python API to fast FASTA file I/O 🔧 Maintenance N/A N/A N/A N/A N/A
bionemo-scspeedtest Single Cell Dataloading benchmark tests ✅ Active N/A N/A N/A N/A N/A N/A
bionemo-size-aware-batching Memory consumption aware batching 🔧 Maintenance N/A N/A N/A N/A N/A N/A
bionemo-scdl Modular Single Cell Data Loader ✅ Active ✅ Compatible N/A N/A N/A N/A N/A
bionemo-webdatamodule PyTorch Lightning module to use WebDataset 🔧 Maintenance N/A N/A N/A N/A N/A N/A

BioNeMo Framework is part of a larger ecosystem of NVIDIA Biopharma products. Get notified of new releases, bug fixes, critical security updates, and more for biopharma. Subscribe.

Documentation Resources

  • Official Documentation: The contents of sub-packages, including user guides, API references, and troubleshooting, are covered in our official documentation. Nightly builds of this documentation are available on BioNeMo Framework GitHub Pages.

  • 🚧 In-Progress Documentation 🚧: bionemo-recipes documentation is currently a work in progress. The recipes are, however, meant to be self-documenting and easy to understand; we suggest you throw them into your favorite generative AI code assistant!

Getting Started with BioNeMo Framework

Full documentation on using the BioNeMo Framework is provided at https://docs.nvidia.com/bionemo-framework/latest/user-guide/. To simplify the integration of optimized third-party dependencies, BioNeMo is primarily distributed as a containerized library. You can download the latest released container for the BioNeMo Framework from NGC. To launch a pre-built container, you can use the brev.dev launchable or execute the following command:

docker run --rm -it \
  --gpus=all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  nvcr.io/nvidia/clara/bionemo-framework:nightly \
  /bin/bash
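For readers who want to adjust these flags, the command can be wrapped in a small shell function. This is only a sketch: the function name `bionemo_shell` and the variable `STACK_BYTES` are illustrative names of our own, not part of BioNeMo, and the flag comments describe general Docker behavior as we understand it.

```shell
# 67108864 bytes is 64 MiB; computing it keeps the stack limit easy to change.
STACK_BYTES=$((64 * 1024 * 1024))
echo "stack limit: $STACK_BYTES bytes"

# Hypothetical wrapper (not shipped with BioNeMo) around the command above.
#   --gpus=all            expose all host GPUs to the container
#   --ipc=host            share the host IPC namespace (shared memory)
#   --ulimit memlock=-1   lift the locked-memory limit
#   --ulimit stack=N      set the stack size limit, in bytes
bionemo_shell() {
  docker run --rm -it \
    --gpus=all --ipc=host --ulimit memlock=-1 --ulimit stack="$STACK_BYTES" \
    "${1:-nvcr.io/nvidia/clara/bionemo-framework:nightly}" \
    /bin/bash
}
```

Calling `bionemo_shell` with no argument starts the nightly image; passing another image reference as the first argument starts that image instead.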

Setting up a local development environment

Initializing 3rd-party dependencies as git submodules

The NeMo and Megatron-LM dependencies are included as git submodules in bionemo2. The pinned commits for these submodules represent the "last-known-good" versions of these packages that are confirmed to be working with bionemo2 (and those that are tested in CI).

To initialize these submodules when cloning the repo, add the --recursive flag to the git clone command:

git clone --recursive git@github.com:NVIDIA/bionemo-framework.git
cd bionemo-framework

To download the pinned versions of these submodules within an existing git repository, run:

git submodule update --init --recursive

Different branches of the repo can have different pinned versions of these third-party submodules. Ensure submodules are automatically updated after switching branches or pulling updates by configuring git with:

git config submodule.recurse true

NOTE: This setting does not download new submodules or remove old ones when a branch adds or drops them. In those situations, run the full git submodule update --init --recursive command.
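One way to tell whether the full update is needed is to inspect `git submodule status`, which prefixes each line with `-` for an uninitialized submodule, `+` when the checked-out commit differs from the pinned one, and `U` for merge conflicts. The sketch below filters for those prefixes; `stale_submodules` is a hypothetical helper name, not something the repository provides.

```shell
# Hypothetical helper: read `git submodule status` output on stdin and keep
# only the entries that need attention (prefix "-", "+", or "U").
# In-sync submodules start with a space and are dropped.
stale_submodules() {
  grep -E '^[-+U]' || true   # "|| true": no matches is not an error
}

# Usage, from the repository root:
#   git submodule status --recursive | stale_submodules
```

If the filter prints anything, running `git submodule update --init --recursive` should bring the listed submodules back to their pinned commits.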

Build the Docker Image Locally

With a locally cloned repository and initialized submodules, build the BioNeMo container using:

docker buildx build . -t my-container-tag

If you see an error message like No file descriptors available (os error 24), add the option --ulimit nofile=65535:65535 to the docker build command.
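As a rough sketch, the two variants of the build command can be combined in one wrapper. The function name `build_bionemo_image` and the `DRY_RUN` variable are assumptions made for illustration; only the `docker buildx build` invocation and the `--ulimit` option come from this section.

```shell
# Hypothetical wrapper around the build command from this section.
#   $1: image tag (defaults to my-container-tag)
#   $2: "yes" to raise the open-file limit for the build
# Set DRY_RUN=1 to print the command instead of running it.
build_bionemo_image() {
  tag="${1:-my-container-tag}"
  if [ "${2:-no}" = "yes" ]; then
    set -- --ulimit nofile=65535:65535
  else
    set --
  fi
  ${DRY_RUN:+echo} docker buildx build . -t "$tag" "$@"
}
```

For example, `DRY_RUN=1; build_bionemo_image my-container-tag yes` prints the full command with the raised limit without starting a build.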

VSCode Devcontainer for Interactive Debugging

We distribute a development container configuration for VSCode (.devcontainer/devcontainer.json) that simplifies local testing and development. Opening the bionemo-framework folder in VSCode should prompt you to reopen the folder inside the devcontainer environment.

Note

The first time you launch the devcontainer, it may take a long time to build the image. Building the image locally (using the command shown above) will ensure that most of the layers are present in the local docker cache.

Quick Start

See the tutorials pages for example applications and getting started guides.
