NVIDIA BioNeMo Framework is a comprehensive suite of programming tools, libraries, and models designed for computational drug discovery. It accelerates the most time-consuming and costly stages of building and adapting biomolecular AI models by providing domain-specific, optimized models and tooling that integrate easily into GPU-based compute environments and deliver state-of-the-art performance.
Note
A core use case of the BioNeMo Framework is to help digital biology scientists accelerate and scale their model training onto a compute cluster. This repository contains three categories of modules for this use case:
1. Models using fully-sharded data parallelism (FSDP), which is available through several implementations, including PyTorch's FSDP2/FSDP1 and NVIDIA megatron-FSDP. Sharding a model with FSDP typically requires only a few lines of code changes (see the minimal sketch after the support matrix below). You can find models and ready-to-run recipes parallelized with megatron-FSDP and accelerated with NVIDIA TransformerEngine (TE) in the bionemo-recipes directory.
bionemo-recipes support matrix:
| Directory | Description | Support Status | 5D Parallel | Megatron-FSDP | TE | Sequence Packing | FP8 | Context Parallelism |
|---|---|---|---|---|---|---|---|---|
| `models/amplify` | TE accelerated protein BERT, pushed to HuggingFace | ✅ Active | ❌ | ✅ | ✅ | 🚧 WIP | ✅ | 🚧 WIP |
| `models/esm2` | TE accelerated protein BERT, pushed to HuggingFace | ✅ Active | ❌ | ✅ | ✅ | ✅ | ✅ | 🚧 WIP |
| `models/geneformer` | TE accelerated single-cell BERT | 🚧 WIP | ❌ | ✅ | 🚧 WIP | 🚧 WIP | 🚧 WIP | 🚧 WIP |
| `recipes/amplify_accelerate_te_fp8` | Recipe for Amplify TE + HF Accelerate | ☠️ EOL[1] | ❌ | ✅ | ✅ | ❌ | ✅ | 🚧 WIP |
| `recipes/esm2_accelerate_te` | Recipe for ESM2 TE + HF Accelerate | ✅ Active | ❌ | 🚧 WIP | ✅ | ❌ | ✅ | 🚧 WIP |
| `recipes/esm2_native_te` | Recipe for ESM2 TE + native PyTorch | ✅ Active | ❌ | ✅ | ✅ | ✅ | ✅ | 🚧 WIP |
| `recipes/esm2_native_te_mfsdp_thd` | Recipe for ESM2 TE + megatron-FSDP + sequence packing | ☠️ EOL[1] | ❌ | ✅ | ✅ | ✅ | ✅ | 🚧 WIP |
| `recipes/geneformer_native_te_mfsdp_fp8` | Recipe for Geneformer HF model | 🚧 WIP | ❌ | ✅ | ✅ | ❌ | ✅ | 🚧 WIP |
| `recipes/vit` | Recipe for Vision Transformer | 🚧 WIP | ❌ | ✅ | ✅ | ❌ | ✅ | 🚧 WIP |
[1]: End-of-life; to be merged with the `esm2_native_te` recipe.
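To make the "few lines of code changes" claim concrete, here is a minimal sketch that wraps a generic PyTorch model with the stock FSDP1 wrapper. It is plain PyTorch rather than a BioNeMo recipe, and the model, optimizer, and launch command are placeholders; megatron-FSDP and FSDP2 follow the same wrap-after-construction pattern.

```python
# Minimal sketch (plain PyTorch, not a BioNeMo recipe): sharding an existing
# model with PyTorch's FSDP1 wrapper. Model, optimizer, and launch command are
# placeholders for illustration.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    # Assumes the script is launched with torchrun, e.g.:
    #   torchrun --nproc_per_node=8 train.py
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=512, nhead=8), num_layers=6
    ).cuda()

    # The "few lines of code changes": wrap the model, then train as usual.
    model = FSDP(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # ... standard training loop: forward, loss.backward(), optimizer.step() ...

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```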
2. Models using explicit 5D parallelism (tensor parallel, pipeline parallel, context parallel, etc.), for which NVIDIA provides accelerated support through NeMo and Megatron-Core. 5D parallelism requires explicit modification of the model code so that it can be sharded along different dimensions (a conceptual sketch follows the support matrix below). The models built for this style of acceleration and parallelism can be found in the sub-packages directory. While it is possible to pip install the models, we strongly suggest using our Docker image, which comes with NeMo and Megatron-Core pre-installed.
sub-packages models support matrix:
| Directory | Description | Support | 5D Parallel | Megatron-FSDP | TE | Sequence Packing | FP8 | Context Parallel |
|---|---|---|---|---|---|---|---|---|
| `bionemo-amplify` | 5D parallel model | 🔧 Maintenance | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
| `bionemo-core` | Model config / test data utils | ✅ Active | ✅ | N/A | ✅ | ❌ | N/A | N/A |
| `bionemo-esm2` | 5D parallel model | ✅ Active | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
| `bionemo-evo2` | 5D parallel model | ✅ Active | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
| `bionemo-example_model` | Example 5D parallel model | 🔧 Maintenance | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
| `bionemo-fw` | Meta package to pull other packages | ✅ Active | ✅ | N/A | N/A | ❌ | ✅ | N/A |
| `bionemo-geneformer` | 5D parallel model | 🔧 Maintenance | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
| `bionemo-llm` | 5D parallel base model (BioBert) | ✅ Active | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
| `bionemo-testing` | Testing utilities | ✅ Active | ✅ | N/A | N/A | N/A | N/A | N/A |
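To show why 5D parallelism requires touching the model code (in contrast to the FSDP wrapper above), the single-process sketch below illustrates column-style tensor parallelism with plain PyTorch: the weight of a linear layer is split across "ranks", each rank computes a slice of the output, and the slices must be gathered. This is a conceptual illustration only, not the Megatron-Core API.

```python
# Conceptual, single-process sketch (plain PyTorch, not the Megatron-Core API)
# of column-style tensor parallelism: a Linear layer's weight is split along
# its output dimension, each "rank" computes a slice of the output, and the
# slices are concatenated. Doing this for real requires rewriting the layer's
# forward pass, which is why 5D parallelism needs explicit model-code changes.
import torch
import torch.nn as nn

torch.manual_seed(0)
full = nn.Linear(8, 4, bias=False)   # the original, unsharded layer
x = torch.randn(2, 8)                # a toy input batch

# Two "ranks" each hold half of the output rows of the weight matrix.
shards = torch.chunk(full.weight, chunks=2, dim=0)   # two (2, 8) shards

# Each rank computes only its slice of the output; in a real setup an
# all-gather across ranks would combine the slices.
partial_outputs = [x @ shard.T for shard in shards]
combined = torch.cat(partial_outputs, dim=-1)

# The sharded computation reproduces the unsharded layer exactly.
torch.testing.assert_close(combined, full(x))
```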
3. Tooling for data loading and in-training-loop processing; these packages are lightweight and individually pip-installable. They also live in the sub-packages directory, adjacent to the 5D parallel models (a usage sketch follows the support matrix below).
sub-packages tooling support matrix:
| Directory | Description | Support | 5D Parallel | Megatron-FSDP | TE | Sequence Packing | FP8 | Context Parallel |
|---|---|---|---|---|---|---|---|---|
| `bionemo-moco` | Molecular co-design tools | ✅ Active | ❌ | N/A | N/A | N/A | N/A | N/A |
| `bionemo-noodles` | Python API for fast FASTA file I/O | 🔧 Maintenance | ❌ | N/A | N/A | N/A | N/A | N/A |
| `bionemo-scspeedtest` | Single-cell dataloading benchmark tests | ✅ Active | N/A | N/A | N/A | N/A | N/A | N/A |
| `bionemo-size-aware-batching` | Memory-consumption-aware batching | 🔧 Maintenance | N/A | N/A | N/A | N/A | N/A | N/A |
| `bionemo-scdl` | Modular single-cell data loader | ✅ Active | ✅ Compatible | N/A | N/A | N/A | N/A | N/A |
| `bionemo-webdatamodule` | PyTorch Lightning module to use WebDataset | 🔧 Maintenance | N/A | N/A | N/A | N/A | N/A | N/A |
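As one hedged example of how these standalone tools are used, the sketch below opens a single-cell dataset with bionemo-scdl. The import path and constructor follow the bionemo-scdl README at the time of writing; the file paths are placeholders, and the exact API may differ between releases.

```python
# Hedged usage sketch for bionemo-scdl (pip install bionemo-scdl). Import path
# and constructor follow the bionemo-scdl README; file paths are placeholders,
# and the exact API may differ between releases.
from bionemo.scdl.io.single_cell_memmap_dataset import SingleCellMemMapDataset

# Converts an AnnData (.h5ad) file into SCDL's memory-mapped format and opens it.
dataset = SingleCellMemMapDataset("my_scdl_dataset", "path/to/cells.h5ad")

print(len(dataset))   # number of cells
print(dataset[0])     # a single (sparse) cell record
```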
BioNeMo Framework is part of a larger ecosystem of NVIDIA Biopharma products. Subscribe to get notified of new releases, bug fixes, critical security updates, and more.
- Official Documentation: The contents of sub-packages, including user guides, API references, and troubleshooting, are documented on our official documentation. Nightly builds of this documentation are available on BioNeMo Framework GitHub Pages.
- 🚧 In-Progress Documentation 🚧: bionemo-recipes documentation is currently a work in progress; however, the recipes are meant to be self-documenting and easy to understand, so we suggest loading them into your favorite GenAI code assistant!
Full documentation on using the BioNeMo Framework is available at https://docs.nvidia.com/bionemo-framework/latest/user-guide/. To simplify the integration of optimized third-party dependencies, BioNeMo is primarily distributed as a containerized library. You can download the latest released container for the BioNeMo Framework from NGC. To launch a pre-built container, you can use the brev.dev launchable or execute the following command:
docker run --rm -it \
--gpus=all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
nvcr.io/nvidia/clara/bionemo-framework:nightly \
/bin/bash
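Once inside the container, a quick sanity check (plain PyTorch, nothing BioNeMo-specific) confirms that the GPUs are visible:

```python
# Run inside the container to confirm GPU visibility (plain PyTorch).
import torch

print(torch.cuda.is_available())      # True if --gpus=all took effect
print(torch.cuda.device_count())      # number of visible GPUs
print(torch.cuda.get_device_name(0))  # name of the first GPU
```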
The NeMo and Megatron-LM dependencies are included as git submodules in bionemo2. The pinned commits for these submodules represent the "last-known-good" versions of these packages that are confirmed to be working with bionemo2 (and those that are tested in CI).
To initialize these submodules when cloning the repo, add the `--recursive` flag to the `git clone` command:
git clone --recursive [email protected]:NVIDIA/bionemo-framework.git
cd bionemo-framework
To download the pinned versions of these submodules within an existing git repository, run
git submodule update --init --recursive
Different branches of the repo can have different pinned versions of these third-party submodules. Ensure submodules are automatically updated after switching branches or pulling updates by configuring git with:
git config submodule.recurse true
NOTE: this setting will not download new or remove old submodules with the branch's changes. You will have to run the full `git submodule update --init --recursive` command in these situations.
With a locally cloned repository and initialized submodules, build the BioNeMo container using:
docker buildx build . -t my-container-tag
If you see an error message like `No file descriptors available (os error 24)`, add the option `--ulimit nofile=65535:65535` to the docker build command.
We distribute a development container configuration for VSCode (`.devcontainer/devcontainer.json`) that simplifies the process of local testing and development. Opening the bionemo-framework folder with VSCode should prompt you to re-open the folder inside the devcontainer environment.
Note
The first time you launch the devcontainer, it may take a long time to build the image. Building the image locally (using the command shown above) will ensure that most of the layers are present in the local docker cache.
See the tutorials pages for example applications and getting started guides.