NVIDIA BioNeMo

NVIDIA BioNeMo is an open developer platform for AI-driven life science research.

It provides GPU-accelerated models, tools, and datasets for the entire AI lifecycle, enabling researchers and developers to build, customize, and deploy AI applications that transform physical lab results into the digital insights that drive the next experiment.

The platform is built on five core pillars:

Data: Large-scale datasets for training, fine-tuning, and benchmarking models.
Models: Open-source models for understanding biological systems, designing novel proteins and small molecules, and optimizing candidates for synthesizability, binding affinity, and molecular properties.
Libraries and Tools: Foundational GPU-optimized libraries and kernels for accelerated AI training and inference.
Training and Customization: Frameworks and recipes for pretraining, fine-tuning, and adapting models for specialized use cases.
Optimized Inference and Deployment: Enterprise-ready NVIDIA inference microservices (NIM) and reference architectures for production use.

Note: Many components of the BioNeMo platform are modular and hosted in their own dedicated GitHub repositories or organizations. This README serves as a central index to guide you to the right tools.

License

BioNeMo components are generally released under:

Data: CC BY 4.0 license
Model weights: NVIDIA Open Model License Agreement
Code: Apache 2.0 license

Individual components may vary — check each resource for specific license terms.

Data

Natural language models are trained on internet-scale data, but biology and chemistry lack the critical mass of data required for large, general-purpose foundation models. To address this ecosystem-wide gap, NVIDIA is partnering with leading organizations to create and release open datasets.

Dataset	Description
3D Structures of Protein Complexes (available through the AlphaFold Database)	Large-scale open database of predicted protein complex structures built with ecosystem partners to accelerate interaction biology and drug discovery. License: CC BY 4.0
Consistency Distilled Synthetic Protein Database	455K curated, high-quality protein sequence-structure pairs. Built using ProteinMPNN to generate synthetic sequences for Foldseek AFDB cluster representative structures, then refolded with ESMFold to obtain fully atomistic, self-consistent models. Filtered to pLDDT > 80. License: CC BY 4.0
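
The pLDDT > 80 filter mentioned for the Consistency Distilled Synthetic Protein Database can be illustrated with a minimal sketch. The record layout (`name`, `mean_plddt`) and the use of a per-structure mean score are assumptions made for illustration; they are not the dataset's actual schema or curation pipeline.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    # Hypothetical record: one refolded structure and its mean pLDDT (0-100).
    name: str
    mean_plddt: float


def filter_confident(preds, threshold=80.0):
    """Keep predictions whose mean pLDDT is strictly above the threshold."""
    return [p for p in preds if p.mean_plddt > threshold]


preds = [
    Prediction("a", 91.2),
    Prediction("b", 80.0),  # exactly at the threshold, so it is dropped
    Prediction("c", 64.5),
]
kept = filter_confident(preds)
print([p.name for p in kept])
```

The comparison is strict, matching the "> 80" wording above, so a structure scoring exactly 80 is excluded.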

Models

NVIDIA BioNeMo provides fully open-source models that are free to use. Each release includes the full training codebase, pre-trained weights, and the accompanying research paper. These models are hosted in the NVIDIA-Digital-Bio GitHub organization.

These models reflect our active research directions, and we highly encourage community feedback, collaboration, and adaptation to push their capabilities further.

Understand

Use Case	Model	Description
Target Identification / Disease Understanding (RNA)	CodonFM	Codon-level RNA foundation model trained on 130M protein-coding sequences from 22K+ species. Captures synonymous codon variation for mRNA design, stability modeling, and variant interpretation.
Structure Prediction (RNA)	RNAPro	State-of-the-art RNA 3D structure prediction model. Combines Protenix-based co-folding architectures with RNA foundation models, MSA, and template-based modeling.
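
To make "codon-level" concrete: a codon model reads a coding sequence as non-overlapping nucleotide triplets rather than single nucleotides, so synonymous codons (different triplets that encode the same amino acid) remain distinct tokens. The sketch below is illustrative only. The function name `split_codons` and the small codon table are assumptions and do not reflect CodonFM's actual tokenizer.

```python
def split_codons(cds: str) -> list[str]:
    """Split a coding sequence into non-overlapping 3-nucleotide codons."""
    seq = cds.upper().replace("T", "U")  # normalize DNA input to the RNA alphabet
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


# Small excerpt of the standard genetic code (start, two leucine codons, stop).
AMINO_ACID = {"AUG": "M", "CUG": "L", "UUA": "L", "UAA": "*"}

codons_a = split_codons("ATGCTGTAA")
codons_b = split_codons("AUGUUAUAA")
protein_a = "".join(AMINO_ACID[c] for c in codons_a)
protein_b = "".join(AMINO_ACID[c] for c in codons_b)

# The two sequences differ at the codon level but translate identically.
print(codons_a, codons_b, protein_a == protein_b)
```

A residue-level model would see these two inputs as the same protein; a codon-level model can still tell them apart, which is what allows it to capture synonymous variation.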

Design

Use Case	Model	Description
Proteins	Proteina-Complexa	Protein binder design for protein and small molecule targets. Combines a pretrained flow-based generative model (built on La-Proteina) with inference-time optimization for high-quality binder generation.
	La-Proteina	All-atom protein generation using partially latent flow matching. Jointly generates amino acid sequence and full atomistic structure (backbone + side chains) for up to 800 residues. Enables atomistic motif scaffolding for enzyme design.
	Proteina	Large-scale flow-based generative model for protein backbone structures with hierarchical fold-class conditioning and a scalable transformer architecture.
	ProtComposer	Spatial-layout-conditioned protein structure generation using 3D ellipsoids to control shape and substructure arrangements.
Small Molecules	GenMol	Fragment-based molecule generation using masked discrete diffusion over SAFE representations. Supports de novo design, scaffold decoration, linker design, motif extension, and lead optimization.
	Megalodon	Transformer-based 3D molecule generative model using an equivariant graph transformer architecture. Generates both 2D topology and 3D structure with physically realistic, low-energy conformations.
	AvgFlow	Efficient molecular 3D conformer generation using SO(3)-averaged flow-matching and reflow. Architecture-agnostic framework applicable to equivariant and non-equivariant models.

Optimize

Use Case	Model	Description
Property Prediction	KERMT	Pretrained graph neural network for molecular property prediction (ADMET). Multi-task extension of GROVER with accelerated data loading via cuik-molmaker. Reports state-of-the-art results on real-world ADMET data.
Synthesizability	ReaSyn	Synthesis pathway prediction using an encoder-decoder Transformer with Chain-of-Reaction notation. Predicts reaction steps from building blocks to final products, or finds synthesizable analogs for unsynthesizable targets.
Binding Energy	DualBind	3D structure-based deep learning model for protein-ligand binding affinity prediction using a dual-loss framework (supervised MSE + unsupervised denoising). Orders of magnitude faster than physics-based FEP methods.
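
The dual-loss idea described for DualBind (a supervised MSE term plus an unsupervised denoising term) can be sketched as a weighted sum of two objectives. Everything below is an assumption for illustration: the function names, the `weight` parameter, and the choice to model the denoising term as a mean squared error between predicted and true noise. It is not DualBind's actual loss implementation.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dual_loss(pred_affinity, true_affinity, pred_noise, true_noise, weight=1.0):
    """Supervised affinity error plus a weighted denoising error (hypothetical)."""
    supervised = mse(pred_affinity, true_affinity)  # needs labeled affinities
    denoising = mse(pred_noise, true_noise)         # needs only structures
    return supervised + weight * denoising


loss = dual_loss(
    pred_affinity=[1.0, 2.0], true_affinity=[1.0, 4.0],
    pred_noise=[0.5, -0.5], true_noise=[0.0, 0.0],
    weight=0.5,
)
print(loss)
```

The denoising term needs no affinity labels, which is why an objective of this shape can also learn from structures that lack measured binding data.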

Libraries and Tools

GPU-optimized libraries and tools that integrate into existing workflows. They are engineered to be lightweight and specialized, delivering high performance without dependency bloat.

Task	Tool	Description
Data Processing & Analysis	Parabricks	GPU-accelerated genomics software suite for rapid secondary analysis of DNA/RNA sequencing data.
	nvMolKit	GPU-accelerated cheminformatics library for molecular fingerprinting, Tanimoto/cosine similarity, Butina clustering, conformer generation (ETKDGv3), MMFF geometry optimization, and substructure search.
	cuik-molmaker	Molecular featurization package for converting chemical structures into GNN inputs. Accelerates Chemprop training by 1.6x and inference by 2.4x with 80% memory reduction.
	nvQSP	GPU-accelerated Quantitative Systems Pharmacology ODE solvers. 77x speedup over CPU for virtual patient simulations with bit-exact FP64 reproducibility.
Training & Inference	cuEquivariance	CUDA-X library with optimized kernels for efficient training of geometry-aware equivariant neural networks (AlphaFold-like and molecular structure models).
	BioNeMo-SCDL	Scalable, memory-efficient data loader for training large single-cell models. Part of BioNeMo Framework.
	BioNeMo-MoCo	Framework for constructing generative models (diffusion, flow-matching) using continuous and discrete interpolants. Part of BioNeMo Framework.
	BioNeMo-Noodles	Efficient genomic data handling with memory-mapped access to FASTA files. Part of BioNeMo Framework.
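
For readers unfamiliar with the similarity metrics listed for nvMolKit, Tanimoto and cosine similarity over binary fingerprints reduce to simple set arithmetic. The CPU sketch below represents each fingerprint as the set of its on-bit indices. It is plain Python for illustration and does not use or mirror the nvMolKit API; returning 0.0 for two empty fingerprints is a convention chosen here.

```python
import math


def tanimoto(a: set, b: set) -> float:
    """Shared on-bits divided by total distinct on-bits (intersection over union)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(a: set, b: set) -> float:
    """Shared on-bits divided by the geometric mean of the two on-bit counts."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


# Two toy fingerprints given as sets of on-bit indices.
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8, 9}

print(tanimoto(fp1, fp2))  # 3 shared bits out of 5 distinct bits
print(cosine(fp1, fp2))    # 3 shared bits over sqrt(4 * 4)
```

A GPU library earns its speedup by evaluating these same quantities across very large numbers of fingerprint pairs in parallel, which is the expensive step in clustering and similarity search.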

Training and Customization

BioNeMo provides frameworks and recipes for pretraining, fine-tuning, and adapting biomolecular AI models at scale on GPU infrastructure.

Tool	Description
BioNeMo Framework	Reference training implementations and ready-to-run examples showing how to achieve lower-precision training and maximize scaling and throughput for models such as Llama3, ESM2, Evo2, CodonFM, and Geneformer, using FSDP and TransformerEngine.
Context Parallelism (boltz-cp)	Long-sequence parallelism for protein structure prediction models. Distributes activation tensors across GPUs to overcome single-GPU memory limits for large biomolecules.
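
The idea behind context parallelism can be sketched without any GPU code: the sequence (token) dimension is split into contiguous shards, one per device, so each device holds only a slice of the activations. The helper below is a hypothetical illustration of that partitioning, assuming a pairwise activation of shape N x N that is sharded by rows. It is not taken from boltz-cp.

```python
def shard_ranges(seq_len: int, world_size: int) -> list[range]:
    """Split [0, seq_len) into world_size contiguous, near-equal shards."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    base, extra = divmod(seq_len, world_size)
    ranges, start = [], 0
    for rank in range(world_size):
        # The first `extra` ranks take one additional token each.
        size = base + (1 if rank < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


seq_len, world_size = 10, 4
shards = shard_ranges(seq_len, world_size)
print([(r.start, r.stop) for r in shards])

# Entries of an N x N pairwise activation held per rank when sharded by rows.
per_rank_entries = [len(r) * seq_len for r in shards]
print(per_rank_entries, "vs", seq_len * seq_len, "on a single device")
```

Because pairwise activations grow quadratically with sequence length, dividing the rows across devices is what lets large biomolecules fit once a single GPU's memory is exhausted.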

Documentation: docs.nvidia.com/bionemo-framework

Optimized Inference and Deployment

BioNeMo NIM microservices are enterprise-ready inference microservices with built-in API endpoints. Each NIM packages algorithmic, system, and runtime optimizations into a prebuilt container, so you can go from zero to inference in minutes.

NIM	Description
OpenFold3	3D structure prediction for molecular complexes (proteins, DNA, RNA, ligands)
OpenFold2	Protein structure prediction from sequence, MSAs, and templates
Boltz-2	Biomolecular complex structure prediction
Evo2-40B	Genomic foundation model with long-context sequence understanding
MSA Search	Multiple sequence alignment generation from query sequences
ProteinMPNN	Amino acid sequence design for protein backbones
RFDiffusion	Generative model for protein backbone and binder design
GenMol	Fragment-based small molecule generation
DiffDock	Molecular blind docking for predicting protein-ligand binding poses
MolMIM	Molecular generation optimized for user-defined drug properties

Browse all available NIM microservices: build.nvidia.com/explore/biology

NIM microservices can be deployed self-hosted via Docker or Kubernetes, or on cloud platforms including AWS, Google Cloud, Microsoft Azure, and NVIDIA DGX Cloud.

Workflow Examples and Community Contributions

Application-level examples showing how BioNeMo platform components work together:

digital-biology-examples — End-to-end workflow examples for drug discovery and biological research.

Note: If you have an example you'd like to contribute, we'd love to include it. Please get started by opening a GitHub issue and we'll reach out to you.
