
Benchmarking SE(3)-based Generative Models for Protein Structure Design

Multi-GPU training supported by PyTorch Lightning ⚡

License: MIT | arXiv

Framework Overview


Supported Methods

| Name | Paper | Venue | Date | Code |
|------|-------|-------|------|------|
| FrameDiff | SE(3) diffusion model with application to protein backbone generation | ICML | 2023-04-25 | Github |
| FoldFlow | SE(3)-Stochastic Flow Matching for Protein Backbone Generation | ICLR | 2024-04-21 | Github |
| Genie1 | Genie: De Novo Protein Design by Equivariantly Diffusing Oriented Residue Clouds | ICML | 2023-06-26 | Github |
| Genie2 | Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2 | arXiv | 2024-05-24 | Github |
| FrameFlow | Improved motif-scaffolding with SE(3) flow matching | TMLR | 2024-07-17 | Github |
| RFdiffusion | De novo design of protein structure and function with RFdiffusion | Nature | 2023-07-11 | Github |

Installation

To get started, create a conda environment and install the dependencies with pip:

conda create -n protein-se3 python=3.9
git clone https://github.com/BruthYU/protein-se3
...
cd protein-se3
pip install -r requirements.txt

Additionally, to use RFdiffusion you need to install NVIDIA's implementation of SE(3)-Transformers. Run the script below to install it:

cd protein-se3/lightning/model/rfdiffusion/SE3Transformer
python setup.py install

Usage

In this section we will demonstrate how to use Protein-SE(3).

How to Preprocess Dataset and Build Cache


All preprocessing operations (i.e., how PDB files are mapped to the LMDB cache) are implemented in the folder protein-se3/preprocess. Please refer to its README.md for more instructions.

Protein-SE(3) featurizes proteins with the AlphaFold protein data type and builds an LMDB cache following the FoldFlow method. Different protein file formats (mmCIF, PDB, and JSONL) are unified into one data type, so the built cache can be loaded by all integrated methods during training.
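To picture what "unified into one data type" means, here is a minimal, hypothetical sketch of mapping fixed-width PDB `ATOM` records onto a single backbone record. The `BackboneRecord` fields and the `from_pdb_lines` helper are illustrative assumptions, not the actual schema used by Protein-SE(3):

```python
from dataclasses import dataclass

# Minimal three-to-one residue code map (illustrative subset).
THREE_TO_ONE = {"ALA": "A", "GLY": "G", "LEU": "L", "SER": "S", "VAL": "V"}

@dataclass
class BackboneRecord:
    """Hypothetical unified record; field names are illustrative only."""
    name: str
    sequence: str      # one-letter amino-acid codes
    ca_coords: list    # per-residue C-alpha coordinates [x, y, z]

def from_pdb_lines(name, lines):
    """Collect C-alpha atoms from fixed-width PDB ATOM records."""
    seq, coords = [], []
    for ln in lines:
        if ln.startswith("ATOM") and ln[12:16].strip() == "CA":
            coords.append([float(ln[30:38]), float(ln[38:46]), float(ln[46:54])])
            seq.append(THREE_TO_ONE.get(ln[17:20].strip(), "X"))
    return BackboneRecord(name, "".join(seq), coords)
```

A parser for mmCIF or JSONL inputs would produce the same record type, which is what lets one cache serve every method.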

python preprocess/process_pdb_dataset.py
# Intermediate pickle files are generated.
python preprocess/build_cache.py
# Filtering configurations are listed in config.yaml; the lmdb cache is placed in preprocess/.cache.

You can also download our preprocessed dataset directly from Harvard Dataverse.

How to Run Training and Inference


Training and inference for all integrated methods are implemented in the lightning workspace (protein-se3/lightning). Please refer to its README.md for more details.
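One common way a single Lightning workspace can host several methods is a name-to-class registry that a config selects from. The sketch below is an illustrative assumption about this pattern, not the repo's actual wiring (class and key names are hypothetical):

```python
# Hypothetical registry mapping a config's method name to a model class.
MODEL_REGISTRY = {}

def register(name):
    """Class decorator that records a model under a config key."""
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap

@register("framediff")
class FrameDiffWrapper:
    def __init__(self, cfg):
        self.cfg = cfg

@register("foldflow")
class FoldFlowWrapper:
    def __init__(self, cfg):
        self.cfg = cfg

def build_model(cfg):
    """Instantiate whichever method the config names (cfg is a plain dict here)."""
    return MODEL_REGISTRY[cfg["method"]](cfg)
```

With this shape, the same Trainer loop and data module can be reused across methods; only the registry key in the config changes.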

How to Evaluate Different Methods


We evaluate different protein structure design methods on two tasks: Unconditional Scaffolding and Motif Scaffolding. Please refer to README.md for more detailed information.
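Scaffolding evaluations commonly rely on alignment-based metrics such as backbone RMSD after optimal superposition. As an illustration of that kind of metric (not the benchmark's own evaluation code), here is a standard Kabsch-alignment RMSD in NumPy:

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (L, 3) coordinate sets after optimal superposition:
    center both, find the best rotation via SVD (Kabsch algorithm),
    then measure the residual deviation."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections: force a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return float(np.sqrt(((P @ R.T - Q) ** 2).sum() / len(P)))
```

By construction the metric is invariant to rigid motions, so a designed backbone that differs from a reference only by rotation and translation scores (near) zero.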


Benchmark Results

Unconditional Scaffolding across Varying Lengths

Motif Scaffolding on Design24

Secondary Structure Analysis
