| Name | Paper | Venue | Date | Code |
|---|---|---|---|---|
| FrameDiff | SE(3) diffusion model with application to protein backbone generation | ICML | 2023-04-25 | Github |
| FoldFlow | SE(3)-Stochastic Flow Matching for Protein Backbone Generation | ICLR | 2024-04-21 | Github |
| Genie1 | Genie: De Novo Protein Design by Equivariantly Diffusing Oriented Residue Clouds | ICML | 2023-06-26 | Github |
| Genie2 | Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2 | arXiv | 2024-05-24 | Github |
| FrameFlow | Improved motif-scaffolding with SE(3) flow matching | TMLR | 2024-07-17 | Github |
| RFdiffusion | De novo design of protein structure and function with RFdiffusion | Nature | 2023-07-11 | Github |

To get started, create a conda environment and install the dependencies with pip:

    conda create -n protein-se3 python=3.9
    git clone https://github.com/BruthYU/protein-se3
    ...
    cd protein-se3
    pip install -r requirements.txt

To use RFdiffusion, you also need to install NVIDIA's implementation of the SE(3)-Transformer. Run the commands below to install it:

    cd protein-se3/lightning/model/rfdiffusion/SE3Transformer
    python setup.py install

In this section we demonstrate how to use Protein-SE(3).
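
Before moving on, it can help to confirm that the installed packages are importable. The following is a minimal sketch, not part of the repository: the helper `missing_modules` is hypothetical, and the module names `torch` and `se3_transformer` are assumptions about what requirements.txt and the NVIDIA SE(3)-Transformer install provide, so adjust them to your setup.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names (not taken from the repository); edit as needed.
    required = ["torch", "se3_transformer"]
    missing = missing_modules(required)
    print("Missing modules:", missing if missing else "none")
```

An empty result only shows that the modules can be located; it does not verify versions or GPU support.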
All preprocessing operations (i.e., how pdb files are mapped to the lmdb cache) are implemented in the protein-se3/preprocess folder. Please refer to the accompanying README.md for more instructions.
Protein-SE(3) featurizes proteins with the AlphaFold Protein data type and builds the lmdb cache following the FoldFlow method.
Different protein file formats (mmcif, pdb and jsonl) are unified into one data type, so the built cache can be loaded by all integrated methods during training.

    python preprocess/process_pdb_dataset.py
    # Intermediate pickle files are generated.
    python preprocess/build_cache.py
    # Filtering configurations are listed in config.yaml; the lmdb cache should end up in preprocess/.cache.

You can also download our preprocessed dataset directly from Harvard Dataverse.
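
To make the mapping from input files to the cache more concrete, here is a simplified sketch of the idea: different input formats are converted into one record type, and each record is pickled under a byte key. Everything in it is an assumption for illustration: the `ProteinRecord` class, its fields, and the JSONL keys are hypothetical; a plain dictionary stands in for the lmdb database; and only C-alpha atoms are kept, whereas the repository uses the AlphaFold Protein data type. The actual implementation lives in protein-se3/preprocess.

```python
import json
import pickle
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


@dataclass
class ProteinRecord:
    """Hypothetical unified record: a sequence plus C-alpha coordinates."""
    name: str
    sequence: str
    ca_coords: List[Coord]


def record_from_pdb(name: str, pdb_text: str) -> ProteinRecord:
    """Collect C-alpha atoms from fixed-column PDB ATOM lines."""
    seq, coords = [], []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            # Unknown residue names are mapped to "X".
            seq.append(THREE_TO_ONE.get(line[17:20].strip(), "X"))
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return ProteinRecord(name, "".join(seq), coords)


def record_from_jsonl(line: str) -> ProteinRecord:
    """Parse one JSONL entry; the keys used here are an assumed schema."""
    entry = json.loads(line)
    coords = [tuple(xyz) for xyz in entry["ca_coords"]]
    return ProteinRecord(entry["name"], entry["seq"], coords)


def build_cache(records: List[ProteinRecord]) -> Dict[bytes, bytes]:
    """Pickle each record under a byte key (a dict stands in for lmdb)."""
    return {r.name.encode(): pickle.dumps(asdict(r)) for r in records}


def load_record(cache: Dict[bytes, bytes], name: str) -> ProteinRecord:
    """Unpickle one record from the cache by name."""
    return ProteinRecord(**pickle.loads(cache[name.encode()]))
```

Because every record passes through the same type before it is cached, a loader only needs to understand one format, whichever file type the structure came from.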
Training and inference for all integrated methods are implemented in the lightning workspace (protein-se3/lightning). Refer to the accompanying README.md for more details.
We evaluate different protein structure design methods on two tasks: Unconditional Scaffolding and Motif Scaffolding. Please refer to README.md for more detailed information.