Accurate, database-free protein folding from single sequences using structure-aware synthetic MSAs
GhostFold is a next-generation protein folding framework that predicts 3D structures directly from single sequences, without relying on large evolutionary databases. By generating synthetic, structure-aware multiple sequence alignments (MSAs), GhostFold achieves high accuracy while remaining lightweight and portable.
This repository provides scripts and configurations to set up GhostFold locally using Mamba and ColabFold environments, along with integration support for Hugging Face Transformers.
GhostFold uses Mamba for virtual environment management.
You can install Mamba using one of the following methods:
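# Option 1: install Mamba into an existing conda base environment
conda install -n base -c conda-forge mamba
# Option 2: use the Mambaforge installer (the file below is the Linux x86_64 build; for macOS or ARM, pick the installer matching your platform from the Miniforge releases page)
wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
bash Mambaforge-Linux-x86_64.sh
Follow the on-screen prompts to complete installation, then restart your terminal or run:
source ~/.bashrc
Clone the repository and move into it:
git clone https://github.com/brineylab/ghostfold.git
cd ghostfold
Create and activate the environment:
mamba create -n ghostfold python=3.10
mamba activate ghostfold
If you receive an error indicating Mamba isn’t initialized, reload your shell configuration:
source ~/.bashrc
GhostFold relies on the following core libraries: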
torch
transformers
sentencepiece
Install them using:
mamba install pytorch torchvision torchaudio -c pytorch
mamba install transformers sentencepiece -c conda-forge
Install the appropriate CUDA drivers for PyTorch.
Refer to the Transformers Installation Guide for platform-specific setup details.
We use Rich for printing log messages. Install it within the environment with:
pip install rich
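After installing the packages above, it can help to confirm that they import inside the active environment before running anything heavier. The following is a minimal sketch, not part of GhostFold itself: the function names check_module and report_modules and the PYTHON_BIN variable are hypothetical helpers introduced here for illustration.

```shell
# Interpreter to probe; override PYTHON_BIN if your environment uses another name.
PYTHON_BIN="${PYTHON_BIN:-python3}"

# check_module NAME: return 0 if NAME can be imported by $PYTHON_BIN.
# A missing interpreter also counts as a failed import.
check_module() {
    "$PYTHON_BIN" -c "import $1" >/dev/null 2>&1
}

# report_modules NAME...: print one "ok:" or "missing:" line per module.
report_modules() {
    local module
    for module in "$@"; do
        if check_module "$module"; then
            echo "ok: $module"
        else
            echo "missing: $module"
        fi
    done
}

# The libraries named in this README.
report_modules torch transformers sentencepiece rich
```

Any line starting with "missing:" points to a package that still needs to be installed in the ghostfold environment (or to the wrong environment being active).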
When using from_pretrained() to load models, GhostFold automatically fetches pretrained weights from the Hugging Face Hub.
You may need to create and configure a Hugging Face access token. See the instructions here.
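Before logging in, you may want to check whether credentials are already present on the machine. The sketch below makes two assumptions that are not stated in this README: that the Hugging Face libraries read a token from the HF_TOKEN environment variable, and that a saved token lives in a file named token under HF_HOME (defaulting to ~/.cache/huggingface). The function name hf_auth_status is a hypothetical helper. Verify both locations against the Hugging Face Hub documentation for your installed version.

```shell
# hf_auth_status: print where (or whether) Hugging Face credentials were found.
# Assumes the HF_TOKEN variable and the $HF_HOME/token file described above.
hf_auth_status() {
    local token_file
    token_file="${HF_HOME:-$HOME/.cache/huggingface}/token"
    if [ -n "${HF_TOKEN:-}" ]; then
        echo "token provided via HF_TOKEN"
    elif [ -s "$token_file" ]; then
        echo "token file found at $token_file"
    else
        echo "no credentials found"
    fi
}

hf_auth_status
```

If the output is "no credentials found", create a token and log in as described in this section.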
Once you have a token, log in from the terminal:
huggingface-cli login
To enable local structure prediction, first ensure the setup scripts are executable:
chmod +x install_localcolabfold.sh ghostfold.sh
Then run the installation script:
./install_localcolabfold.sh
This script will:
- Configure a compatible ColabFold environment
- Install all required dependencies
- Download model weights automatically
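The permission step that precedes the installation can be folded into a small pre-flight check, which also catches the case where the commands are run from the wrong directory. This is only a sketch: ensure_executable is a hypothetical helper, and it assumes both scripts sit in the repository root as the commands above suggest.

```shell
# ensure_executable FILE...: make each existing script executable and
# report any that are missing. Returns 1 if at least one file is absent.
ensure_executable() {
    local script
    local status
    status=0
    for script in "$@"; do
        if [ ! -f "$script" ]; then
            echo "missing: $script"
            status=1
        else
            [ -x "$script" ] || chmod +x "$script"
            echo "ready: $script"
        fi
    done
    return "$status"
}

# Run from the repository root; `|| true` keeps the check non-fatal.
ensure_executable install_localcolabfold.sh ghostfold.sh || true
```

A "missing:" line usually means the current directory is not the cloned ghostfold folder.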
If you prefer to run predictions via Google Colab, you can use the generated pseudoMSAs directly in ColabFold by selecting “custom_msa” under MSA settings.
Once setup is complete, ensure all scripts are executable:
chmod +x ghostfold.sh
Then, launch a prediction using:
./ghostfold.sh --project_name <your_project_name> --fasta_file <path/to/your/fasta_file.fasta>
GhostFold includes a sample FASTA file for demonstration:
./ghostfold.sh --project_name 7JJV --fasta_file query.fasta
GhostFold can operate in two primary modes:
- MSA Generation Mode – Generates structure-aware synthetic MSAs only.
- Full Mode – Runs both synthetic MSA generation and structure prediction.
You may run folding separately after generating MSAs. To explore the available options:
./ghostfold.sh --help
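To process several sequences, the launch command shown earlier can be wrapped in a loop. The sketch below uses only the two flags documented in this README (--project_name and --fasta_file). Everything else is an assumption made for illustration: the run_ghostfold helper, the DRY_RUN switch, the fastas/ directory, and the choice to derive the project name from the file name. By default it only prints the commands it would run.

```shell
# run_ghostfold FASTA: launch (or, by default, just print) one prediction.
# The project name is taken from the file name without its extension;
# this naming scheme is a convention chosen for this sketch.
run_ghostfold() {
    local fasta="$1"
    local name
    name="$(basename "$fasta")"
    name="${name%.*}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "./ghostfold.sh --project_name $name --fasta_file $fasta"
    else
        ./ghostfold.sh --project_name "$name" --fasta_file "$fasta"
    fi
}

# Loop over a hypothetical fastas/ directory; skip cleanly if it is empty.
for fasta in fastas/*.fasta; do
    [ -e "$fasta" ] || continue
    run_ghostfold "$fasta"
done
```

Set DRY_RUN=0 to execute the commands instead of printing them. Run ./ghostfold.sh --help first to confirm which additional options, if any, you want to pass on each call.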