GhostFold

Accurate, database-free protein folding from single sequences using structure-aware synthetic MSAs


Overview

GhostFold is a next-generation protein folding framework that predicts 3D structures directly from single sequences, without relying on large evolutionary databases. By generating synthetic, structure-aware multiple sequence alignments (MSAs), GhostFold achieves high accuracy while remaining lightweight and portable.

This repository provides scripts and configurations to set up GhostFold locally using Mamba and ColabFold environments, along with integration support for Hugging Face Transformers.


Installation

1. Install Mamba (if not already installed)

GhostFold uses Mamba for virtual environment management.

You can install Mamba using one of the following methods:

Using Conda (recommended if Conda is already installed)

conda install -n base -c conda-forge mamba

Using Miniforge (standalone installation)

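# For Linux on x86_64; on macOS or ARM, substitute the installer that matches your platform.
# Miniforge3 ships with Mamba and replaces the retired Mambaforge installers.
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh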

Follow the on-screen prompts to complete installation, then restart your terminal or run:

source ~/.bashrc
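
The installer filename in the Miniforge step is tied to one operating system and CPU architecture. As a rough sketch, the filename could instead be derived from `uname`, assuming the release assets follow the `Miniforge3-<OS>-<arch>.sh` naming pattern. The helper name `miniforge_installer` is hypothetical and only serves this illustration.

```shell
# Hypothetical helper: build the installer filename for an OS and CPU architecture.
# Assumes release assets are named Miniforge3-<OS>-<arch>.sh.
miniforge_installer() {
    os="${1:-$(uname)}"        # e.g. Linux or Darwin
    arch="${2:-$(uname -m)}"   # e.g. x86_64, aarch64, or arm64
    printf 'Miniforge3-%s-%s.sh\n' "$os" "$arch"
}

# With no arguments, the helper describes the current machine.
installer="$(miniforge_installer)"
echo "Installer for this machine: $installer"
echo "Download with: wget https://github.com/conda-forge/miniforge/releases/latest/download/$installer"
```

Check that the printed filename actually appears on the Miniforge releases page before downloading, since not every OS and architecture combination is guaranteed to be published.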

2. Clone the repository

git clone https://github.com/brineylab/ghostfold.git
cd ghostfold

3. Create and activate a new Mamba environment

mamba create -n ghostfold python=3.10
mamba activate ghostfold

If you receive an error indicating that Mamba isn’t initialized, reload your shell configuration and activate the environment again:

source ~/.bashrc

4. Install dependencies

GhostFold relies on the following core libraries:

torch
transformers
sentencepiece

Install them using:

mamba install pytorch torchvision torchaudio -c pytorch
mamba install transformers sentencepiece -c conda-forge

For GPU support, make sure the installed NVIDIA driver and CUDA version are compatible with your PyTorch build.

Refer to the Transformers Installation Guide for platform-specific setup details.

We use Rich for printing log messages. Install it within the environment with:

pip install rich
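
After installing, it can help to confirm that the packages are visible from the active environment. The sketch below is one possible check, not part of GhostFold: it assumes `python3` resolves to the interpreter of the `ghostfold` environment, and the helper name `check_module` is hypothetical. Modules are looked up by import name, so PyTorch appears as `torch`.

```shell
# Hypothetical helper: report whether a Python module can be found
# in the active environment, looked up by its import name.
check_module() {
    if python3 -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$1') else 1)" 2>/dev/null; then
        echo "ok       $1"
    else
        echo "missing  $1"
        return 1
    fi
}

# Count how many of the required modules are not installed.
missing=0
for module in torch transformers sentencepiece rich; do
    check_module "$module" || missing=$((missing + 1))
done
echo "$missing module(s) missing"

# If PyTorch is present, also report whether it can see a CUDA device.
if check_module torch >/dev/null; then
    python3 -c "import torch; print('CUDA available:', torch.cuda.is_available())"
fi
```

If `CUDA available: False` is printed on a machine with an NVIDIA GPU, the PyTorch build and the installed driver are the first things to compare.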

Hugging Face Authentication

When using from_pretrained() to load models, GhostFold automatically fetches pretrained weights from the Hugging Face Hub. You may need to create a Hugging Face access token and configure it locally; see the Hugging Face documentation on user access tokens. To log in from the command line:

huggingface-cli login
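
To check whether a token is already configured before starting a run, something like the following could be used. It is a sketch under assumptions: it looks for the `HF_TOKEN` environment variable and for a token file at `$HF_HOME/token`, with `HF_HOME` defaulting to `~/.cache/huggingface`. Those locations are the defaults I would expect from `huggingface_hub` and may differ between versions or custom setups. The function name `hf_token_configured` is hypothetical.

```shell
# Hypothetical helper: succeeds if a token is set in HF_TOKEN
# or stored as a non-empty file in the assumed default location.
hf_token_configured() {
    token_file="${HF_HOME:-$HOME/.cache/huggingface}/token"
    [ -n "${HF_TOKEN:-}" ] || [ -s "$token_file" ]
}

if hf_token_configured; then
    echo "Hugging Face token found"
else
    echo "No token found; run: huggingface-cli login"
fi
```

This only confirms that a token exists locally; it does not verify that the token is valid or has access to a particular model.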

Local ColabFold Setup

To enable local structure prediction, first ensure the setup scripts are executable:

chmod +x install_localcolabfold.sh ghostfold.sh

Then run the installation script:

./install_localcolabfold.sh

This script will:

  • Configure a compatible ColabFold environment
  • Install all required dependencies
  • Download model weights automatically

If you prefer to run predictions via Google Colab, you can use the generated pseudoMSAs directly in ColabFold by selecting “custom_msa” under MSA settings.


Running GhostFold

Once setup is complete, make sure the run script is executable (you can skip this if you already did so during the ColabFold setup):

chmod +x ghostfold.sh

Then, launch a prediction using:

./ghostfold.sh --project_name <your_project_name> --fasta_file <path/to/your/fasta_file.fasta>

Example run

The repository includes a sample FASTA file, query.fasta, for demonstration:

./ghostfold.sh --project_name 7JJV --fasta_file query.fasta
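
The same invocation can be adapted to your own sequence. The sketch below writes a single-record FASTA file, checks that it starts with a header line, and then launches GhostFold with the two flags shown above. The project name `demo`, the file name `demo.fasta`, the record name, and the sequence are all placeholders for illustration and are not taken from the repository. The launch is skipped if `ghostfold.sh` is not in the current directory.

```shell
# Placeholder input: one record with a ">" header line followed by the sequence.
fasta_file="demo.fasta"
cat > "$fasta_file" <<'EOF'
>demo_protein
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
EOF

# Basic sanity check: the first line of a FASTA file must start with ">".
if head -n 1 "$fasta_file" | grep -q '^>'; then
    echo "FASTA header looks valid"
else
    echo "Not a FASTA file: $fasta_file" >&2
    exit 1
fi

# Launch only if the run script is present and executable in this directory.
if [ -x ./ghostfold.sh ]; then
    ./ghostfold.sh --project_name demo --fasta_file "$fasta_file"
else
    echo "ghostfold.sh not found; run this from the repository root"
fi
```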

Modes of Operation

GhostFold can operate in two primary modes:

  1. MSA Generation Mode – Generates structure-aware synthetic MSAs only.
  2. Full Mode – Runs both synthetic MSA generation and structure prediction.

You may run folding separately after generating MSAs. To explore the available options:

./ghostfold.sh --help
