fiPIP (Functionally Informed PIPs)

This repository serves two purposes for users with statistical fine-mapping results: (1) provide a starting point for generating or accessing deep-learning-based sequence-to-omics (S2O) scores from AlphaGenome, Borzoi, Enformer, and/or Sei, and (2) generate functionally informed posterior inclusion probabilities (fiPIPs) from quantitative scores containing functional information.

Tasks (1) and (2) can be completed independently of each other. Users can generate fiPIPs from any quantitative scores with this code repository, including scores from tools not mentioned here and scores that do not come from S2O models. In fact, as S2O models are updated and new ones are released, we encourage users to do so. This repository may not be updated when the aforementioned S2O models are updated or when new ones are released.

Of the four S2O models listed in this repository, we recommend using AlphaGenome or Borzoi scores to generate fiPIPs.

Installation

python -m pip install -U pip setuptools wheel
pip install -e .

Task 1: Generate or access deep-learning based sequence-to-omics scores

Please refer to each method's individual installation instructions before use. Because the methods have different requirements, we recommend creating a separate virtual or Conda environment for each method to prevent conflicts.

AlphaGenome

AlphaGenome is currently accessed through an API, which requires an API key. As this is a new method, we recommend following the most up-to-date tutorial; however, we do provide an example script for generating AlphaGenome RNA-seq scores in the tutorials folder, which is a condensed version of their tutorial here. Please make sure to set your API key before use.

# Example
pip install alphagenome
export ALPHAGENOME_API_KEY='YOUR_ALPHAGENOME_API_KEY'
fipip alphagenome --input tutorials/example_data.tsv --output alphagenome_results.csv --sep "\t"

The output file contains predictions for 667 RNA-seq tracks per variant. The "fallback" column is 1 if the variant's associated Ensembl ID was not present in the AlphaGenome output, in which case the mean over all other genes was taken, and 0 otherwise.
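Before using these scores downstream, it can help to know how many variants relied on the gene-averaged fallback. The following is a minimal sketch, not part of the fipip package: the helper name `fallback_summary` and every column in the toy table except "fallback" are assumptions made for illustration.

```python
import csv
import io


def fallback_summary(csv_text):
    """Return (n_variants, n_fallback) for AlphaGenome-style CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # int(float(...)) tolerates values written as "1" or "1.0".
    n_fallback = sum(1 for row in rows if int(float(row["fallback"])) == 1)
    return len(rows), n_fallback


# Toy stand-in for alphagenome_results.csv.
# Every column except "fallback" is hypothetical.
example = (
    "variant,track_1,fallback\n"
    "chr1_100_A_G,0.12,0\n"
    "chr1_200_C_T,-0.03,1\n"
)

total, n_fallback = fallback_summary(example)
print(f"{n_fallback} of {total} variants used the gene-averaged fallback")
```

For a real run, read the file written by --output and pass its text to the helper.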

Borzoi

Pre-computed Borzoi scores (recommended)

Please note that the pre-computed Borzoi scores are based on the hg19 genome build. If your variants are based on the hg38 genome build, please lift them over to hg19 before continuing.

With the release of Srivastava, D. et al. (2025), pre-computed Borzoi scores are available for over 19 million common and low-frequency variants. While they offer less flexibility than generating your own Borzoi scores, using them can be very efficient and cost-effective. Scores are available for both variant effect predictions (VEPs) and principal components (PCs) derived from VEPs.

Generating your own Borzoi scores

To generate your own Borzoi scores, please follow the installation instructions in the Borzoi repository to download the Borzoi models.

Two scripts in the tutorials folder can be used to generate Borzoi scores for the variants in your credible set. The first script predicts all 89 RNA-seq tracks and produces two pickle objects per variant, one for each allele, containing RNA-seq predictions at 32 base pair resolution for all 89 tissues across 4 folds.

# Example
fipip borzoi_1 --input tutorials/example_variants.tsv --outdir borzoi_objects

The second script takes the output folder of pickle objects and converts each pickle object to a single Borzoi score per variant per track. The 89 columns of each pickle object correspond to the GTEx tissue replicates listed here. To make predictions for only a subset of tracks, for example those most relevant to the tissue of your eQTLs, set the --tracks parameter. To make gene-contextual predictions for variants, please provide a GTF file and a file detailing the gene associated with each variant. To make gene-agnostic predictions instead, set --no-gtf.

# Example
fipip borzoi_2 --input borzoi_objects --output borzoi_scores.csv --tracks 1-89 --gtf-path /path/to/your/gtf.gtf --gene-map tutorials/example_data.tsv
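The --tracks value in the command above is a range over the 89 pickle columns. The sketch below shows one way such a range can be turned into column indices. It is an illustration only, not the parser used by fipip: the helper name `parse_index_spec` is hypothetical, and both the 1-based inclusive convention (borrowed from the --groups option described later in this README) and the comma-separated form are assumptions.

```python
def parse_index_spec(spec, n_total):
    """Convert a 1-based spec such as "1-3,7" into sorted 0-based indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= n_total:
            raise ValueError(f"range {part!r} is outside 1-{n_total}")
        # Shift from 1-based inclusive to 0-based positions.
        indices.update(range(start - 1, end))
    return sorted(indices)


# Assumed layout: 89 track columns, as described for the pickle objects.
print(parse_index_spec("1-3,7", n_total=89))
```

Rejecting out-of-range values up front avoids silently selecting the wrong tissue columns.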

Enformer

Please note that the pre-computed Enformer scores are based on the hg19 genome build. If your variants are based on the hg38 genome build, please lift them over to hg19 before continuing.

Pre-computed Enformer scores are available here. We provide a script in the tutorials folder for extracting Enformer scores from the h5 files.

We currently do not provide a script for generating your own Enformer scores; however, instructions for doing so and example Google Colab notebooks are available in the Enformer GitHub repository.

# Example
fipip enformer --output enformer_master.csv --h5-dir /path/to/downloaded/h5/files --variants-file tutorials/example_variants.tsv --targets-file tutorials/enformer_targets.txt

Sei

We recommend following the setup instructions in the Sei repository and using its chromatin profile prediction and sequence class prediction scripts, 1_variant_effect_prediction.sh and 2_varianteffect_sc_score.sh respectively, to obtain epigenomic readout and sequence class Sei scores. Both types of scores are quantitative and can be used for fiPIP generation.

Task 2: Generate functionally informed PIPs (fiPIPs)

We provide a command-line tool for generating fiPIPs from quantitative scores. Please refer to Task 1 for directions on obtaining quantitative scores if necessary. The examples below mirror the example in Figure 3 of our preprint.

Please provide a training file and a testing file in the following formats.

Training file format

Required columns (include a header row; use the following column names for the first three columns):

variant — Variant ID
label — Binary 0/1 for negative/positive label respectively
chr — Variant's chromosome
Continuous scores (any number of columns; column names are arbitrary but must match the names in the testing file)
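A quick check of the training table before running the tool can catch header and label problems early. The sketch below is not part of fipip: the helper name `check_training_table` and the score column names are hypothetical, and it assumes a tab-separated file whose first three columns appear in the order listed above.

```python
import csv
import io

REQUIRED_TRAIN = ["variant", "label", "chr"]


def check_training_table(tsv_text):
    """Validate a training table and return the names of its score columns."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    if header[:3] != REQUIRED_TRAIN:
        raise ValueError(
            f"first three columns must be {REQUIRED_TRAIN}, got {header[:3]}"
        )
    score_columns = header[3:]
    if not score_columns:
        raise ValueError("at least one continuous score column is required")
    for line_number, row in enumerate(reader, start=2):
        if row["label"] not in {"0", "1"}:
            raise ValueError(f"line {line_number}: label must be 0 or 1")
        for name in score_columns:
            float(row[name])  # raises ValueError if a score is not numeric
    return score_columns


# Toy table; score_a and score_b are hypothetical score names.
example = (
    "variant\tlabel\tchr\tscore_a\tscore_b\n"
    "chr1_100_A_G\t1\tchr1\t0.42\t-1.3\n"
    "chr2_200_C_T\t0\tchr2\t0.05\t0.7\n"
)
print(check_training_table(example))
```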

Testing file format

Required columns (include a header row; use the following column names for the first four columns):

variant — Variant ID
cs_id — Credible set ID
pip — Posterior inclusion probability (PIP) from statistical fine-mapping
Continuous scores (column names must match those in the training file)
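The testing table can be checked in a similar way. This sketch is an illustration under stated assumptions rather than fipip's own validation: the helper name `check_testing_table` is hypothetical, the file is assumed to be tab-separated, and columns are looked up by name only, so no column order is enforced. It also returns the summed PIP per credible set, which is a convenient sanity check on fine-mapping input.

```python
import csv
import io


def check_testing_table(tsv_text, train_score_columns):
    """Validate a testing table and return the summed PIP per credible set."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    needed = ["variant", "cs_id", "pip", *train_score_columns]
    missing = [name for name in needed if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    pip_by_cs = {}
    for line_number, row in enumerate(reader, start=2):
        pip = float(row["pip"])
        if not 0.0 <= pip <= 1.0:
            raise ValueError(f"line {line_number}: pip must lie in [0, 1]")
        pip_by_cs[row["cs_id"]] = pip_by_cs.get(row["cs_id"], 0.0) + pip
    return pip_by_cs


# Toy table; score_a is a hypothetical score name shared with training.
example = (
    "variant\tcs_id\tpip\tscore_a\n"
    "chr1_100_A_G\tcs1\t0.6\t0.42\n"
    "chr1_150_G_T\tcs1\t0.3\t0.10\n"
    "chr2_200_C_T\tcs2\t0.95\t0.05\n"
)
totals = check_testing_table(example, ["score_a"])
print(sorted(totals))
```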

The following command trains an XGBoost model for each chromosome and writes PIP-agnostic probability-scale predictions, fiPIPs, and the trained models (JSON files) to the output directory set by --outdir:

fipip calculate_fipip \
  --train-file tutorials/train_df.tsv \
  --predict-file tutorials/test_df.tsv \
  --outdir output \
  --groups 1-5   # optional parameter; 1-based indices; allows subset of continuous scores to be used
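This README does not spell out how the per-chromosome models are fit. One common design for chromosome-wise models is leave-one-chromosome-out training, where the model used for a chromosome is trained only on variants from the other chromosomes. Whether fipip does this is an assumption, and the sketch below shows only the data split, with a hypothetical helper name and toy rows.

```python
from collections import defaultdict


def leave_one_chromosome_out(rows):
    """Yield (held_out_chr, train_rows, held_out_rows) for each chromosome."""
    by_chr = defaultdict(list)
    for row in rows:
        by_chr[row["chr"]].append(row)
    for held_out in sorted(by_chr):
        # Training rows are all rows from every other chromosome.
        train_rows = [
            row
            for chrom, group in by_chr.items()
            if chrom != held_out
            for row in group
        ]
        yield held_out, train_rows, by_chr[held_out]


# Toy rows that follow the training file format; values are made up.
rows = [
    {"variant": "v1", "label": 1, "chr": "chr1"},
    {"variant": "v2", "label": 0, "chr": "chr1"},
    {"variant": "v3", "label": 1, "chr": "chr2"},
]
for held_out, train_rows, test_rows in leave_one_chromosome_out(rows):
    print(held_out, len(train_rows), len(test_rows))
```

Holding out a whole chromosome keeps nearby, correlated variants from appearing in both the training and evaluation sets.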

The following command generates PIP-agnostic probability-scale predictions and fiPIPs from previously trained XGBoost models (JSON files) and writes them to the output directory set by --outdir:

fipip predict_from_json \
  --predict-file tutorials/test_df2.tsv \
  --models-dir output \
  --outdir new_output
