PROTRIDER is an autoencoder-based method to call protein outliers from mass spectrometry-based proteomics datasets.
Have a look at our paper for information about our work.
Table of Contents
# 1. Install
pip install protrider
# 2. Run on the included sample data
protrider run --config config.yaml
# 3. Plot results
protrider plot --config config.yaml allResults are written to the directory specified by out_dir in config.yaml (default: output/). The key output file is protrider_summary.csv, which contains outlier calls with p-values, z-scores, and fold changes for every sample–protein pair.
PROTRIDER was tested using Python 3.14 on Linux. We recommend a dedicated conda environment:
conda create --name protrider_env python=3.14
conda activate protrider_env
pip install protriderVerify the installation:
protrider --helpMore information on conda environments can be found in Conda's user guide.
All parameters are set in a YAML configuration file. A template is provided as config.yaml.
Input files are specified in the config:
input_intensities: CSV, TSV, or Parquet file with protein intensities (columns = samples, rows = proteins)sample_annotation(optional): CSV or TSV file with sample covariates (one row per sample)
An example dataset is included under sample_data/.
Configuration parameters
| Parameter | Description |
|---|---|
out_dir |
Output directory |
input_intensities |
Path to protein intensities file |
max_allowed_NAs_per_protein |
Maximum percentage of missing values per protein (default: 0.3) |
log_func_name |
Transformation funtion to apply to the data before model fitting: log (default), log10, log2, or null (if already log transformed) |
sample_annotation |
Path to sample annotations file (optional) |
index_col |
Column name containing protein IDs |
cov_used |
List of covariate column names from the annotation file (optional) |
find_q_method |
Method to determine latent dimension: OHT (default), gs, bs (binary search), or an integer |
pval_dist |
Distribution for p-value calculation: t (default) or gaussian |
n_epochs |
Number of training epochs (default: 100) |
checkpoint_path |
Path to save/load model checkpoint (optional) |
The key output file is protrider_summary.csv, which contains outlier calls with p-values, z-scores, and fold changes for every sample–protein pair.
Output files
| File | Description |
|---|---|
protrider_summary.csv |
Long-format summary with outlier calls for all sample–protein pairs |
pvals.csv |
Two-sided p-values (samples × proteins) |
pvals_adj.csv |
BH/BY-adjusted p-values |
pvals_one_sided.csv |
Left-sided p-values |
zscores.csv |
Z-scores |
residuals.csv |
Model residuals (observed − predicted) |
log2fc.csv |
Log2 fold changes |
fc.csv |
Fold changes |
output.csv |
Autoencoder reconstructed values |
processed_input.csv |
Preprocessed input passed to the autoencoder |
additional_info.csv |
Model metadata (latent dimension, learning rate, loss) |
train_losses.csv |
Per-epoch training loss |
fit_parameters.csv |
Per-protein distribution fit parameters |
config.yaml |
Saved configuration for reproducibility |
Run the pipeline:
protrider run --config config.yamlGenerate plots:
# All plots
protrider plot --config config.yaml all
# Individual plot types
protrider plot --config config.yaml pvals
protrider plot --config config.yaml aberrant_per_sample
protrider plot --config config.yaml training_loss
protrider plot --config config.yaml encoding_dim
# Expected vs observed for a specific protein
protrider plot --config config.yaml expected_vs_observed --protein_id <protein_id>Model checkpointing
PROTRIDER automatically saves trained models and reuses them in subsequent runs, skipping retraining if a checkpoint exists. By default the model is saved to <out_dir>/model.pt.
To use a custom checkpoint location, set checkpoint_path in your config:
checkpoint_path: models/my_model.ptTo force retraining, delete the checkpoint file or point to a new path.
Python API
import protrider
config = protrider.ProtriderConfig(
out_dir='output/',
input_intensities='data/protein_intensities.csv',
sample_annotation='data/sample_annotations.csv',
index_col='protein_ID',
cov_used=['AGE', 'SEX'],
n_epochs=100,
)
# Run
result, model_info, fit_params, gs_result = protrider.run(config)
# Save results
result.save(config.out_dir, format='wide') # individual CSV files
result.save(config.out_dir, format='long') # protrider_summary.csv
model_info.save(config.out_dir)
config.save(config.out_dir)
# Generate plots (omit out_dir to get plot objects without saving)
model_info.plot_training_loss(config.out_dir)
result.plot_aberrant_per_sample(config.out_dir)
hist_plot, qq_plot = result.plot_pvals(config.out_dir)
result.plot_expected_vs_observed('protein_123', config.out_dir)
# Access results as DataFrames
result.df_pvals # p-values
result.df_pvals_adj # adjusted p-values
result.df_Z # z-scores
result.df_res # residuals
result.log2fc # log2 fold changes
result.fc # fold changesThis project uses uv for dependency management and development:
uv sync --all-groups # install runtime + dev dependencies
uv run pytest tests/ -q # run the test suitePROTRIDER is published to PyPI automatically by the
publish.yml GitHub Actions workflow, which is triggered by
pushing a Git tag matching v*. The package version is derived from the tag by
setuptools-scm, so the tag is the single source of
truth — there is no version number to bump in pyproject.toml.
To cut a release:
# 1. Make sure main is up to date and tests pass
git checkout main
git pull
# 2. Create an annotated tag following semantic versioning (vMAJOR.MINOR.PATCH)
git tag -a v1.2.3 -m "Release v1.2.3"
# 3. Push the tag to trigger the publish workflow
git push origin v1.2.3Pushing the tag builds the package with uv build and publishes it to PyPI via trusted
publishing (no API token required). Track progress on the
Actions page.
This project is licensed under the MIT License.
If you use PROTRIDER, please cite:
@article{10.1093/bioinformatics/btaf628,
author = {Klaproth-Andrade, Daniela and Scheller, Ines F and Tsitsiridis, Georgios and Loipfinger, Stefan and Mertes, Christian and Smirnov, Dmitrii and Prokisch, Holger and Yépez, Vicente A and Gagneur, Julien},
title = {PROTRIDER: Protein abundance outlier detection from mass spectrometry-based proteomics data with a conditional autoencoder},
journal = {Bioinformatics},
pages = {btaf628},
year = {2025},
month = {11},
issn = {1367-4811},
doi = {10.1093/bioinformatics/btaf628},
url = {https://doi.org/10.1093/bioinformatics/btaf628},
}