PROTRIDER

PROTRIDER is an autoencoder-based method to call protein outliers from mass spectrometry-based proteomics datasets.

Have a look at our paper for information about our work.

Table of Contents

🚀 Quickstart

# 1. Install
pip install protrider

# 2. Run on the included sample data
protrider run --config config.yaml

# 3. Plot results
protrider plot --config config.yaml all

Results are written to the directory specified by out_dir in config.yaml (default: output/). The key output file is protrider_summary.csv, which contains outlier calls with p-values, z-scores, and fold changes for every sample–protein pair.

⚙️ Installation

PROTRIDER was tested using Python 3.14 on Linux. We recommend a dedicated conda environment:

conda create --name protrider_env python=3.14
conda activate protrider_env
pip install protrider

Verify the installation:

protrider --help

More information on conda environments can be found in Conda's user guide.

📖 Usage

🗂️ Configuration

All parameters are set in a YAML configuration file. A template is provided as config.yaml.

Input files are specified in the config:

input_intensities: CSV, TSV, or Parquet file with protein intensities (columns = samples, rows = proteins)
sample_annotation (optional): CSV or TSV file with sample covariates (one row per sample)

An example dataset is included under sample_data/.

Configuration parameters

Parameter	Description
`out_dir`	Output directory
`input_intensities`	Path to protein intensities file
`max_allowed_NAs_per_protein`	Maximum percentage of missing values per protein (default: `0.3`)
`log_func_name`	Transformation funtion to apply to the data before model fitting: `log` (default), `log10`, `log2`, or `null` (if already log transformed)
`sample_annotation`	Path to sample annotations file (optional)
`index_col`	Column name containing protein IDs
`cov_used`	List of covariate column names from the annotation file (optional)
`find_q_method`	Method to determine latent dimension: `OHT` (default), `gs`, `bs` (binary search), or an integer
`pval_dist`	Distribution for p-value calculation: `t` (default) or `gaussian`
`n_epochs`	Number of training epochs (default: `100`)
`checkpoint_path`	Path to save/load model checkpoint (optional)

📤 Output

The key output file is protrider_summary.csv, which contains outlier calls with p-values, z-scores, and fold changes for every sample–protein pair.

Output files

File	Description
`protrider_summary.csv`	Long-format summary with outlier calls for all sample–protein pairs
`pvals.csv`	Two-sided p-values (samples × proteins)
`pvals_adj.csv`	BH/BY-adjusted p-values
`pvals_one_sided.csv`	Left-sided p-values
`zscores.csv`	Z-scores
`residuals.csv`	Model residuals (observed − predicted)
`log2fc.csv`	Log2 fold changes
`fc.csv`	Fold changes
`output.csv`	Autoencoder reconstructed values
`processed_input.csv`	Preprocessed input passed to the autoencoder
`additional_info.csv`	Model metadata (latent dimension, learning rate, loss)
`train_losses.csv`	Per-epoch training loss
`fit_parameters.csv`	Per-protein distribution fit parameters
`config.yaml`	Saved configuration for reproducibility

▶️ Run

Run the pipeline:

protrider run --config config.yaml

Generate plots:

# All plots
protrider plot --config config.yaml all

# Individual plot types
protrider plot --config config.yaml pvals
protrider plot --config config.yaml aberrant_per_sample
protrider plot --config config.yaml training_loss
protrider plot --config config.yaml encoding_dim

# Expected vs observed for a specific protein
protrider plot --config config.yaml expected_vs_observed --protein_id <protein_id>

Model checkpointing

PROTRIDER automatically saves trained models and reuses them in subsequent runs, skipping retraining if a checkpoint exists. By default the model is saved to <out_dir>/model.pt.

To use a custom checkpoint location, set checkpoint_path in your config:

checkpoint_path: models/my_model.pt

To force retraining, delete the checkpoint file or point to a new path.

Python API

import protrider

config = protrider.ProtriderConfig(
    out_dir='output/',
    input_intensities='data/protein_intensities.csv',
    sample_annotation='data/sample_annotations.csv',
    index_col='protein_ID',
    cov_used=['AGE', 'SEX'],
    n_epochs=100,
)

# Run
result, model_info, fit_params, gs_result = protrider.run(config)

# Save results
result.save(config.out_dir, format='wide')   # individual CSV files
result.save(config.out_dir, format='long')   # protrider_summary.csv
model_info.save(config.out_dir)
config.save(config.out_dir)

# Generate plots (omit out_dir to get plot objects without saving)
model_info.plot_training_loss(config.out_dir)
result.plot_aberrant_per_sample(config.out_dir)
hist_plot, qq_plot = result.plot_pvals(config.out_dir)
result.plot_expected_vs_observed('protein_123', config.out_dir)

# Access results as DataFrames
result.df_pvals        # p-values
result.df_pvals_adj    # adjusted p-values
result.df_Z            # z-scores
result.df_res          # residuals
result.log2fc          # log2 fold changes
result.fc              # fold changes

🛠️ Developers

This project uses uv for dependency management and development:

uv sync --all-groups        # install runtime + dev dependencies
uv run pytest tests/ -q     # run the test suite

Releasing a new version

PROTRIDER is published to PyPI automatically by the publish.yml GitHub Actions workflow, which is triggered by pushing a Git tag matching v*. The package version is derived from the tag by setuptools-scm, so the tag is the single source of truth — there is no version number to bump in pyproject.toml.

To cut a release:

# 1. Make sure main is up to date and tests pass
git checkout main
git pull

# 2. Create an annotated tag following semantic versioning (vMAJOR.MINOR.PATCH)
git tag -a v1.2.3 -m "Release v1.2.3"

# 3. Push the tag to trigger the publish workflow
git push origin v1.2.3

Pushing the tag builds the package with uv build and publishes it to PyPI via trusted publishing (no API token required). Track progress on the Actions page.

📄 License

This project is licensed under the MIT License.

📚 Citation

If you use PROTRIDER, please cite:

@article{10.1093/bioinformatics/btaf628,
    author = {Klaproth-Andrade, Daniela and Scheller, Ines F and Tsitsiridis, Georgios and Loipfinger, Stefan and Mertes, Christian and Smirnov, Dmitrii and Prokisch, Holger and Yépez, Vicente A and Gagneur, Julien},
    title = {PROTRIDER: Protein abundance outlier detection from mass spectrometry-based proteomics data with a conditional autoencoder},
    journal = {Bioinformatics},
    pages = {btaf628},
    year = {2025},
    month = {11},
    issn = {1367-4811},
    doi = {10.1093/bioinformatics/btaf628},
    url = {https://doi.org/10.1093/bioinformatics/btaf628},
}

Name		Name	Last commit message	Last commit date
Latest commit History 164 Commits
.github/workflows		.github/workflows
sample_data		sample_data
src/protrider		src/protrider
tests		tests
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
config.yaml		config.yaml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PROTRIDER

🚀 Quickstart

⚙️ Installation

📖 Usage

🗂️ Configuration

📤 Output

▶️ Run

🛠️ Developers

Releasing a new version

📄 License

📚 Citation

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PROTRIDER

🚀 Quickstart

⚙️ Installation

📖 Usage

🗂️ Configuration

📤 Output

▶️ Run

🛠️ Developers

Releasing a new version

📄 License

📚 Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages