DeepPPI

Paper: DeepPPI: Boosting Prediction of Protein–Protein Interactions with Deep Neural Networks

A deep learning-based framework for predicting protein-protein interactions (PPIs) using handcrafted biological features and fully connected neural networks.

Project Overview

Protein-protein interactions are critical for understanding biological systems and disease mechanisms. DeepPPI builds a binary classifier for PPIs using a feature-rich representation of protein sequences. The model combines:

Amino acid composition (AAC)
Dipeptide composition (DPC)
Physicochemical descriptors (composition, transition, distribution)
Sequence-order features (QSO and SOC)
Amphiphilic pseudo amino acid composition (APAAC)

Project Structure

Features Used for Sequence Encoding (Total = 1164 features)

Feature Type	Dimensions
Amino Acid Composition (AAC)	20
Dipeptide Composition (DPC)	400
CTD Descriptors (24 x 21 features)	504
Quasi Sequence Order (QSO)	100
Sequence Order Coupling (SOC)	60
Amphiphilic Pseudo AAC (APAAC)	80
TOTAL	1164
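To make the two simplest rows of the table concrete, here is a minimal pure-Python sketch of AAC and DPC. The function names `aac` and `dpc` and the toy sequence are illustrative and not taken from feature_generator.py. Values are returned as fractions; the repository's protpy-based features may be scaled differently (for example as percentages).

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq: str) -> list[float]:
    """Amino acid composition: fraction of each of the 20 residues."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: fraction of each of the 400 ordered residue pairs."""
    pairs = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(seq) - 1):
        dipeptide = seq[i:i + 2]
        if dipeptide in counts:  # skip pairs containing non-standard residues
            counts[dipeptide] += 1
    total = max(len(seq) - 1, 1)  # number of overlapping dipeptides
    return [counts[p] / total for p in pairs]


seq = "MKTAYIAKQR"  # toy sequence, for illustration only
features = aac(seq) + dpc(seq)
print(len(features))  # 420
```

These two blocks account for 20 + 400 = 420 of the 1164 dimensions; the remaining 744 come from the CTD, QSO, SOC and APAAC descriptors listed above.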

Model Architecture

Each protein passes through its own 3-layer FC encoder:

Linear → BatchNorm → LeakyReLU → Dropout (×3)
The two encoded vectors are concatenated
The concatenated vector is passed through a final FC layer with a Sigmoid for binary classification

Layers:

FC: Linear layers with decreasing size (1164 → 512 → 256 → 128)
Dropout: 0.2 dropout rate
Activation: LeakyReLU
Output: Sigmoid on 2-class output
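
The description above can be sketched in PyTorch as follows. This is a minimal reading of the README, not the code in src: the class names `Encoder` and `DeepPPISketch` are hypothetical, the LeakyReLU slope is left at PyTorch's default, and the head emits two sigmoid outputs to match the "2-class output" line (a single logit would be a common alternative).

```python
import torch
from torch import nn


class Encoder(nn.Module):
    """Three blocks of Linear -> BatchNorm -> LeakyReLU -> Dropout."""

    def __init__(self, dims=(1164, 512, 256, 128), dropout=0.2):
        super().__init__()
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [
                nn.Linear(d_in, d_out),
                nn.BatchNorm1d(d_out),
                nn.LeakyReLU(),
                nn.Dropout(dropout),
            ]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


class DeepPPISketch(nn.Module):
    """One encoder per protein, concatenation, then FC + Sigmoid."""

    def __init__(self):
        super().__init__()
        self.enc_a = Encoder()  # separate weights for each protein
        self.enc_b = Encoder()
        self.head = nn.Linear(2 * 128, 2)

    def forward(self, a, b):
        z = torch.cat([self.enc_a(a), self.enc_b(b)], dim=1)
        return torch.sigmoid(self.head(z))


model = DeepPPISketch().eval()  # eval mode disables dropout for a deterministic pass
with torch.no_grad():
    out = model(torch.randn(4, 1164), torch.randn(4, 1164))
print(out.shape)  # torch.Size([4, 2])
```

The random inputs only check that the shapes line up: a batch of 4 protein pairs, each protein a 1164-dimensional feature vector, yields a 4 × 2 output with values in [0, 1].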

How to Use

Clone the repo:
    git clone https://github.com/kianaseraj/DeepPPI.git
    cd DeepPPI
Install required packages:
    pip install numpy torch scikit-learn matplotlib tqdm protpy
Prepare the data:
- Input .npy files for PPI pairs: [[prot1, prot2, label], ...]
- Extract sequence features using feature_generator.py
Run training:
    python main.py
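
As an illustration of the pair-file layout, the sketch below writes and reads a toy .npy file in the [[prot1, prot2, label], ...] shape. The protein IDs, the file name `pairs.npy`, and the choice to store every field as a string are assumptions made for this example; if the real files hold Python objects, `np.load` would additionally need `allow_pickle=True`.

```python
import os
import tempfile

import numpy as np

# Hypothetical toy pairs in the [[prot1, prot2, label], ...] layout; IDs are made up.
pairs = np.array([["P1", "P2", "1"], ["P1", "P3", "0"], ["P2", "P3", "1"]])

# Write to a temporary location so the example leaves the repo untouched.
path = os.path.join(tempfile.mkdtemp(), "pairs.npy")
np.save(path, pairs)

loaded = np.load(path)
prot_a, prot_b = loaded[:, 0], loaded[:, 1]  # first and second protein of each pair
labels = loaded[:, 2].astype(int)            # 1 = interacting, 0 = non-interacting
print(prot_a.tolist(), prot_b.tolist(), labels.tolist())
```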

Evaluation Metrics

All metrics are defined in metrics.py:

Accuracy
Precision
Recall (Sensitivity)
F1 Score
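
For reference, the four metrics follow the standard confusion-matrix definitions. The sketch below is a plain-Python version with a hypothetical function name, `binary_metrics`; it is not the code in metrics.py, which may differ (for example by using scikit-learn).

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```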

Note on Dataset Privacy and Project Context

The DeepPPI model in this repository was implemented as part of a broader benchmarking project. The aim was to demonstrate how information leakage can occur in protein-protein interaction datasets when proteins in the training and test sets share high sequence similarity. Our work highlights the importance of splitting datasets properly to avoid artificially inflated performance.

The code provided here benchmarks DeepPPI against other models using our custom dataset. However, due to privacy constraints, the dataset is not publicly available in this repository.
