aoyangchen/FRAPPUCCINO

FRAPPUCCINO

Machine-learning benchmark and workflow for family 1 glycosyltransferase (GT1)–acceptor reactivity prediction under novelty-controlled evaluation.

This repository accompanies the MSc thesis Predicting Glycosyltransferase Acceptor Specificity with Variational Autoencoders and Pretrained Protein–Small-Molecule Representations. It contains a notebook-first pipeline for dataset harmonization, feature generation, novelty-controlled benchmarking, and model evaluation across pooled-feature baseline models, token-level cross-attention, and VAE-based fusion models.

In the thesis benchmark, pooled-feature XGBoost performed best in most settings, while the early-fusion supervised VAE was strongest under the strictest regime, double-cold novelty, in which both the enzyme and the substrate are unseen during training.
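
As a rough illustration of what "pooled-feature" input can look like, the sketch below averages per-token embeddings into one fixed-length vector per molecule and concatenates the enzyme and substrate vectors into a single feature row. Mean pooling, the function names, and the toy embeddings are assumptions made for illustration; the notebook may pool or combine features differently.

```python
def mean_pool(token_embeddings):
    """Average a list of per-token vectors into one fixed-length vector.

    Assumes every token vector has the same length.
    """
    n_tokens = len(token_embeddings)
    return [sum(column) / n_tokens for column in zip(*token_embeddings)]


def pooled_pair_features(enzyme_tokens, substrate_tokens):
    """Concatenate pooled enzyme and substrate vectors into one feature row.

    Hypothetical helper, not part of this repository's helpers/ code.
    """
    return mean_pool(enzyme_tokens) + mean_pool(substrate_tokens)


# Toy stand-ins for pretrained token embeddings (values are made up).
enzyme_tokens = [[1.0, 3.0], [3.0, 5.0]]   # 2 residues, 2 dimensions
substrate_tokens = [[2.0], [4.0], [6.0]]   # 3 atoms/tokens, 1 dimension

features = pooled_pair_features(enzyme_tokens, substrate_tokens)
print(features)  # [2.0, 4.0, 4.0]
```

A row like this has the same length for every enzyme–acceptor pair, which is what lets a tabular model such as XGBoost consume it directly; token-level models instead keep the unpooled sequences.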

Why the name?

FRAPPUCCINO stands for:

Family 1 glycosyltransferase (GT1)
Reactivity and
Acceptor-Pair
Prediction with
Pretrained protein–small-molecule representations,
Using
Cross-modal
Compression and
Inference under
Novelty-controlled
Out-of-distribution evaluation.

Repository structure

  • notebooks/ — end-to-end Colab workflow
  • data/ — input data and dataset documentation
  • helpers/ — reusable utility code used by the notebook
  • models/ — saved model artifacts, configs, or checkpoints (if applicable)
  • reports/ — figures, tables, and exported evaluation outputs (if applicable)

Getting started

  1. Open notebooks/.
  2. Run the main notebook.
  3. Follow the notebook cells in order to reproduce preprocessing, feature generation, benchmark construction, training, and evaluation.

The notebook is the current reference implementation of the project workflow.

Scope

This repository focuses on binary GT1 enzyme–acceptor reactivity prediction, with evaluation across enzyme novelty, substrate novelty, and joint enzyme–substrate novelty settings.
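
The three novelty settings can be sketched as a partition of labelled enzyme–acceptor pairs. The example below is a minimal illustration, not the notebook's benchmark construction: it holds out a random subset of enzymes and substrates, whereas a real benchmark might define novelty by sequence or structural similarity. The function name, the `holdout_frac` parameter, and the record layout are all assumptions.

```python
import random


def novelty_split(pairs, holdout_frac=0.2, seed=0):
    """Partition (enzyme_id, substrate_id, label) records by novelty.

    Hypothetical sketch: enzymes and substrates are held out at random.
    Returns a dict with 'train', 'cold_enzyme', 'cold_substrate',
    and 'double_cold' lists.
    """
    rng = random.Random(seed)
    enzymes = sorted({e for e, _, _ in pairs})
    substrates = sorted({s for _, s, _ in pairs})

    # Hold out at least one enzyme and one substrate.
    n_held_e = max(1, int(len(enzymes) * holdout_frac))
    n_held_s = max(1, int(len(substrates) * holdout_frac))
    held_enzymes = set(rng.sample(enzymes, n_held_e))
    held_substrates = set(rng.sample(substrates, n_held_s))

    splits = {"train": [], "cold_enzyme": [], "cold_substrate": [], "double_cold": []}
    for record in pairs:
        enzyme, substrate, _ = record
        new_enzyme = enzyme in held_enzymes
        new_substrate = substrate in held_substrates
        if new_enzyme and new_substrate:
            splits["double_cold"].append(record)     # both unseen in training
        elif new_enzyme:
            splits["cold_enzyme"].append(record)     # enzyme unseen, substrate seen
        elif new_substrate:
            splits["cold_substrate"].append(record)  # substrate unseen, enzyme seen
        else:
            splits["train"].append(record)
    return splits


# Toy data: a full 10 x 10 grid of pairs with made-up binary labels.
pairs = [(f"E{i}", f"S{j}", (i + j) % 2) for i in range(10) for j in range(10)]
splits = novelty_split(pairs, holdout_frac=0.2, seed=0)
print({name: len(records) for name, records in splits.items()})
```

Because every record containing a held-out enzyme or substrate is routed to a test split, no held-out entity appears in the training set, which is the property a novelty-controlled evaluation depends on.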
