flurinh/lambda

Lambda

Machine learning models for predicting and analyzing protein properties, with a focus on G protein-coupled receptors (GPCRs) and opsins.

Features

  • Transformer-based architectures for sequence and structure processing
  • Specialized models for property prediction (activation, ligand binding, etc.)
  • Comprehensive training utilities including loss functions, optimizers, and evaluation metrics
  • Configuration system for reproducible experiments
  • Model visualization tools for interpretability and analysis

Installation

# Install in development mode
git clone [repository-url]
cd lambda
pip install -e .

Core Components

Model Architecture

The lambda package provides several architecture components that can be combined to create custom models:

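# `lambda` is a Python keyword, so `from lambda... import ...` is a SyntaxError;
# load the submodules by name instead.
from importlib import import_module

# Encoders: Encoder, TransformerEncoder
encoders = import_module("lambda.models.encoders")

# Pooling layers: AttentionPooling, MaxPooling, MeanPooling
# Output layers: MLPHead, ClassificationHead, RegressionHead
layers = import_module("lambda.models.layers")

encoder_cls = encoders.TransformerEncoder
pooling_cls = layers.AttentionPooling
head_cls = layers.RegressionHead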
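
The pooling layers reduce a per-residue embedding matrix of shape (seq_len, emb_dim) to one fixed-size vector per protein. The package's own implementations are not shown in this README, so the following is only a minimal pure-Python sketch of the three ideas. The function names and the explicit `scores` argument are illustrative assumptions, not the package's API.

```python
import math
from typing import List

Matrix = List[List[float]]  # (seq_len, emb_dim) per-residue embeddings


def mean_pool(x: Matrix) -> List[float]:
    """Average each embedding dimension over the sequence axis."""
    n = len(x)
    return [sum(col) / n for col in zip(*x)]


def max_pool(x: Matrix) -> List[float]:
    """Take the maximum of each embedding dimension over the sequence axis."""
    return [max(col) for col in zip(*x)]


def attention_pool(x: Matrix, scores: List[float]) -> List[float]:
    """Weighted average using softmax-normalised per-position scores.

    In a learned layer the scores would come from a small network; here they
    are passed in directly (an assumption made to keep the sketch short).
    """
    m = max(scores)  # subtract the max for numerical stability
    exp = [math.exp(s - m) for s in scores]
    total = sum(exp)
    weights = [e / total for e in exp]
    return [sum(w * v for w, v in zip(weights, col)) for col in zip(*x)]


x = [[1.0, 4.0], [3.0, 2.0]]          # two residues, emb_dim = 2
print(mean_pool(x))                    # [2.0, 3.0]
print(max_pool(x))                     # [3.0, 4.0]
print(attention_pool(x, [0.0, 0.0]))   # equal scores reduce to the mean
```

With equal scores, attention pooling matches mean pooling; unequal scores let informative positions dominate the pooled vector.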

Model Creation

Models can be created directly or through the model factory:

from importlib import import_module

import torch

# `lambda` is a Python keyword, so the package must be imported by name
ModelFactory = import_module("lambda.models.factory").ModelFactory

# Create a model using the factory
model = ModelFactory.create_model(
    model_type="transformer",
    embedding_dim=768,
    n_layers=6,
    n_heads=8,
    dropout=0.1,
    output_dim=1
)

# Use the model for prediction
embeddings = torch.randn(10, 100, 768)  # (batch_size, seq_len, emb_dim)
output = model(embeddings)
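
A string-keyed factory such as `ModelFactory.create_model` is commonly built on a registry that maps each `model_type` to a constructor. The sketch below illustrates that pattern only; `register`, `create_model`, and `DummyTransformer` are hypothetical names, and nothing here is taken from the package's source.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: maps a model_type string to a constructor.
_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that records a constructor under the given model_type name."""
    def wrap(ctor: Callable[..., Any]) -> Callable[..., Any]:
        _REGISTRY[name] = ctor
        return ctor
    return wrap


@register("transformer")
class DummyTransformer:
    """Stand-in for a real encoder; it only stores its hyperparameters."""

    def __init__(self, embedding_dim: int, n_layers: int, n_heads: int,
                 dropout: float = 0.1, output_dim: int = 1) -> None:
        # Standard multi-head attention splits the embedding evenly across heads.
        if embedding_dim % n_heads != 0:
            raise ValueError("embedding_dim must be divisible by n_heads")
        self.embedding_dim = embedding_dim
        self.n_layers = n_layers
        self.n_heads = n_heads
        self.dropout = dropout
        self.output_dim = output_dim


def create_model(model_type: str, **kwargs: Any) -> Any:
    """Look up the constructor for model_type and call it with kwargs."""
    try:
        ctor = _REGISTRY[model_type]
    except KeyError:
        known = sorted(_REGISTRY)
        raise ValueError(f"unknown model_type {model_type!r}; known: {known}") from None
    return ctor(**kwargs)


model = create_model("transformer", embedding_dim=768, n_layers=6, n_heads=8)
print(type(model).__name__, model.embedding_dim // model.n_heads)  # DummyTransformer 96
```

A registry keeps the set of supported model types in one place and turns a misspelled `model_type` into an immediate, readable error.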

Training

The package includes a comprehensive training infrastructure:

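from importlib import import_module

# `lambda` is a Python keyword, so the package must be imported by name
ModelTrainer = import_module("lambda.training").ModelTrainer
ProteinDataset = import_module("lambda.data").ProteinDataset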

# Create datasets
train_dataset = ProteinDataset(...)
val_dataset = ProteinDataset(...)

# Initialize trainer
trainer = ModelTrainer(
    model=model,
    loss_fn="mse",
    optimizer="adam",
    lr=1e-4,
    weight_decay=1e-5,
    device="cuda"
)

# Train the model
trainer.train(
    train_dataset=train_dataset,
    val_dataset=val_dataset,
    batch_size=32,
    epochs=10,
    patience=3
)
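
The `patience=3` argument suggests early stopping on the validation loss, although this README does not document the trainer's exact behaviour. Under that assumption, the logic can be sketched as follows; `EarlyStopping` and `min_delta` are hypothetical names, and the loss values are made up.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta      # smallest decrease that counts as progress
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=3)
val_losses = [0.90, 0.70, 0.72, 0.71, 0.73, 0.60]  # made-up values
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)  # 5 0.7
```

Training stops at epoch 5, after three consecutive epochs without improvement, so the better loss at epoch 6 is never reached. That is the usual trade-off when choosing a small patience.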

Prediction

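Trained models can be loaded from a checkpoint and applied to new data:

from importlib import import_module

# `lambda` is a Python keyword, so the package must be imported by name
Predictor = import_module("lambda.inference").Predictor
ProteinDataset = import_module("lambda.data").ProteinDataset

# Load a trained model
predictor = Predictor.from_checkpoint("artifacts/weights/model_007061_weights.pth")

# Make predictions on a held-out dataset
test_dataset = ProteinDataset(...)
results = predictor.predict(test_dataset)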

Configuration System

The lambda package uses a configuration system for reproducible experiments:

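from importlib import import_module

# `lambda` is a Python keyword, so the package must be imported by name
ExperimentManager = import_module("lambda.configs").ExperimentManager
ModelFactory = import_module("lambda.models.factory").ModelFactory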

# Load configuration
manager = ExperimentManager()
config = manager.load_config("007061")

# Create a model from configuration
model = ModelFactory.create_model_from_config(config)

Example configuration file (config_007061.ini):

[MODEL]
model_type = transformer
embedding_dim = 768
n_layers = 6
n_heads = 8
dropout = 0.1
output_dim = 1

[TRAINING]
batch_size = 32
epochs = 10
learning_rate = 1e-4
weight_decay = 1e-5
patience = 3

[DATA]
dataset = opsin
split_seed = 42
train_ratio = 0.8
val_ratio = 0.1
test_ratio = 0.1
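
To make the file format concrete, here is a small sketch that reads a trimmed copy of the configuration above with the standard-library `configparser`, checks that the split ratios sum to 1, and derives a seeded train/val/test split. How `ExperimentManager` actually parses and validates the file is not shown in this README; `split_indices` is a hypothetical helper.

```python
import configparser
import random
from typing import List, Sequence, Tuple

CONFIG_TEXT = """
[TRAINING]
learning_rate = 1e-4
patience = 3

[DATA]
dataset = opsin
split_seed = 42
train_ratio = 0.8
val_ratio = 0.1
test_ratio = 0.1
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# configparser stores every value as a string; convert explicitly.
lr = parser.getfloat("TRAINING", "learning_rate")  # parses "1e-4"
data = parser["DATA"]
ratios = [data.getfloat(k) for k in ("train_ratio", "val_ratio", "test_ratio")]
if abs(sum(ratios) - 1.0) > 1e-9:
    raise ValueError(f"split ratios must sum to 1, got {sum(ratios)}")


def split_indices(n: int, ratios: Sequence[float],
                  seed: int) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle range(n) with a fixed seed and cut it into train/val/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # private RNG, so global state is untouched
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train, val, test = split_indices(100, ratios, data.getint("split_seed"))
print(lr, len(train), len(val), len(test))  # 0.0001 80 10 10
```

Because the shuffle uses its own seeded generator, the same `split_seed` always yields the same partition, which is what makes the split reproducible across runs.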

Visualization

The package includes visualization tools for model interpretability:

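from importlib import import_module

# `lambda` is a Python keyword, so the package must be imported by name
visualize_attention = import_module("lambda.visualization").visualize_attention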

# Visualize attention weights
visualize_attention(
    model=model,
    sequence="MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA",
    output_file="visualizations/attention_weights.png"
)

Project Structure

The repository is organized as follows:

lambda/                       # Project root
├── lambda/                  # Main package
│   ├── __init__.py          # Package initialization
│   ├── cli/                 # Command line interface
│   │   └── __init__.py
│   ├── configs/             # Configuration management
│   │   ├── __init__.py
│   │   ├── config_schema.py # Config validation schemas
│   │   ├── experiment_manager.py
│   │   └── defaults/        # Default configurations
│   │       ├── __init__.py
│   │       ├── model_defaults.py
│   │       └── training_defaults.py
│   ├── data/                # Data handling
│   │   ├── __init__.py
│   │   ├── dataset.py       # Dataset definitions
│   │   └── dataset_utils.py # Data utilities
│   ├── models/              # Model definitions
│   │   ├── __init__.py
│   │   ├── factory.py       # Model factory
│   │   ├── encoders/        # Encoder architectures
│   │   │   ├── __init__.py
│   │   │   └── base_encoder.py
│   │   └── layers/          # Model layers
│   │       ├── __init__.py
│   │       ├── losses.py
│   │       ├── output_layers.py
│   │       └── pooling.py
│   ├── training/            # Training infrastructure
│   │   ├── __init__.py
│   │   ├── model_trainer.py # Trainer class
│   │   ├── train.py         # Training scripts
│   │   └── training_utils.py
│   ├── inference/           # Inference utilities
│   │   ├── __init__.py
│   │   └── predictor.py     # Predictor class
│   ├── visualization/       # Visualization tools
│   │   ├── __init__.py
│   │   └── attention_visualization.py
│   └── utils/               # Utilities
│       ├── __init__.py
│       ├── attention_utils.py
│       ├── logging.py
│       └── model_utils.py
├── artifacts/               # Generated artifacts
│   ├── weights/             # Saved model weights
│   └── logs/                # Training logs
├── setup.py                 # Package setup
├── pyproject.toml           # Package metadata
└── requirements.txt         # Dependencies

CLI Usage

Lambda provides command-line tools for common tasks:

# Run model training
python -m lambda.cli.train --config 007061

# Run model prediction
python -m lambda.cli.predict --model artifacts/weights/model_007061_weights.pth --input data/test.csv
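
The two commands above imply the flags `--config`, `--model`, and `--input`. As a rough sketch of how such entry points might declare them with `argparse` (the builder function names, help strings, and the choice to make every flag required are assumptions, not the package's actual CLI code):

```python
import argparse


def build_train_parser() -> argparse.ArgumentParser:
    """Parser for a hypothetical `python -m lambda.cli.train` entry point."""
    p = argparse.ArgumentParser(prog="python -m lambda.cli.train")
    p.add_argument("--config", required=True, help="experiment ID, e.g. 007061")
    return p


def build_predict_parser() -> argparse.ArgumentParser:
    """Parser for a hypothetical `python -m lambda.cli.predict` entry point."""
    p = argparse.ArgumentParser(prog="python -m lambda.cli.predict")
    p.add_argument("--model", required=True, help="path to a saved weights file")
    p.add_argument("--input", required=True, help="path to the input CSV")
    return p


args = build_train_parser().parse_args(["--config", "007061"])
print(args.config)  # 007061

pred_args = build_predict_parser().parse_args(
    ["--model", "artifacts/weights/model_007061_weights.pth",
     "--input", "data/test.csv"]
)
print(pred_args.input)  # data/test.csv
```

Leaving `--config` as a string matters here: parsing it as an integer would drop the leading zeros of an ID such as 007061.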

Configuration Options

Lambda models support a variety of configuration options:

Model Types

  • transformer: Transformer-based sequence encoder
  • cnn: Convolutional neural network encoder
  • hybrid: Combined transformer and CNN architecture

Loss Functions

  • mse: Mean squared error
  • bce: Binary cross-entropy
  • ce: Cross-entropy
  • focal: Focal loss
  • contrastive: Contrastive loss
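
Of these, focal loss is the least familiar: it scales cross-entropy by (1 - p_t)^gamma so that well-classified examples contribute little to the total. The sketch below uses the standard binary definition for a single example, with the commonly used defaults gamma = 2 and alpha = 0.25. Whether the package uses this binary form, these defaults, or a multi-class variant is not stated in this README.

```python
import math


def bce(p: float, y: int) -> float:
    """Binary cross-entropy for one example; p is the predicted P(y = 1)."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal(0.9, 1) / bce(0.9, 1)  # confident, correct prediction
hard = focal(0.1, 1) / bce(0.1, 1)  # confident, wrong prediction
print(f"easy example keeps {easy:.4f} of its BCE, hard example keeps {hard:.4f}")
```

With gamma = 0 and alpha = 1 the modulating factor disappears and the function reduces to plain binary cross-entropy for positive examples.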

Optimizers

  • adam: Adam optimizer
  • adamw: AdamW optimizer
  • sgd: Stochastic gradient descent
  • rmsprop: RMSprop optimizer

Dependencies

  • Python 3.8+
  • PyTorch (>=1.9.0)
  • NumPy, Pandas
  • scikit-learn
  • Matplotlib
  • tqdm
  • h5py
  • protos (>=0.1.0)

License

MIT License
