Advanced machine learning models for protein property prediction and analysis, with a focus on G protein-coupled receptors (GPCRs) and opsins.
- Transformer-based architectures for sequence and structure processing
- Specialized models for property prediction (activation, ligand binding, etc.)
- Comprehensive training utilities including loss functions, optimizers, and evaluation metrics
- Configuration system for reproducible experiments
- Model visualization tools for interpretability and analysis
# Install in development mode
git clone [repository-url]
cd lambda
pip install -e .

The lambda package provides several architecture components that can be combined to create custom models:
# Encoders
from lambda.models.encoders import Encoder, TransformerEncoder
# Pooling layers
from lambda.models.layers import AttentionPooling, MaxPooling, MeanPooling
# Output layers
from lambda.models.layers import MLPHead, ClassificationHead, RegressionHead

Models can be created directly or through the model factory:
from lambda.models.factory import ModelFactory
# Create a model using the factory
model = ModelFactory.create_model(
    model_type="transformer",
    embedding_dim=768,
    n_layers=6,
    n_heads=8,
    dropout=0.1,
    output_dim=1
)
# Use the model for prediction
import torch
embeddings = torch.randn(10, 100, 768)  # (batch_size, seq_len, emb_dim)
output = model(embeddings)

The package includes a comprehensive training infrastructure:
from lambda.training import ModelTrainer
from lambda.data import ProteinDataset
# Create datasets
train_dataset = ProteinDataset(...)
val_dataset = ProteinDataset(...)
# Initialize trainer
trainer = ModelTrainer(
    model=model,
    loss_fn="mse",
    optimizer="adam",
    lr=1e-4,
    weight_decay=1e-5,
    device="cuda"
)
# Train the model
trainer.train(
    train_dataset=train_dataset,
    val_dataset=val_dataset,
    batch_size=32,
    epochs=10,
    patience=3
)

Models can be used for prediction on new data:
from lambda.inference import Predictor
# Load a trained model
predictor = Predictor.from_checkpoint("artifacts/weights/model_007061_weights.pth")
# Make predictions
results = predictor.predict(test_dataset)

The lambda package uses a configuration system for reproducible experiments:
from lambda.configs import ExperimentManager
from lambda.models.factory import ModelFactory
# Load configuration
manager = ExperimentManager()
config = manager.load_config("007061")
# Create a model from configuration
model = ModelFactory.create_model_from_config(config)

Example configuration file (config_007061.ini):
[MODEL]
model_type = transformer
embedding_dim = 768
n_layers = 6
n_heads = 8
dropout = 0.1
output_dim = 1
[TRAINING]
batch_size = 32
epochs = 10
learning_rate = 1e-4
weight_decay = 1e-5
patience = 3
[DATA]
dataset = opsin
split_seed = 42
train_ratio = 0.8
val_ratio = 0.1
test_ratio = 0.1

The package includes visualization tools for model interpretability:
from lambda.visualization import visualize_attention
# Visualize attention weights
visualize_attention(
    model=model,
    sequence="MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA",
    output_file="visualizations/attention_weights.png"
)

The Lambda package follows a clean, modular organization:
lambda/                           # Project root
├── lambda/                       # Main package
│   ├── __init__.py               # Package initialization
│   ├── cli/                      # Command line interface
│   │   └── __init__.py
│   ├── configs/                  # Configuration management
│   │   ├── __init__.py
│   │   ├── config_schema.py      # Config validation schemas
│   │   ├── experiment_manager.py
│   │   └── defaults/             # Default configurations
│   │       ├── __init__.py
│   │       ├── model_defaults.py
│   │       └── training_defaults.py
│   ├── data/                     # Data handling
│   │   ├── __init__.py
│   │   ├── dataset.py            # Dataset definitions
│   │   └── dataset_utils.py      # Data utilities
│   ├── models/                   # Model definitions
│   │   ├── __init__.py
│   │   ├── factory.py            # Model factory
│   │   ├── encoders/             # Encoder architectures
│   │   │   ├── __init__.py
│   │   │   └── base_encoder.py
│   │   └── layers/               # Model layers
│   │       ├── __init__.py
│   │       ├── losses.py
│   │       ├── output_layers.py
│   │       └── pooling.py
│   ├── training/                 # Training infrastructure
│   │   ├── __init__.py
│   │   ├── model_trainer.py      # Trainer class
│   │   ├── train.py              # Training scripts
│   │   └── training_utils.py
│   ├── inference/                # Inference utilities
│   │   ├── __init__.py
│   │   └── predictor.py          # Predictor class
│   ├── visualization/            # Visualization tools
│   │   ├── __init__.py
│   │   └── attention_visualization.py
│   └── utils/                    # Utilities
│       ├── __init__.py
│       ├── attention_utils.py
│       ├── logging.py
│       └── model_utils.py
├── artifacts/                    # Generated artifacts
│   ├── weights/                  # Saved model weights
│   └── logs/                     # Training logs
├── setup.py                      # Package setup
├── pyproject.toml                # Package metadata
└── requirements.txt              # Dependencies
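
One practical consequence of the package directory being named `lambda`: `lambda` is a reserved word in Python, so a statement of the form `from lambda.models.factory import ModelFactory` is rejected by the parser before any import happens. Assuming the installed package really is importable under that name, a possible workaround is to load modules by string with `importlib.import_module`. The sketch below demonstrates this with a throwaway stand-in package created in a temporary directory; the stand-in and the helper names `import_statement_is_valid` and `load` are illustrative assumptions, not part of the package.

```python
import importlib
import keyword
import sys
import tempfile
from pathlib import Path

# "lambda" is a reserved keyword, so it cannot appear as a module name
# in an import statement.
assert keyword.iskeyword("lambda")


def import_statement_is_valid(statement):
    """Return True if the statement compiles, False if it is a SyntaxError."""
    try:
        compile(statement, "<example>", "exec")
        return True
    except SyntaxError:
        return False


def load(dotted_name):
    """Import a module by its dotted name, given as a plain string."""
    return importlib.import_module(dotted_name)


# Build a throwaway package named "lambda" (a stand-in for the real one)
# and import from it by string.
with tempfile.TemporaryDirectory() as tmp:
    models_dir = Path(tmp) / "lambda" / "models"
    models_dir.mkdir(parents=True)
    (Path(tmp) / "lambda" / "__init__.py").write_text("")
    (models_dir / "__init__.py").write_text("")
    (models_dir / "factory.py").write_text("class ModelFactory:\n    pass\n")

    sys.path.insert(0, tmp)
    try:
        factory_module = load("lambda.models.factory")
        ModelFactory = factory_module.ModelFactory
    finally:
        sys.path.remove(tmp)

print(import_statement_is_valid("from lambda.models.factory import ModelFactory"))
print(ModelFactory.__name__)
```

Module names passed as strings, as with `python -m`, are not parsed as Python source, so they should not run into this restriction.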
Lambda provides command-line tools for common tasks:
# Run model training
python -m lambda.cli.train --config 007061
# Run model prediction
python -m lambda.cli.predict --model artifacts/weights/model_007061_weights.pth --input data/test.csv

Lambda models support a variety of configuration options:
Model types:
- transformer: Transformer-based sequence encoder
- cnn: Convolutional neural network encoder
- hybrid: Combined transformer and CNN architecture
Loss functions:
- mse: Mean squared error
- bce: Binary cross-entropy
- ce: Cross-entropy
- focal: Focal loss
- contrastive: Contrastive loss
Optimizers:
- adam: Adam optimizer
- adamw: AdamW optimizer
- sgd: Stochastic gradient descent
- rmsprop: RMSprop optimizer
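
To illustrate how these option names relate to the INI layout shown earlier, here is a minimal sketch that parses the example configuration with the standard-library `configparser` and checks a few values before any model is built. It is not the package's own `ExperimentManager` or `config_schema.py`; the helpers `check_choice` and `load_and_validate` are hypothetical, and the divisibility check on `embedding_dim` and `n_heads` assumes a standard multi-head attention layer.

```python
import configparser
import math

# Option names copied from the lists above.
CHOICES = {
    "model_type": {"transformer", "cnn", "hybrid"},
    "loss_fn": {"mse", "bce", "ce", "focal", "contrastive"},
    "optimizer": {"adam", "adamw", "sgd", "rmsprop"},
}

EXAMPLE_INI = """
[MODEL]
model_type = transformer
embedding_dim = 768
n_layers = 6
n_heads = 8
dropout = 0.1
output_dim = 1

[TRAINING]
batch_size = 32
epochs = 10
learning_rate = 1e-4
weight_decay = 1e-5
patience = 3

[DATA]
dataset = opsin
split_seed = 42
train_ratio = 0.8
val_ratio = 0.1
test_ratio = 0.1
"""


def check_choice(kind, value):
    """Raise ValueError if value is not one of the documented names."""
    if value not in CHOICES[kind]:
        allowed = ", ".join(sorted(CHOICES[kind]))
        raise ValueError(f"unknown {kind} {value!r}; expected one of: {allowed}")
    return value


def load_and_validate(ini_text):
    """Parse the INI text and return a small dict of typed, checked values."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)

    model = parser["MODEL"]
    model_type = check_choice("model_type", model.get("model_type"))
    embedding_dim = model.getint("embedding_dim")
    n_heads = model.getint("n_heads")
    # Assumption: standard multi-head attention splits the embedding evenly.
    if embedding_dim % n_heads != 0:
        raise ValueError("embedding_dim must be divisible by n_heads")

    data = parser["DATA"]
    ratios = tuple(
        data.getfloat(key) for key in ("train_ratio", "val_ratio", "test_ratio")
    )
    if not math.isclose(sum(ratios), 1.0, abs_tol=1e-9):
        raise ValueError(f"split ratios must sum to 1.0, got {sum(ratios)}")

    return {
        "model_type": model_type,
        "embedding_dim": embedding_dim,
        "n_heads": n_heads,
        "learning_rate": parser["TRAINING"].getfloat("learning_rate"),
        "split": ratios,
    }


config = load_and_validate(EXAMPLE_INI)
print(config)
```

Checking names and ratios at load time surfaces a misspelled option or an inconsistent split before training starts.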

Requirements:
- Python 3.8+
- PyTorch (>=1.9.0)
- NumPy, Pandas
- scikit-learn
- Matplotlib
- tqdm
- h5py
- protos (>=0.1.0)
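
The version bounds above can be checked before installing or training. The following is a rough sketch using only the standard library; it assumes the PyPI distribution names are `torch` and `protos`, and the helpers `version_tuple` and `check_environment` are hypothetical names introduced here. The version parsing is deliberately simple and ignores pre-release and local suffixes.

```python
import sys
from importlib import metadata

# Minimum versions from the list above; distribution names are assumptions.
MINIMUMS = {"torch": "1.9.0", "protos": "0.1.0"}


def version_tuple(text):
    """Turn '1.13.1+cu117' into (1, 13, 1); non-numeric suffixes are dropped."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.9' and '1.9.0' compare as equal.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment(minimums=MINIMUMS, min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    for dist, minimum in minimums.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if version_tuple(installed) < version_tuple(minimum):
            problems.append(f"{dist} {installed} is older than {minimum}")
    return problems


for problem in check_environment():
    print(problem)
```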
MIT License