TPTT is a modular Python library for injecting efficient linearized attention (LiZA) mechanisms, such as Memory as Gate (described in Titans), into pretrained transformers 🤗.
- Flexible Attention Injection: Seamlessly wrap and augment standard Transformer attention layers with linearized attention variants that serve as a latent memory (see the sketch after this list).
- Support for Linear Attention: Includes implementations of DeltaNet and DeltaProduct with optional recurrent nonlinearity between chunks.
- Modular Design: Easily extend or customize operators and integration strategies.
- Compatibility: Designed to integrate with Hugging Face Transformers and similar PyTorch models.
- Low-Compute Alignment: Requires only lightweight fine-tuning after injection, enabling efficient memory integration without heavy retraining.
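As a rough illustration of the Memory as Gate idea, here is a conceptual sketch (the class, the mixing rule, and the `mag_weight` gate below are simplified assumptions for illustration, not TPTT's actual internals):

```python
import torch
from torch import nn


class GatedAttentionSketch(nn.Module):
    """Conceptual sketch: blend a pretrained softmax-attention layer with an
    injected linear-attention ("memory") branch through a fixed gate."""

    def __init__(self, softmax_attn: nn.Module, linear_attn: nn.Module, mag_weight: float = 0.5):
        super().__init__()
        self.softmax_attn = softmax_attn  # original attention layer being wrapped
        self.linear_attn = linear_attn    # injected linearized-attention branch
        self.mag_weight = mag_weight      # gate between the two branches

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        soft_out = self.softmax_attn(hidden_states)
        lin_out = self.linear_attn(hidden_states)
        # Memory-as-Gate style mixing: convex combination of the two outputs.
        return (1.0 - self.mag_weight) * soft_out + self.mag_weight * lin_out
```

In TPTT itself this wrapping is handled by the injection utilities shown in the usage section below.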
> **Important:** After injecting the LiZA module, the model requires fine-tuning to properly align and effectively utilize the memory mechanism.
> **Note:** The order-2 DeltaProduct attention mechanism is as expressive as Titans.
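For reference, DeltaNet-style linear attention maintains a matrix-valued state with the delta-rule update (standard notation from the linear-attention literature, not taken from the TPTT code):

$$
S_t = S_{t-1}\left(I - \beta_t\, k_t k_t^{\top}\right) + \beta_t\, v_t k_t^{\top},
\qquad o_t = S_t\, q_t,
$$

where $S_t$ is the recurrent memory state, $q_t, k_t, v_t$ are the query, key, and value at step $t$, and $\beta_t$ is a learned writing strength. DeltaProduct of order $n$ applies $n$ such rank-1 updates per token, which is where the order-2 variant mentioned above comes from.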
```bash
pip install tptt
```
- TPTT-LiZA_Training: Instructions for training TPTT-based models with LoRA and advanced memory management (a minimal LoRA config sketch follows this list).
- TPTT_LiZA_Evaluation: Guide for evaluating language models with LightEval and Hugging Face Transformers.
- TPTT_LiZA_FromScratch: Integrating the `LinearAttention` module into PyTorch deep learning projects.
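For the LoRA path, the configuration can be built with `peft` (a minimal sketch; the `target_modules` names are typical for Llama/Qwen-style backbones and are an assumption, so adjust them to your model):

```python
from peft import LoraConfig

# Hypothetical LoRA setup; pass it as `lora_config` when building TpttConfig (see below).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```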
Basic usage:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
import tptt
from tptt import save_tptt_safetensors, get_tptt_model, load_tptt_safetensors
from torch import nn

##### Transforming into Titans (Tptt)
base_model_name = "Qwen/Qwen2.5-1.5B"
base_config = AutoConfig.from_pretrained(base_model_name)

tptt_config = tptt.TpttConfig(
    base_model_config=base_config,
    base_model_name=base_model_name,
    # lora_config=lora_config,
)
model = tptt.TpttModel(tptt_config)

# manual local save (path and name are your local target directory and file name)
save_tptt_safetensors(model, path, name)
```
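Because alignment is meant to be lightweight (see the LoRA training notebook), a quick plain-PyTorch check of the trainable parameter count can be useful; nothing TPTT-specific is involved, and which weights end up frozen depends on your LoRA and fine-tuning setup:

```python
# Count total vs. trainable parameters of the wrapped model.
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
```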
```python
##### Pretrained Titans from Transformer
repo_id = "ffurfaro/Titans-Llama-3.2-1B"
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
```
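Once loaded, the checkpoint behaves like a regular causal LM, so the usual Transformers generation API applies (a quick sanity check continuing the snippet above; the prompt and generation settings are arbitrary):

```python
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
inputs = tokenizer("Titans keep a latent memory that", return_tensors="pt")
# Short greedy generation, just to confirm the wrapped model runs end to end.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```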
```python
##### More custom for other models (BERT, ViT, etc.)
model, linear_cache = get_tptt_model(model, tptt_config)  # tptt_config built for the target backbone; you can activate bidirectional attention
model = load_tptt_safetensors(repo_or_path, model)  # from saved LoRA only
```
```python
##### Using LinearAttention from scratch
num_layers = 2  # example depth
layers = nn.ModuleList(
    [tptt.LinearAttention(hidden_dim=64, num_heads=4) for _ in range(num_layers)]
)
```
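As a quick smoke test, a dummy batch can be pushed through the stack (hedged: the exact forward signature of `tptt.LinearAttention` may differ, e.g. it may also return a recurrent state):

```python
import torch

# Dummy input: batch of 2 sequences of 16 tokens with hidden size 64 (matches hidden_dim above).
x = torch.randn(2, 16, 64)
for layer in layers:
    x = layer(x)  # assumed to map (batch, seq, hidden) -> (batch, seq, hidden)
print(x.shape)
```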
Some example scripts are available in the repository. More details are given in the paper.
- Code is organized into modular components under the `src/tptt` directory.
- Use `pytest` for testing and `sphinx` for documentation.
- Contributions and feature requests are welcome!
- Python 3.11+
- PyTorch
- einops
- Transformers
- Peft
See requirements.txt for the full list.
Build and run TPTT with Docker:
```bash
# Build the image
docker build -t tptt .

# Run training (with GPU support)
docker run -it --gpus all \
  -v $(pwd)/data:/data \
  -v $(pwd)/outputs:/outputs \
  tptt python -m train \
  --model_name "meta-llama/Llama-3.2-1B" \
  --method delta_rule \
  --mag_weight 0.5
```
For more details, see the Dockerfile.
Discovering the OpenSparseLLMs/Linearization project (flash-linear-attention based) inspired this work and motivated me to create a fully modular, delta-rule-style PyTorch version.
If you use TPTT in your academic work, please cite:
```bibtex
@article{furfaro2025tptt,
  title={TPTT: Transforming Pretrained Transformers into Titans},
  author={Furfaro, Fabien},
  journal={arXiv preprint arXiv:2506.17671},
  year={2025}
}
```

For questions or support, please open an issue on the GitHub repository or contact the maintainer.