δ-mem introduces a compact Online State of Associative Memory alongside a frozen full-attention backbone. When a new token or interaction segment arrives, the model projects the current information into a low-dimensional memory space and writes it into the state through delta-rule learning.
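The delta-rule write can be illustrated with the generic delta rule (Widrow-Hoff), the same update used by linear-attention models such as DeltaNet. A state S maps keys to values. A write applies S ← S + β (v − S k) kᵀ, and a read returns S q. This is a toy, pure-Python sketch under that assumption. δ-mem's actual projections, state dimensions, write strategies, and fused kernels are defined in the repository and may differ.

```python
# Illustrative delta-rule associative memory (generic DeltaNet-style update),
# NOT the δ-mem implementation. Pure Python so it runs without NumPy.

def matvec(S, x):
    """Return S @ x for a list-of-rows matrix S."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]


def delta_write(S, k, v, beta=1.0):
    """Delta-rule write: S <- S + beta * (v - S k) k^T.

    Only the error between the target value v and what the memory
    currently returns for key k is written into the state.
    """
    pred = matvec(S, k)
    err = [v_i - p_i for v_i, p_i in zip(v, pred)]
    return [
        [s_ij + beta * e_i * k_j for s_ij, k_j in zip(row, k)]
        for row, e_i in zip(S, err)
    ]


def read(S, q):
    """Read from memory: o = S q."""
    return matvec(S, q)


d_k, d_v = 2, 2  # tiny memory space for illustration
S = [[0.0] * d_k for _ in range(d_v)]  # d_v x d_k state, starts empty

k1, k2 = [1.0, 0.0], [0.0, 1.0]  # unit-norm, orthogonal keys (assumption)
S = delta_write(S, k1, [3.0, 4.0])
S = delta_write(S, k2, [5.0, 6.0])
print(read(S, k1), read(S, k2))

S = delta_write(S, k1, [7.0, 8.0])  # write a new value for k1
print(read(S, k1), read(S, k2))
```

Because a write stores only the error v − S k, writing a new value for an existing key (with unit-norm, orthogonal keys and β = 1) replaces the old value instead of accumulating on top of it. The other key's value is left untouched. This is what allows a fixed-size state to be updated online.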
This repository contains the main δ-mem implementation, training scripts, evaluation scripts, and an interactive chat demo. The current public release focuses on Qwen3-4B/8B and SmolLM3-3B experiments with three write strategies: TSW, SSW, and MSW.
Long-running agents need a more efficient memory mechanism. It should not keep inflating the context the way full-text retrieval does. It also should not behave like static parametric memory that is frozen after training. Instead, it should update dynamically during interaction and directly influence the model's internal computation at inference time. Motivated by this, we propose δ-mem, a lightweight online memory mechanism for large language models.
| Model | Base model | Adapter | Hugging Face |
|---|---|---|---|
| δ-mem Qwen3-4B Instruct TSW | Qwen/Qwen3-4B-Instruct-2507 | rank-8 Q/O TSW, write length 8192 | declare-lab/delta-mem_qwen3_4b-instruct |
```text
Delta-Mem/
├── data/
│   └── locomo10.json      # local LoCoMo sample file used by scripts
├── deltamem/
│   ├── core/              # Delta-Mem modules, config, adapter save/load
│   ├── demo/              # interactive chat demo
│   ├── eval/              # LoCoMo, HotpotQA, IFEval, GPQA, MemoryAgentBench
│   ├── kernels/           # affine scan kernel wrapper
│   ├── runtime/           # chat/session runtime
│   ├── tests/             # regression tests
│   ├── tools/             # TPS and inspection tools
│   └── train/             # SFT training code
├── scripts/
│   ├── setup_uv_env.sh
│   ├── run_qasper_multimodel_write8192_train_and_benchmark_suite.sh
│   ├── run_qasper_multimodel_write8192_benchmark_suite.sh
│   ├── run_qasper_multimodel_write8192_*_qwen3_8b.sh
│   ├── run_qasper_multimodel_write8192_*_smollm3_3b.sh
│   └── run_generation_tps_benchmark.sh
└── deepspeed_zero2.json
```
Recommended setup:
| Component | Recommendation |
|---|---|
| Python | 3.10 or newer |
| GPU | NVIDIA GPU for training/evaluation |
| CUDA/PyTorch | A CUDA-enabled PyTorch build matching your driver |
| Package manager | uv |
The training scripts are designed for bf16 GPU runs and use FlashAttention and DeepSpeed. CPU-only usage is not the target path for this release.
Clone the repository and run the setup script:
```bash
git clone https://github.com/declare-lab/delta-Mem.git
cd delta-Mem
bash scripts/setup_uv_env.sh
```

The script creates a fresh .venv/, installs requirements.txt, installs FlashAttention with --no-build-isolation, and prints a short import/CUDA diagnostic at the end.
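If an import fails, it can help to see every missing package at once. The helper below is hypothetical and not part of the repository. It uses the standard library's importlib.util.find_spec to list which of the packages imported by the verification snippet further down cannot be found, without importing them. It only checks that a package can be located. An installed but broken build, for example a FlashAttention wheel compiled against a different PyTorch/CUDA, still requires the full import test.

```python
# Hypothetical pre-flight helper: report missing packages without importing
# them (importing torch or flash_attn is slow and may fail for ABI reasons).
import importlib.util

# Same packages the verification snippet imports.
REQUIRED = ["torch", "transformers", "datasets", "accelerate",
            "deepspeed", "flash_attn", "peft"]


def missing_packages(names=REQUIRED):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
print("missing:", ", ".join(missing) if missing else "none")
```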
If uv is not installed:
```bash
python -m pip install uv
```

Activate the environment:
```bash
source .venv/bin/activate
```

To use a specific Python executable:
```bash
PYTHON_BIN=python3.11 bash scripts/setup_uv_env.sh
```

To keep an existing .venv/ instead of recreating it:
```bash
KEEP_VENV=1 bash scripts/setup_uv_env.sh
```

To skip reinstalling FlashAttention when your cluster already provides a working build:
```bash
INSTALL_FLASH_ATTN=0 bash scripts/setup_uv_env.sh
```

If you prefer to manage the environment yourself:
```bash
python -m pip install uv
uv venv --python python3.11 .venv
source .venv/bin/activate
uv pip install --upgrade pip setuptools wheel
uv pip install -r requirements.txt
uv pip install --no-build-isolation flash-attn
```

If PyTorch needs to be installed from a specific CUDA index, install it before the requirements. For example:
```bash
uv pip install torch --index-url https://download.pytorch.org/whl/cu124
uv pip install -r requirements.txt
```

To verify the installation, run:
```bash
python - <<'PY'
import torch, transformers, datasets, accelerate, deepspeed, flash_attn, peft
print("torch:", torch.__version__)
print("cuda:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("datasets:", datasets.__version__)
print("accelerate:", accelerate.__version__)
print("deepspeed:", deepspeed.__version__)
print("flash_attn:", flash_attn.__file__)
print("peft:", peft.__version__)
PY
```

Then run the local checks:
```bash
PYTHONPATH=. python -m compileall -q deltamem
PYTHONPATH=. python -m pytest -q deltamem/tests
```

The experiment scripts intentionally use placeholder paths under /root/...:
```text
/root/huggingface
/root/models
/root/data
/root/outputs
/root/external/MemoryAgentBench
```
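A small pre-flight check can flag variables that still point at these placeholders before a long run is launched. The helper below is hypothetical and not part of the repository. It checks BASE_MODEL_PATH, TSW_ADAPTER_DIR, and SUITE_ROOT, the variables used in the override example that follows. It assumes that an unset variable means the script's /root/... default will be used. If your files really do live under /root/, drop the prefix check.

```python
# Hypothetical helper: flag experiment variables that are unset or still
# point at the scripts' /root/... placeholder locations.
import os

PLACEHOLDER_PREFIX = "/root/"
# Variable names taken from the override example; extend as needed.
VARS_TO_CHECK = ["BASE_MODEL_PATH", "TSW_ADAPTER_DIR", "SUITE_ROOT"]


def unresolved_paths(env=None, names=VARS_TO_CHECK):
    """Return {name: reason} for every variable that still needs overriding."""
    env = os.environ if env is None else env
    problems = {}
    for name in names:
        value = env.get(name)
        if not value:
            problems[name] = "unset"  # assumed to fall back to a /root/ default
        elif value.startswith(PLACEHOLDER_PREFIX):
            problems[name] = f"still a placeholder: {value}"
    return problems


print(unresolved_paths({
    "BASE_MODEL_PATH": "/data/Qwen3-4B-Instruct-2507",
    "TSW_ADAPTER_DIR": "/root/models/adapter",
}))
```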
Before running training or evaluation, either edit the script variables or override them from the shell:
```bash
BASE_MODEL_PATH=/path/to/Qwen3-4B-Instruct-2507 \
TSW_ADAPTER_DIR=/path/to/delta-mem-adapter \
SUITE_ROOT=/path/to/results \
bash scripts/run_qasper_multimodel_write8192_benchmark_suite.sh
```

Download the adapter from Hugging Face:
```bash
huggingface-cli download declare-lab/delta-mem_qwen3_4b-instruct \
--local-dir ./delta-mem_qwen3_4b-instruct
```

Minimal loading example:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from deltamem.core import HFDeltaMemConfig, attach_delta_mem, load_delta_mem_adapter

base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter_dir = "./delta-mem_qwen3_4b-instruct"  # local copy downloaded above

# Load the frozen backbone in bf16.
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Read the δ-mem config shipped with the adapter, attach the δ-mem modules
# to the backbone, then load the trained adapter weights into them.
config = HFDeltaMemConfig.from_pretrained(adapter_dir)
attach_delta_mem(model, config)
load_delta_mem_adapter(model, adapter_dir)
model.eval()
```

δ-mem adapters are not standard PEFT LoRA adapters and are not merged into the base model with merge_and_unload(). The memory read/write path runs at inference time as part of the model's execution.
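Toy matrices show why a LoRA-style update can be merged into the base weights while a memory path cannot. The sketch below is purely illustrative and is not δ-mem code. For the sake of the example, it assumes that a memory readout S x is added to a frozen layer's output W x. A LoRA update B A does not depend on the input, so (W + B A) x reproduces W x + B (A x) for every x. S, by contrast, changes whenever new information is written, so no single merged matrix can match the output both before and after a write.

```python
# Toy illustration (NOT δ-mem code): a static LoRA update can be folded into
# the weight matrix, but a readout from a runtime memory state cannot.

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight
B = [[1.0], [2.0]]            # LoRA up-projection (2 x 1, rank 1 for brevity)
A = [[0.5, 0.5]]              # LoRA down-projection (1 x 2)
x = [2.0, 4.0]

# LoRA: W x + B (A x) equals (W + B A) x for every x, so it can be merged.
BA = [[b[0] * a for a in A[0]] for b in B]  # outer product of rank-1 factors
W_merged = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, BA)]
lora_out = add(matvec(W, x), matvec(B, matvec(A, x)))
merged_out = matvec(W_merged, x)

# Memory readout (assumed additive here): the output depends on the state S,
# which changes as information is written during inference.
S = [[0.0, 0.0], [0.0, 0.0]]
before = add(matvec(W, x), matvec(S, x))
S = [[1.0, 0.0], [0.0, 1.0]]  # pretend a write updated the state
after = add(matvec(W, x), matvec(S, x))

print(lora_out == merged_out, before, after)
```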
To launch the interactive chat demo, run the default shell wrapper:

```bash
bash deltamem/demo/run_chat_demo.sh
```

To override the model and adapter paths:
```bash
MODEL_PATH=/path/to/Qwen3-4B-Instruct-2507 \
ADAPTER_DIR=/path/to/delta-mem_qwen3_4b-instruct \
bash deltamem/demo/run_chat_demo.sh
```

To run the base model without δ-mem:
```bash
MODE=base MODEL_PATH=/path/to/Qwen3-4B-Instruct-2507 \
bash deltamem/demo/run_chat_demo.sh
```

The main Qwen3-4B training script trains the SSW, TSW, and MSW variants by default:
```bash
bash scripts/run_qasper_multimodel_write8192_train_and_benchmark_suite.sh
```

To train and benchmark only the TSW variant:
```bash
TRAIN_VARIANTS_STRING="TSW_rank8_qasper_write8192" \
BENCHMARK_VARIANTS_STRING="TSW_rank8_qasper_write8192" \
bash scripts/run_qasper_multimodel_write8192_train_and_benchmark_suite.sh
```

Model-specific scripts for Qwen3-8B and SmolLM3-3B:
```bash
bash scripts/run_qasper_multimodel_write8192_train_and_benchmark_suite_qwen3_8b.sh
bash scripts/run_qasper_multimodel_write8192_train_and_benchmark_suite_smollm3_3b.sh
```

The main benchmark suite covers:
| Benchmark | Entry point |
|---|---|
| LoCoMo | deltamem.eval.locomo_delta |
| HotpotQA | deltamem.eval.benchmark_compare --tasks hotpotqa |
| IFEval | deltamem.eval.benchmark_compare --tasks ifeval |
| GPQA Diamond | deltamem.eval.benchmark_compare --tasks gpqa_diamond |
| MemoryAgentBench | deltamem.eval.benchmark_compare --tasks memory_agent_bench |
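The suite takes its task and variant selections as space-separated strings in EVAL_TASKS_STRING and BENCHMARK_VARIANTS_STRING. If you drive runs from Python, a wrapper can map the benchmark names in the table above to those task IDs. The sketch below is hypothetical and not part of the repository. The task IDs are the ones used in the suite's EVAL_TASKS_STRING example. The helper names, the dry-run default, and the assumption that the script splits these variables on whitespace are illustrative.

```python
# Hypothetical helper: build the suite's list-valued override variables and
# (optionally) launch the benchmark script with them.
import os
import subprocess

TASK_IDS = {
    "LoCoMo": "locomo",
    "HotpotQA": "hotpotqa",
    "IFEval": "ifeval",
    "GPQA Diamond": "gpqa_diamond",
    "MemoryAgentBench": "memory_agent_bench",
}
SUITE = "scripts/run_qasper_multimodel_write8192_benchmark_suite.sh"


def suite_env(benchmarks, variants, base_env=None):
    """Return a copy of the environment with the space-separated lists set."""
    env = dict(os.environ if base_env is None else base_env)
    env["EVAL_TASKS_STRING"] = " ".join(TASK_IDS[b] for b in benchmarks)
    env["BENCHMARK_VARIANTS_STRING"] = " ".join(variants)
    return env


def run_suite(benchmarks, variants, dry_run=True):
    env = suite_env(benchmarks, variants)
    cmd = ["bash", SUITE]
    if dry_run:  # show what would run without launching GPU jobs
        print(f'EVAL_TASKS_STRING="{env["EVAL_TASKS_STRING"]}" '
              f'BENCHMARK_VARIANTS_STRING="{env["BENCHMARK_VARIANTS_STRING"]}" '
              + " ".join(cmd))
        return None
    return subprocess.run(cmd, env=env, check=True)


run_suite(["LoCoMo", "GPQA Diamond"], ["TSW_rank8_qasper_write8192"])
```

Keeping dry_run on by default avoids accidentally starting a multi-hour GPU job while experimenting with task selections.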
Run the bundled Qwen3-4B benchmark suite:
```bash
bash scripts/run_qasper_multimodel_write8192_benchmark_suite.sh
```

To benchmark only the TSW adapter and skip base-model evaluation:
```bash
BENCHMARK_VARIANTS_STRING="TSW_rank8_qasper_write8192" \
EVAL_TASKS_STRING="locomo hotpotqa gpqa_diamond ifeval memory_agent_bench" \
bash scripts/run_qasper_multimodel_write8192_benchmark_suite.sh
```

If you find our work useful, please cite:
```bibtex
@misc{lei2026deltamemefficientonlinememory,
  title={$\delta$-mem: Efficient Online Memory for Large Language Models},
  author={Jingdi Lei and Di Zhang and Junxian Li and Weida Wang and Kaixuan Fan and Xiang Liu and Qihan Liu and Xiaoteng Ma and Baian Chen and Soujanya Poria},
  year={2026},
  eprint={2605.12357},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.12357},
}
```