δ-mem: Efficient Online Memory for Large Language Models

δ-mem introduces a compact Online State of Associative Memory alongside a frozen full-attention backbone. When a new token or interaction segment arrives, the model projects the current information into a low-dimensional memory space and writes it into the state through delta-rule learning.
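
To make the write step more concrete, here is a minimal sketch of a generic delta-rule update on an associative memory matrix. It is not the repository's implementation: the function names, the write strength `beta`, and the toy dimensions are assumptions for illustration, and it uses plain Python lists so that it runs without dependencies.

```python
# Generic delta-rule associative memory (illustrative; not the δ-mem code).
# The state S is a d_v x d_k matrix; reading a key k returns S @ k.

def read(state, key):
    """Return the value currently associated with `key` (matrix-vector product)."""
    return [sum(s * k for s, k in zip(row, key)) for row in state]


def delta_write(state, key, value, beta=1.0):
    """Delta rule: S <- S + beta * (v - S k) k^T.

    The memory is corrected by its own prediction error, so writing the same
    key again replaces the old value instead of accumulating on top of it.
    """
    predicted = read(state, key)
    error = [v - p for v, p in zip(value, predicted)]
    return [
        [s + beta * e * k for s, k in zip(row, key)]
        for row, e in zip(state, error)
    ]


# Toy example: 2-dimensional values, 2-dimensional unit-norm keys.
state = [[0.0, 0.0], [0.0, 0.0]]
state = delta_write(state, key=[1.0, 0.0], value=[3.0, -1.0])
state = delta_write(state, key=[0.0, 1.0], value=[0.5, 2.0])
state = delta_write(state, key=[1.0, 0.0], value=[7.0, 7.0])  # overwrite the first key

print(read(state, [1.0, 0.0]))  # value stored under the first key
print(read(state, [0.0, 1.0]))  # value stored under the second key
```

With `beta=1.0` and orthogonal unit-norm keys, each write replaces the stored value exactly; with smaller `beta` or overlapping keys the update becomes a partial correction.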

This repository contains the main δ-mem implementation, training scripts, evaluation scripts, and an interactive chat demo. The current public release focuses on Qwen3-4B/8B and SmolLM3-3B experiments with three write strategies: TSW, SSW, and MSW.

Why δ-mem?

Long-running agent scenarios need a more efficient memory mechanism. Such a mechanism should not keep growing the context the way full-text retrieval does, nor should it behave like static parametric memory that is fixed after training. Instead, it should update dynamically during interaction and directly influence the model’s internal computation at inference time. Motivated by this, we propose δ-mem, a lightweight online memory mechanism for large language models.

Released Model

Model	Base model	Adapter	Hugging Face
δ-mem Qwen3-4B Instruct TSW	`Qwen/Qwen3-4B-Instruct-2507`	rank-8 Q/O TSW, write length 8192	`declare-lab/delta-mem_qwen3_4b-instruct`

What Is In This Repository?

Delta-Mem/
├── data/
│   └── locomo10.json                     # local LoCoMo sample file used by scripts
├── deltamem/
│   ├── core/                             # Delta-Mem modules, config, adapter save/load
│   ├── demo/                             # interactive chat demo
│   ├── eval/                             # LoCoMo, HotpotQA, IFEval, GPQA, MemoryAgentBench
│   ├── kernels/                          # affine scan kernel wrapper
│   ├── runtime/                          # chat/session runtime
│   ├── tests/                            # regression tests
│   ├── tools/                            # TPS and inspection tools
│   └── train/                            # SFT training code
├── scripts/
│   ├── setup_uv_env.sh
│   ├── run_qasper_multimodel_write8192_train_and_benchmark_suite.sh
│   ├── run_qasper_multimodel_write8192_benchmark_suite.sh
│   ├── run_qasper_multimodel_write8192_*_qwen3_8b.sh
│   ├── run_qasper_multimodel_write8192_*_smollm3_3b.sh
│   └── run_generation_tps_benchmark.sh
└── deepspeed_zero2.json
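
The `kernels/` entry above mentions an affine scan. As a rough illustration of the general idea (not the repository's kernel), a recurrence of the form `h_t = a_t * h_{t-1} + b_t` can be evaluated by composing affine maps with an associative operator, which is what makes a parallel scan possible. The scalar setting and all names below are assumptions made for this sketch.

```python
from functools import reduce


def sequential_scan(coeffs, h0=0.0):
    """Reference: apply h <- a * h + b step by step and return every state."""
    states, h = [], h0
    for a, b in coeffs:
        h = a * h + b
        states.append(h)
    return states


def combine(first, second):
    """Compose two affine maps: apply `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def final_state_by_composition(coeffs, h0=0.0):
    """Fold all steps into one affine map, then apply it to h0."""
    a, b = reduce(combine, coeffs)
    return a * h0 + b


coeffs = [(0.5, 1.0), (0.25, 2.0), (2.0, -1.0)]
print(sequential_scan(coeffs, h0=4.0)[-1])
print(final_state_by_composition(coeffs, h0=4.0))
```

Because `combine` is associative, the fold can be regrouped into a tree, so a parallel implementation can compute all prefixes in a logarithmic number of steps instead of walking the sequence one element at a time.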

Environment Setup

System Requirements

Recommended setup:

Component	Recommendation
Python	3.10 or newer
GPU	NVIDIA GPU for training/evaluation
CUDA/PyTorch	A CUDA-enabled PyTorch build matching your driver
Package manager	`uv`

The training scripts are designed for bf16 GPU runs and use FlashAttention and DeepSpeed. CPU-only usage is not the target path for this release.

One-Command Setup

Clone the repository and run the setup script:

git clone https://github.com/declare-lab/delta-Mem.git
cd delta-Mem
bash scripts/setup_uv_env.sh

The script creates a fresh .venv/, installs requirements.txt, installs FlashAttention with --no-build-isolation, and prints a short import/CUDA diagnostic at the end.

If uv is not installed:

python -m pip install uv

Activate the environment:

source .venv/bin/activate

Setup Options

Use a specific Python executable:

PYTHON_BIN=python3.11 bash scripts/setup_uv_env.sh

Keep an existing .venv/ instead of recreating it:

KEEP_VENV=1 bash scripts/setup_uv_env.sh

Skip FlashAttention reinstall if your cluster already provides a working build:

INSTALL_FLASH_ATTN=0 bash scripts/setup_uv_env.sh

Manual Setup

If you prefer to manage the environment yourself:

python -m pip install uv
uv venv --python python3.11 .venv
source .venv/bin/activate
uv pip install --upgrade pip setuptools wheel
uv pip install -r requirements.txt
uv pip install --no-build-isolation flash-attn

If PyTorch needs to be installed from a specific CUDA index, install it before the requirements, for example:

uv pip install torch --index-url https://download.pytorch.org/whl/cu124
uv pip install -r requirements.txt

Verify The Environment

Run:

python - <<'PY'
import torch, transformers, datasets, accelerate, deepspeed, flash_attn, peft
print("torch:", torch.__version__)
print("cuda:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("datasets:", datasets.__version__)
print("accelerate:", accelerate.__version__)
print("deepspeed:", deepspeed.__version__)
print("flash_attn:", flash_attn.__file__)
print("peft:", peft.__version__)
PY

Then run the local checks:

PYTHONPATH=. python -m compileall -q deltamem
PYTHONPATH=. python -m pytest -q deltamem/tests
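
If the verification snippet fails on its first line, a single missing package hides the status of all the others. The following sketch reports each package separately without importing it. The module list is taken from the snippet above; the helper name `check_modules` is hypothetical, and the version lookup assumes the distribution name matches the import name, falling back to "unknown" when it does not.

```python
import importlib.metadata
import importlib.util

# Import names taken from the verification snippet above.
REQUIRED_MODULES = [
    "torch", "transformers", "datasets", "accelerate",
    "deepspeed", "flash_attn", "peft",
]


def check_modules(names):
    """Map each module name to a version string, "unknown", or None if missing."""
    report = {}
    for name in names:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in check_modules(REQUIRED_MODULES).items():
        print(f"{name}: {version if version is not None else 'MISSING'}")
```

This only confirms that the packages can be located; it does not check CUDA availability or that FlashAttention was built against the installed PyTorch, so the import-based snippet above is still worth running once everything is reported as present.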

Path Configuration

The experiment scripts intentionally use placeholder paths under /root/...:

/root/huggingface
/root/models
/root/data
/root/outputs
/root/external/MemoryAgentBench

Before running training or evaluation, either edit the script variables or override them from the shell:

BASE_MODEL_PATH=/path/to/Qwen3-4B-Instruct-2507 \
TSW_ADAPTER_DIR=/path/to/delta-mem-adapter \
SUITE_ROOT=/path/to/results \
bash scripts/run_qasper_multimodel_write8192_benchmark_suite.sh
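
Before launching a long run, it can help to list which script lines still refer to the `/root/` placeholders. The sketch below is a hypothetical helper, not part of the repository; it assumes the scripts live in `scripts/` and end in `.sh`, as in the tree shown earlier.

```python
from pathlib import Path

PLACEHOLDER_PREFIX = "/root/"


def find_placeholder_lines(scripts_dir, pattern="*.sh"):
    """Return (file name, line number, line) for non-comment lines mentioning /root/."""
    hits = []
    for path in sorted(Path(scripts_dir).glob(pattern)):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            stripped = line.strip()
            if stripped.startswith("#"):
                continue  # skip shell comments
            if PLACEHOLDER_PREFIX in stripped:
                hits.append((path.name, number, stripped))
    return hits


if __name__ == "__main__":
    for name, number, line in find_placeholder_lines("scripts"):
        print(f"{name}:{number}: {line}")
```

A hit is not necessarily a problem: a default that you override from the shell will still be listed. The output is meant as a checklist of variables to set, not as an error report.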

Use The Released Adapter

Download the adapter from Hugging Face:

huggingface-cli download declare-lab/delta-mem_qwen3_4b-instruct \
  --local-dir ./delta-mem_qwen3_4b-instruct

Minimal loading example:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from deltamem.core import HFDeltaMemConfig, attach_delta_mem, load_delta_mem_adapter

base_model = "Qwen/Qwen3-4B-Instruct-2507"
adapter_dir = "./delta-mem_qwen3_4b-instruct"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

config = HFDeltaMemConfig.from_pretrained(adapter_dir)
attach_delta_mem(model, config)
load_delta_mem_adapter(model, adapter_dir)
model.eval()

δ-mem adapters are not standard PEFT LoRA adapters and are not meant to be merged into the base model with merge_and_unload(): the memory read/write path runs at inference time as part of model execution.

Chat Demo

Run the default shell wrapper:

bash deltamem/demo/run_chat_demo.sh

Typical override:

MODEL_PATH=/path/to/Qwen3-4B-Instruct-2507 \
ADAPTER_DIR=/path/to/delta-mem_qwen3_4b-instruct \
bash deltamem/demo/run_chat_demo.sh

Run the base model without δ-mem:

MODE=base MODEL_PATH=/path/to/Qwen3-4B-Instruct-2507 \
bash deltamem/demo/run_chat_demo.sh

Training

The main Qwen3-4B training script trains SSW, TSW, and MSW variants by default:

bash scripts/run_qasper_multimodel_write8192_train_and_benchmark_suite.sh

Run only TSW:

TRAIN_VARIANTS_STRING="TSW_rank8_qasper_write8192" \
BENCHMARK_VARIANTS_STRING="TSW_rank8_qasper_write8192" \
bash scripts/run_qasper_multimodel_write8192_train_and_benchmark_suite.sh
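
The same override can be driven from Python, which is convenient when launching runs from a notebook or a job scheduler. This is a sketch under assumptions: the helper names `build_env` and `run_suite` are hypothetical, and the only variables used are the two shown above.

```python
import os
import subprocess

SUITE_SCRIPT = "scripts/run_qasper_multimodel_write8192_train_and_benchmark_suite.sh"


def build_env(overrides, base_env=None):
    """Return a copy of the environment with the given overrides applied."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({key: str(value) for key, value in overrides.items()})
    return env


def run_suite(overrides, script=SUITE_SCRIPT, dry_run=True):
    """Print the equivalent shell command, and run it unless dry_run is set."""
    command = ["bash", script]
    shown = " ".join(f"{key}={value}" for key, value in overrides.items())
    print(f"{shown} {' '.join(command)}")
    if dry_run:
        return None
    return subprocess.run(command, env=build_env(overrides), check=True)


tsw_only = {
    "TRAIN_VARIANTS_STRING": "TSW_rank8_qasper_write8192",
    "BENCHMARK_VARIANTS_STRING": "TSW_rank8_qasper_write8192",
}
run_suite(tsw_only)  # dry run: only prints the command
```

Pass `dry_run=False` from the repository root to actually start the run; `check=True` makes a failing script raise `subprocess.CalledProcessError` instead of failing silently.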

Model-specific scripts:

bash scripts/run_qasper_multimodel_write8192_train_and_benchmark_suite_qwen3_8b.sh
bash scripts/run_qasper_multimodel_write8192_train_and_benchmark_suite_smollm3_3b.sh

Evaluation

The main benchmark suite covers:

Benchmark	Entry
LoCoMo	`deltamem.eval.locomo_delta`
HotpotQA	`deltamem.eval.benchmark_compare --tasks hotpotqa`
IFEval	`deltamem.eval.benchmark_compare --tasks ifeval`
GPQA Diamond	`deltamem.eval.benchmark_compare --tasks gpqa_diamond`
MemoryAgentBench	`deltamem.eval.benchmark_compare --tasks memory_agent_bench`

Run the bundled Qwen3-4B benchmark suite:

bash scripts/run_qasper_multimodel_write8192_benchmark_suite.sh

Run only the TSW adapter and skip base-model evaluation:

BENCHMARK_VARIANTS_STRING="TSW_rank8_qasper_write8192" \
EVAL_TASKS_STRING="locomo hotpotqa gpqa_diamond ifeval memory_agent_bench" \
bash scripts/run_qasper_multimodel_write8192_benchmark_suite.sh

Citation

If you find our work useful, please cite:

@misc{lei2026deltamemefficientonlinememory,
      title={$\delta$-mem: Efficient Online Memory for Large Language Models}, 
      author={Jingdi Lei and Di Zhang and Junxian Li and Weida Wang and Kaixuan Fan and Xiang Liu and Qihan Liu and Xiaoteng Ma and Baian Chen and Soujanya Poria},
      year={2026},
      eprint={2605.12357},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2605.12357}, 
}
