This project introduces ALMA (Automated meta-Learning of Memory designs for Agentic systems), a framework that meta-learns memory designs in place of hand-engineered ones, reducing human effort and enabling agentic systems to act as continual learners across diverse domains. ALMA employs a Meta Agent that searches over memory designs expressed as executable code in an open-ended manner, in principle allowing the discovery of arbitrary memory designs, including database schemas together with their retrieval and update mechanisms.
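To make the search space concrete: a memory design here is, roughly, a storage schema plus executable update and retrieval hooks. The sketch below is purely illustrative (the class and method names are our assumptions, not ALMA's API); generated designs are free-form code and need not take this shape.

```python
# Hypothetical sketch of what a searchable "memory design" could look like:
# a storage schema plus executable update/retrieve hooks. ALMA's generated
# designs are free-form code; this shape is illustrative only.
from dataclasses import dataclass, field

@dataclass
class EpisodicMemory:
    # Schema: a flat list of (task, trajectory, outcome) records.
    records: list = field(default_factory=list)

    def update(self, task: str, trajectory: list[str], success: bool) -> None:
        """Write a finished episode into memory."""
        self.records.append({"task": task, "trajectory": trajectory, "success": success})

    def retrieve(self, task: str, k: int = 3) -> list[dict]:
        """Return up to k successful episodes whose task shares words with the query."""
        query = set(task.lower().split())
        scored = [
            (len(query & set(r["task"].lower().split())), r)
            for r in self.records
            if r["success"]
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [r for score, r in scored[:k] if score > 0]
```

The Meta Agent's job is to rewrite designs like this, including the schema itself and both hooks, rather than only tuning their parameters.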
*Figure: Open-ended exploration process of ALMA.*
- 🧠 Automatic Memory Design Discovery - ALMA learns memory designs automatically instead of relying on hand-engineered ones
- 🎯 Domain Adaptation - Automatically specializes memory designs for diverse sequential decision-making tasks
- 🔬 Comprehensive Evaluation - Tested across four domains: AlfWorld, TextWorld, BabaisAI, and MiniHack
- 📈 Superior Performance - Outperforms state-of-the-art human-designed baselines across all benchmarks
- ⚡ Cost Efficiency - Learned designs are more efficient than most human-designed baselines
```bash
# Clone the project
git clone https://github.com/zksha/alma.git
cd ./alma

# Create environment
conda create -n alma python=3.11
conda activate alma

# Install dependencies
pip install -r requirements.txt
```
Then add your API key to the `.env` file:

```bash
# .env
OPENAI_API_KEY=your_openai_api_key_here
```
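If you need the key from your own scripts, a common pattern is to load it with `python-dotenv`; this is an assumption about the usual convention, not a documented part of this repository:

```python
# Sketch: load the API key from .env. Assumes python-dotenv is installed;
# this is a common convention, not a documented ALMA API.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
api_key = os.environ["OPENAI_API_KEY"]
```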
> [!WARNING]
> This repository executes model-generated code as part of the memory design search process. While the code goes through a verification and debugging stage, dynamically generated code may behave unpredictably. Use at your own risk, ideally inside a sandboxed or isolated environment.
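For an extra layer of isolation beyond Docker, one generic option is to execute generated snippets in a separate process with a hard timeout. This is a minimal sketch of that idea, not part of ALMA; the helper name and timeout value are illustrative assumptions:

```python
# Minimal sketch: run an untrusted, model-generated snippet in a child
# process with a hard timeout. NOT part of ALMA; it only illustrates the
# kind of isolation the warning above recommends.
import subprocess
import sys
import tempfile

def run_untrusted(code: str, timeout_s: int = 30) -> subprocess.CompletedProcess:
    """Write generated code to a temp file and execute it in a child process."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    # A child process limits blast radius; for real isolation, prefer a
    # container or VM as suggested above.
    return subprocess.run(
        [sys.executable, path],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )

result = run_untrusted("print('hello from generated code')")
print(result.stdout)
```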
```bash
# Build ALFWorld Docker image
cd envs_docker/alfworld
bash image_build.sh
```

```bash
# Build BALROG Docker image (used for TextWorld, BabaisAI, and MiniHack)
cd envs_docker/BALROG
bash image_build.sh
```

> [!TIP]
> The BALROG image is shared across TextWorld, BabaisAI, and MiniHack domains, so you only need to build it once.
To run the learning process that discovers new memory designs:
```bash
python run_main.py \
    --rollout_type batched \
    --meta_model gpt-5 \
    --execution_model gpt-5-nano \
    --batch_max_update_concurrent 10 \
    --batch_max_retrieve_concurrent 10 \
    --task_type alfworld \
    --status train \
    --train_size 30
```

Parameters:
| Parameter | Description | Options |
|---|---|---|
| `--rollout_type` | Execution strategy for evaluations; `sequential` allows both update and retrieval in the deployment phase | `batched`, `sequential` |
| `--meta_model` | Model used by the meta agent to propose memory designs | `gpt-5`, `gpt-4.1`, etc. |
| `--execution_model` | Model used by agents during task execution | `gpt-5-mini/medium`, `gpt-5-nano/low`, `gpt-4o-mini`, etc. |
| `--batch_max_update_concurrent` | Max concurrent memory update operations (see the sketch after this table) | Integer (e.g., `10`) |
| `--batch_max_retrieve_concurrent` | Max concurrent memory retrieval operations | Integer (e.g., `10`) |
| `--task_type` | Domain to run experiments on | `alfworld`, `textworld`, `babaisai`, `minihack` |
| `--status` | Execution mode | `train`, `eval_in_distribution`, `eval_out_of_distribution` |
| `--train_size` | Number of training tasks | Integer (e.g., `30`, `50`, `100`) |
| `--memo_SHA` | SHA of a learned memory design, provided for testing | String (e.g., `g-memory`, `53cee295`) |
> [!TIP]
> Example configurations for different domains are in `training.sh` and `testing.sh`.

Learned memory designs are stored in `memo_archive`. Learning logs are stored in `logs`.
To extend the benchmark to a new domain:
- Build the Docker image for the new domain.
- Add prompts, configs, and `{env_name}_envs.py` to the `envs` archive (a hypothetical wrapper sketch follows this list).
- Add task descriptions for the meta agent in `meta_agent_prompt.py`.
- Register the container and name for the new benchmark in `eval_in_container.py`.
- Run the meta agent to discover specialized memory designs.
- Evaluate the results against baseline memory designs.
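The exact interface expected of `{env_name}_envs.py` is defined by the existing files in the `envs` archive; the sketch below only shows the general shape of a text-environment wrapper, and every name in it is an assumption for illustration.

```python
# Hypothetical shape for a new-domain wrapper ({env_name}_envs.py). Copy the
# real interface from an existing file in the envs archive; every name below
# is an assumption for illustration.
class MyDomainEnv:
    def __init__(self, task_id: str):
        self.task_id = task_id

    def reset(self) -> str:
        """Return the initial textual observation for the task."""
        return f"Task {self.task_id}: you are in an empty room."

    def step(self, action: str) -> tuple[str, float, bool]:
        """Apply an agent action; return (observation, reward, done)."""
        done = action.strip().lower() == "finish"
        obs = "Task complete." if done else "Nothing happens."
        return obs, float(done), done
```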
Our learned memory designs consistently outperform state-of-the-art human-designed memory baselines across all benchmarks.
Numbers indicate overall success rate in percentage (higher is better). Improvements are relative to the no-memory baseline.
| Foundation Model (FM) in Agentic System | GPT-5-nano / low | GPT-5-mini / medium |
|---|---|---|
| No Memory | 6.1 | 41.1 |
| **Manual Memory Designs** | | |
| Trajectory Retrieval | 8.6 (+2.5) | 48.6 (+7.5) |
| Reasoning Bank | 7.5 (+1.4) | 40.1 (−1.0) |
| Dynamic Cheatsheet | 7.2 (+1.1) | 46.5 (+5.4) |
| G-Memory | 7.7 (+1.6) | 46.0 (+4.9) |
| **Learned Memory Design** | | |
| Our Method | 12.3 (+6.2) | 53.9 (+12.8) |
Key findings:
- Learned designs adapt to domain-specific requirements automatically
- Better performance scaling with memory size
- Faster learning under task distribution shifts
- Lower computational costs compared to human-designed baselines
This research was supported by the Vector Institute, the Canada CIFAR AI Chairs program, a grant from Schmidt Futures, an NSERC Discovery Grant, and a generous donation from Rafael Cosman. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute (https://vectorinstitute.ai/partnerships/current-partners/). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.