Learning to Continually Learn via Meta-learning Agentic Memory Designs

Overview

This project introduces ALMA (Automated meta-Learning of Memory designs for Agentic systems), a framework that meta-learns memory designs to replace hand-engineered ones, thereby minimizing human effort and enabling agentic systems to be continual learners across diverse domains. ALMA employs a Meta Agent that searches over memory designs expressed as executable code in an open-ended manner, in principle allowing the discovery of arbitrary memory designs, including database schemas as well as their retrieval and update mechanisms.


Open-ended Exploration Process of ALMA.

The Meta Agent first ideates and proposes a plan by reflecting on the code and evaluation logs of the sampled memory design. It then implements the plan by programming the new design in code. Finally, it verifies the correctness of the new memory design and evaluates it with an agentic system. The evaluated memory design is subsequently added to the memory design archive for future sampling.
The learned memory designs and logs needed to reproduce the results presented in our work are available at: Learning Memory Designs & Logs
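
The loop described above can be summarized in a short sketch. All names below (the archive object, the meta_agent methods, and agentic_system.evaluate) are hypothetical placeholders used only to illustrate the sample, plan, implement, verify, and evaluate cycle; they are not the repository's actual API.

# Illustrative sketch of ALMA's open-ended search loop; all names are placeholders.
def search_memory_designs(meta_agent, agentic_system, archive, n_iterations):
    for _ in range(n_iterations):
        # Sample an evaluated memory design (code + evaluation logs) from the archive.
        parent = archive.sample()

        # Ideate: reflect on the sampled design's code and logs, then propose a plan.
        plan = meta_agent.propose_plan(code=parent.code, logs=parent.eval_logs)

        # Implement: program the new memory design as executable code.
        candidate_code = meta_agent.implement(plan)

        # Verify: check correctness and debug before spending evaluation budget.
        candidate_code = meta_agent.verify_and_debug(candidate_code)

        # Evaluate: run an agentic system that uses the new design on training tasks.
        eval_logs, score = agentic_system.evaluate(memory_code=candidate_code)

        # Archive: store the evaluated design so later iterations can sample it.
        archive.add(code=candidate_code, eval_logs=eval_logs, score=score)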

Key Features

  • 🧠 Automatic Memory Design Discovery - ALMA learns memory designs instead of relying on hand-engineered ones
  • 🎯 Domain Adaptation - Automatically specializes memory designs for diverse sequential decision-making tasks
  • 🔬 Comprehensive Evaluation - Tested across four domains: AlfWorld, TextWorld, BabaisAI, and MiniHack
  • 📈 Superior Performance - Outperforms state-of-the-art human-designed baselines across all benchmarks
  • 💰 Cost Efficiency - Learned designs are more efficient than most human-designed baselines

Setup

# Cloning project
git clone https://github.com/zksha/alma.git
cd ./alma

# Create environment
conda create -n alma python=3.11
conda activate alma

# Install dependencies
pip install -r requirements.txt

# Then add your API key to the `.env` file:
# .env
OPENAI_API_KEY=your_openai_api_key_here
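
To sanity-check that the key in .env is visible to Python, a minimal snippet like the one below works. It assumes the key is loaded via python-dotenv, which is an assumption rather than a documented requirement; check requirements.txt and the project's own loading code.

# Minimal check that OPENAI_API_KEY from .env is loaded (assumes python-dotenv is installed).
import os
from dotenv import load_dotenv

load_dotenv()  # read key=value pairs from .env into the process environment
key = os.getenv("OPENAI_API_KEY")
assert key, "OPENAI_API_KEY is missing from .env"
print("OPENAI_API_KEY loaded, prefix:", key[:8])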

Running Experiments

Setup Testing Environments

Warning

This repository executes model-generated code as part of the memory design search process. While the code goes through a verification and debugging stage, dynamically generated code may behave unpredictably. Use at your own risk, ideally inside a sandboxed or isolated environment.

For ALFWorld

# Build ALFWorld Docker image
cd envs_docker/alfworld
bash image_build.sh

For TextWorld, BabaisAI, and MiniHack (BALROG)

# Build BALROG Docker image (used for TextWorld, BabaisAI, and MiniHack)
cd envs_docker/BALROG
bash image_build.sh

Tip

The BALROG image is shared across TextWorld, BabaisAI, and MiniHack domains, so you only need to build it once.

Learning of Memory Designs

To run the learning process that discovers new memory designs:

python run_main.py \
    --rollout_type batched \
    --meta_model gpt-5 \
    --execution_model gpt-5-nano \
    --batch_max_update_concurrent 10 \
    --batch_max_retrieve_concurrent 10 \
    --task_type alfworld \
    --status train \
    --train_size 30

Parameters:

| Parameter | Description | Options |
|---|---|---|
| --rollout_type | Execution strategy for evaluations; sequential allows both update and retrieval in the deployment phase | batched, sequential |
| --meta_model | Model used by the meta agent to propose memory designs | gpt-5, gpt-4.1, etc. |
| --execution_model | Model used by agents during task execution | gpt-5-mini/medium, gpt-5-nano/low, gpt-4o-mini, etc. |
| --batch_max_update_concurrent | Max concurrent memory update operations | Integer (e.g., 10) |
| --batch_max_retrieve_concurrent | Max concurrent memory retrieval operations | Integer (e.g., 10) |
| --task_type | Domain to run experiments on | alfworld, textworld, babaisai, minihack |
| --status | Execution mode | train, eval_in_distribution, eval_out_of_distribution |
| --train_size | Number of training tasks | Integer (e.g., 30, 50, 100) |
| --memo_SHA | SHA of a learned memory design, provided for testing | String (e.g., g-memory, 53cee295) |

Tip

Example configurations for different domains are provided in training.sh and testing.sh. Learned memory designs are stored in memo_archive, and learning logs are stored in logs.
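
The --batch_max_update_concurrent and --batch_max_retrieve_concurrent flags cap how many memory operations run at once during batched rollouts. The snippet below is a generic illustration of enforcing such a cap with an asyncio semaphore; it is not the repository's implementation, and memory_update is a hypothetical stand-in coroutine.

# Generic illustration of capping concurrent memory operations, mirroring what
# --batch_max_update_concurrent controls; memory_update is a hypothetical stand-in.
import asyncio

async def memory_update(task_id):
    await asyncio.sleep(0.1)  # stand-in for a real LLM call or database write
    return f"updated memory for task {task_id}"

async def run_updates(task_ids, max_concurrent=10):
    semaphore = asyncio.Semaphore(max_concurrent)  # at most max_concurrent updates in flight

    async def bounded(task_id):
        async with semaphore:
            return await memory_update(task_id)

    return await asyncio.gather(*(bounded(t) for t in task_ids))

# Example: 30 tasks, never more than 10 concurrent memory updates.
results = asyncio.run(run_updates(range(30), max_concurrent=10))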

Adding New Domains

To extend the benchmark to a new domain:

  1. Build a Docker image for the new domain.
  2. Add prompts, configs, and {env_name}_envs.py to the envs archive (a rough skeleton is sketched after this list).
  3. Add task descriptions for the meta agent in meta_agent_prompt.py.
  4. Register the container and name for the new benchmark in eval_in_container.py.
  5. Run the meta agent to discover specialized memory designs.
  6. Evaluate the results against baseline memory designs.
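
A rough skeleton of what an {env_name}_envs.py adapter might look like is shown below. The class and method names (reset, step) are assumptions based on common text-environment wrappers, not the interface ALMA prescribes; follow the existing alfworld and BALROG adapters in the envs archive for the actual contract.

# Hypothetical skeleton for a new-domain adapter ({env_name}_envs.py). The reset/step
# interface is an assumption; mirror the existing adapters in the envs archive.
class NewDomainEnv:
    def __init__(self, config):
        self.config = config                          # e.g., task split, seeds
        self.max_steps = config.get("max_steps", 50)
        self.steps = 0

    def reset(self, task_id):
        # Return the initial textual observation / task description for task_id.
        self.steps = 0
        return f"Task {task_id}: initial observation goes here."

    def step(self, action):
        # Apply the agent's textual action; return (observation, reward, done, info).
        self.steps += 1
        done = self.steps >= self.max_steps
        return "next observation", 0.0, done, {"steps": self.steps}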

Results

Our learned memory designs consistently outperform state-of-the-art human-designed memory designs across all benchmarks.
Numbers indicate overall success rate in percentage (higher is better). Improvements are relative to the no-memory baseline.

| FM in Agentic System | GPT-5-nano / low | GPT-5-mini / medium |
|---|---|---|
| No Memory | 6.1 | 41.1 |
| Manual Memory Designs | | |
| Trajectory Retrieval | 8.6 (+2.5) | 48.6 (+7.5) |
| Reasoning Bank | 7.5 (+1.4) | 40.1 (−1.0) |
| Dynamic Cheatsheet | 7.2 (+1.1) | 46.5 (+5.4) |
| G-Memory | 7.7 (+1.6) | 46.0 (+4.9) |
| Learned Memory Design | | |
| Our Method | 12.3 (+6.2) | 53.9 (+12.8) |

Key findings:

  • Learned designs adapt to domain-specific requirements automatically
  • Better performance scaling with memory size
  • Faster learning under task distribution shifts
  • Lower computational costs compared to human-designed baselines

Acknowledgements

This research was supported by the Vector Institute, the Canada CIFAR AI Chairs program, a grant from Schmidt Futures, an NSERC Discovery Grant, and a generous donation from Rafael Cosman. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute (https://vectorinstitute.ai/partnerships/current-partners/). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.


License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
