KeyMem

Graph-augmented vector retrieval for persistent conversational memory in LLM agents.

KeyMem stores raw conversational turns in a FalkorDB knowledge graph and retrieves them through a dual-path architecture combining keyword vector search with graph traversal. It is designed as a stateless gRPC service that any agent framework can integrate without coupling to a specific dialogue manager.

Architecture

The Store Pipeline processes each turn through reference resolution, LLM extraction, and batch embedding before writing to the knowledge graph. The Recall Pipeline executes dual-path retrieval — Path B (keyword vector search + graph traversal) and Path C (Fragment multi-hop expansion) — followed by source-aware scoring.
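
To make the final step of the Recall Pipeline concrete, here is a minimal sketch of how candidates from the two paths might be merged and ranked by source. It is not KeyMem's implementation: the `Candidate` type, the `merge_paths` function, and the weights in `SOURCE_WEIGHTS` are all hypothetical, and this README does not specify the actual scoring formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    turn_id: str   # identifier of a stored conversational turn
    score: float   # raw relevance score reported by the retrieval path
    source: str    # "B" (keyword vector + graph) or "C" (Fragment multi-hop)


# Hypothetical per-path weights; the real values are not given in this README.
SOURCE_WEIGHTS = {"B": 1.0, "C": 0.8}


def merge_paths(path_b, path_c, top_k=30):
    """Merge both candidate lists, keeping the best weighted score per turn."""
    best = {}
    for cand in [*path_b, *path_c]:
        weighted = cand.score * SOURCE_WEIGHTS[cand.source]
        if weighted > best.get(cand.turn_id, float("-inf")):
            best[cand.turn_id] = weighted
    # Highest weighted score first, truncated to the requested number of results.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

A turn found by both paths appears once in the result, ranked by whichever path scored it higher after weighting.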

Figure: knowledge graph of a 5-turn conversation, visualized in FalkorDB Browser.

Benchmark (LoCoMo-10)

System	MultiHop	Temporal	OpenDomain	SingleHop	Adversarial	Overall F1
mem0	0.262	0.080	0.149	0.266	0.861	0.372
SimpleMem	0.429	0.629	0.339	0.554	0.016	0.415
KeyMem	0.452	0.570	0.343	0.659	0.666	0.609

All systems use gpt-4.1-mini for generation and text-embedding-3-small for embeddings, with top-k = 30.
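
This README does not say how the F1 scores are computed. Question-answering benchmarks commonly use a token-overlap F1 between the predicted and reference answers, so the sketch below shows that metric as an assumption about what the columns measure. The `token_f1` function and its whitespace tokenization are illustrative only.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two answers (lowercased, whitespace-split)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers match; one empty and one non-empty do not.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this metric a long answer that contains the reference is penalized on precision, which is one reason scores can differ sharply between question categories.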

Requirements

Python ≥ 3.11
FalkorDB running locally (default: localhost:6379)
OpenAI-compatible API key

Start FalkorDB:

docker run -p 6379:6379 falkordb/falkordb
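
Before starting the server it can help to confirm that something is listening on the FalkorDB port. The sketch below uses only the Python standard library; the helper name `falkordb_reachable` is hypothetical and not part of KeyMem. It checks TCP connectivity only and does not verify that the listener is actually FalkorDB.

```python
import socket


def falkordb_reachable(host="localhost", port=6379, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Calling `falkordb_reachable()` with no arguments probes the default address listed under Requirements.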

Installation

git clone https://github.com/your-username/keymem
cd keymem
./install.sh
source .venv/bin/activate

The install script creates an isolated virtual environment, installs all dependencies, and registers the keymem CLI command.

Quick Start

1. Start the server:

keymem serve \
  --llm-api-key YOUR_KEY \
  --embedding-api-key YOUR_KEY

2. Use the Python client:

from keymem.client import KeyMemClient

mem = KeyMemClient("localhost:50051", session_id="user-123")

# Store conversation turns
mem.store("Do you have a pet?", "Yes, I have a cat named Pepper.")
mem.store("How old is she?", "She's 3 years old.")  # "she" resolved to Pepper via reference state

# Reset reference state machine when context breaks (topic switch / out-of-order store)
mem.reset_state()
mem.store("What's your favorite food?", "I love spicy ramen.")

# Recall relevant memories
results = mem.recall("What does the user like to eat?")
for r in results:
    print(r.question, "→", r.answer)

mem.close()
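
The example above calls `mem.close()` as its last statement, so the connection stays open if an earlier call raises. One way to guard against that is `contextlib.closing`, which works with any object that exposes `close()`. The sketch below uses a stand-in `FakeClient` so it runs without a server; `FakeClient` and `store_turns` are hypothetical, and this README does not say whether `KeyMemClient` supports the `with` statement directly.

```python
from contextlib import closing


class FakeClient:
    """Stand-in for KeyMemClient so the sketch runs without a server."""

    def __init__(self):
        self.closed = False
        self.turns = []

    def store(self, question, answer):
        self.turns.append((question, answer))

    def close(self):
        self.closed = True


def store_turns(client, turns):
    """Store (question, answer) pairs, closing the client even on error."""
    with closing(client) as mem:
        for question, answer in turns:
            mem.store(question, answer)
```

With the real client, the same pattern would be `store_turns(KeyMemClient("localhost:50051", session_id="user-123"), turns)`.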

Documentation

Document	Description
SDK Reference	`KeyMemClient` API — store, recall, forget, attention, session isolation
CLI Reference	`keymem serve/stop/status/clean` — all server commands and options

Examples

Example	Description
examples/chat-robot/chat.py	Terminal chat agent with long-term memory, attention stack, configurable context window, and streaming output

Key Design Principles

Raw memories over compressed summaries — stores original conversation text, not extracted facts
No automatic forgetting — forgetting is an application-layer concern
No automatic conflict resolution — temporal ordering is preserved; the LLM decides at query time
Stateless interface — no coupling to session management; supports out-of-order store calls
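
Because conflicts are left for the LLM to resolve at query time, the application has to present recalled memories in a form that makes their order visible. The sketch below shows one way to do that. It assumes each recalled memory carries a timestamp; the `Memory` type, its `timestamp` field, and `build_context` are hypothetical (only `question` and `answer` appear in the Quick Start example).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Memory:
    question: str
    answer: str
    timestamp: datetime  # assumed field; not confirmed by this README


def build_context(memories):
    """Format memories oldest-first so later statements visibly supersede earlier ones."""
    ordered = sorted(memories, key=lambda m: m.timestamp)
    lines = [
        f"[{m.timestamp:%Y-%m-%d}] Q: {m.question} A: {m.answer}"
        for m in ordered
    ]
    return "\n".join(lines)
```

The resulting string can be placed in the prompt ahead of the user's question, leaving the model to decide which of two conflicting statements is current.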

License

MIT
