Graph-augmented vector retrieval for persistent conversational memory in LLM agents.
KeyMem stores raw conversational turns in a FalkorDB knowledge graph and retrieves them through a dual-path architecture combining keyword vector search with graph traversal. It is designed as a stateless gRPC service that any agent framework can integrate without coupling to a specific dialogue manager.
The Store Pipeline processes each turn through reference resolution, LLM extraction, and batch embedding before writing to the knowledge graph. The Recall Pipeline executes dual-path retrieval — Path B (keyword vector search + graph traversal) and Path C (Fragment multi-hop expansion) — followed by source-aware scoring.
Knowledge graph of a 5-turn conversation visualized in FalkorDB Browser.
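
The README does not say how candidates from Path B and Path C are combined during source-aware scoring. The sketch below is one plausible approach, not KeyMem's implementation. It assumes each path returns `(turn_id, score)` candidates and applies a hypothetical per-source weight. A turn found by both paths keeps only its best weighted score, so agreement between paths is not double-counted. `Candidate`, `SOURCE_WEIGHTS`, `merge_paths` and the weight values are all illustrative assumptions. `top_k` defaults to 30 only to mirror the benchmark setting.

```python
from dataclasses import dataclass

# Hypothetical per-path weights; KeyMem's actual source-aware scoring is not documented here.
SOURCE_WEIGHTS = {"path_b": 1.0, "path_c": 0.8}


@dataclass(frozen=True)
class Candidate:
    turn_id: str  # identifier of a stored conversational turn
    score: float  # raw similarity reported by the retrieval path
    source: str   # producing path: "path_b" or "path_c"


def merge_paths(
    path_b: list[Candidate], path_c: list[Candidate], top_k: int = 30
) -> list[tuple[str, float]]:
    """Weight each candidate by its source and keep the best score per turn."""
    best: dict[str, float] = {}
    for cand in [*path_b, *path_c]:
        weighted = cand.score * SOURCE_WEIGHTS.get(cand.source, 0.0)
        if weighted > best.get(cand.turn_id, float("-inf")):
            best[cand.turn_id] = weighted
    # Highest score first; ties broken by turn id so the order is deterministic.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


b = [Candidate("t1", 0.9, "path_b"), Candidate("t2", 0.5, "path_b")]
c = [Candidate("t2", 0.7, "path_c"), Candidate("t3", 0.6, "path_c")]
print([turn for turn, _ in merge_paths(b, c, top_k=2)])  # -> ['t1', 't2']
```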
| System | MultiHop | Temporal | OpenDomain | SingleHop | Adversarial | Overall F1 |
|---|---|---|---|---|---|---|
| mem0 | 0.262 | 0.080 | 0.149 | 0.266 | 0.861 | 0.372 |
| SimpleMem | 0.429 | 0.629 | 0.339 | 0.554 | 0.016 | 0.415 |
| KeyMem | 0.452 | 0.570 | 0.343 | 0.659 | 0.666 | 0.609 |

All systems use `gpt-4.1-mini` as the LLM and `text-embedding-3-small` for embeddings, with retrieval top-k = 30.
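
The table reports F1 but does not define it. The sketch below assumes the common SQuAD-style token-overlap F1 between a generated answer and the gold answer, and computes a per-question score under that assumption. `normalize` and `token_f1` are illustrative names. The normalization here is simplified to lowercasing and stripping punctuation, whereas evaluation scripts often also drop articles.

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, and split on whitespace (simplified normalization)."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty scores zero.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each shared token counts at most min(count_pred, count_ref) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("She is 3 years old", "3 years old"))  # precision 3/5, recall 3/3
```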
- Python ≥ 3.11
- FalkorDB running locally (default: `localhost:6379`)
- An OpenAI-compatible API key
Start FalkorDB:

```bash
docker run -p 6379:6379 falkordb/falkordb
```

Install KeyMem:

```bash
git clone https://github.com/your-username/keymem
cd keymem
./install.sh
source .venv/bin/activate
```

The install script creates an isolated virtual environment, installs all dependencies, and registers the `keymem` CLI command.
1. Start the server:

   ```bash
   keymem serve \
     --llm-api-key YOUR_KEY \
     --embedding-api-key YOUR_KEY
   ```

2. Use the Python client:

   ```python
   from keymem.client import KeyMemClient

   mem = KeyMemClient("localhost:50051", session_id="user-123")

   # Store conversation turns
   mem.store("Do you have a pet?", "Yes, I have a cat named Pepper.")
   mem.store("How old is she?", "She's 3 years old.")  # "she" resolved to Pepper via reference state

   # Reset the reference state machine when context breaks (topic switch / out-of-order store)
   mem.reset_state()
   mem.store("What's your favorite food?", "I love spicy ramen.")

   # Recall relevant memories
   results = mem.recall("What does the user like to eat?")
   for r in results:
       print(r.question, "→", r.answer)

   mem.close()
   ```

| Document | Description |
|---|---|
| SDK Reference | KeyMemClient API — store, recall, forget, attention, session isolation |
| CLI Reference | keymem serve/stop/status/clean — all server commands and options |
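
The quick-start client resolves "she" to Pepper through a reference state and clears that state with `reset_state()` when context breaks. KeyMem's actual resolver is not described in this README. The toy below only illustrates the idea: it carries the most recently named entity across turns, substitutes it for pronouns, and forgets it on reset. `ReferenceState`, its capitalized-word heuristic and the pronoun list are hypothetical. The heuristic is far cruder than a real resolver; for example, it turns "her cat" into "Pepper cat".

```python
import re

# Hypothetical pronoun and stopword lists for this toy heuristic only.
PRONOUNS = {"he", "she", "it", "him", "her", "they", "them"}
STOPWORDS = {"Yes", "No", "She", "He", "It", "They", "How", "What", "Do"}


class ReferenceState:
    """Toy state machine: remembers the last named entity and substitutes pronouns."""

    def __init__(self) -> None:
        self.last_entity: str | None = None

    def resolve(self, text: str) -> str:
        def substitute(match: re.Match[str]) -> str:
            word = match.group(0)
            if word.lower() in PRONOUNS and self.last_entity is not None:
                return self.last_entity
            return word

        resolved = re.sub(r"\b\w+\b", substitute, text)
        # Track the most recent capitalized word that is not a known stopword.
        for token in re.findall(r"\b[A-Z][a-z]+\b", text):
            if token not in STOPWORDS:
                self.last_entity = token
        return resolved

    def reset(self) -> None:
        """Forget the tracked entity, e.g. after a topic switch."""
        self.last_entity = None


state = ReferenceState()
state.resolve("Yes, I have a cat named Pepper.")
print(state.resolve("How old is she?"))  # -> How old is Pepper?
state.reset()
print(state.resolve("How old is she?"))  # -> How old is she?
```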

| Example | Description |
|---|---|
| examples/chat-robot/chat.py | Terminal chat agent with long-term memory, attention stack, configurable context window, and streaming output |
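
The chat example itself is not reproduced here. The sketch below shows one way an agent could wire memory into a reply loop: recall related turns, cap how many go into the prompt, answer, then store the new turn. It uses only the `recall`/`store` methods and the `question`/`answer` result fields shown in the quick start. `answer_with_memory`, `max_memories`, the prompt layout and the `llm` callable are hypothetical. `FakeMemory` is an in-memory stand-in that lets the sketch run without a KeyMem server.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class MemoryClient(Protocol):
    """Subset of the client interface used in the quick start."""

    def recall(self, query: str) -> list: ...
    def store(self, question: str, answer: str) -> None: ...


def answer_with_memory(
    mem: MemoryClient,
    llm: Callable[[str], str],
    user_message: str,
    max_memories: int = 5,
) -> str:
    """Recall related turns, build a bounded prompt, answer, then store the new turn."""
    memories = mem.recall(user_message)[:max_memories]  # crude context-window cap
    context = "\n".join(f"Q: {m.question}\nA: {m.answer}" for m in memories)
    prompt = f"Relevant past turns:\n{context}\n\nUser: {user_message}\nAssistant:"
    reply = llm(prompt)
    mem.store(user_message, reply)
    return reply


# In-memory stand-in so the sketch runs without a KeyMem server.
@dataclass
class Turn:
    question: str
    answer: str


class FakeMemory:
    def __init__(self) -> None:
        self.turns: list[Turn] = []

    def recall(self, query: str) -> list[Turn]:
        # Naive word overlap, not KeyMem's dual-path retrieval.
        words = set(query.lower().split())
        return [t for t in self.turns if words & set(f"{t.question} {t.answer}".lower().split())]

    def store(self, question: str, answer: str) -> None:
        self.turns.append(Turn(question, answer))


mem = FakeMemory()
mem.store("Do you have a pet?", "Yes, I have a cat named Pepper.")
fake_llm = lambda prompt: "Pepper" if "Pepper" in prompt else "I don't know"
print(answer_with_memory(mem, fake_llm, "What is my cat called?"))  # -> Pepper
```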
- Raw memories over compressed summaries — stores original conversation text, not extracted facts
- No automatic forgetting — forgetting is an application-layer concern
- No automatic conflict resolution — temporal ordering is preserved; the LLM decides at query time
- Stateless interface — no coupling to session management; supports out-of-order store calls
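
Because temporal ordering is preserved and conflicts are left for the LLM to settle at query time, an application may want to show recalled turns oldest-first. This lets the model see which statement came last. The sketch assumes each recalled turn carries a timestamp and sorts on it rather than on retrieval rank. `RecalledTurn`, its `timestamp` field and `format_chronologically` are hypothetical; the quick-start results show only `question` and `answer`.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RecalledTurn:
    question: str
    answer: str
    timestamp: datetime  # hypothetical field: when the turn took place


def format_chronologically(turns: list[RecalledTurn]) -> str:
    """Render turns oldest-first so later statements visibly supersede earlier ones."""
    ordered = sorted(turns, key=lambda t: t.timestamp)
    return "\n".join(f"[{t.timestamp:%Y-%m-%d}] Q: {t.question} A: {t.answer}" for t in ordered)


turns = [
    RecalledTurn("Where do you live?", "I moved to Berlin.", datetime(2024, 6, 1)),
    RecalledTurn("Where do you live?", "I live in Paris.", datetime(2023, 1, 15)),
]
print(format_chronologically(turns))
# -> [2023-01-15] Q: Where do you live? A: I live in Paris.
# -> [2024-06-01] Q: Where do you live? A: I moved to Berlin.
```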
License: MIT