mem7-llm

10 releases

Uses new Rust 2024

0.3.3	Mar 26, 2026
0.3.2	Mar 17, 2026
0.2.2	Mar 14, 2026
0.1.5	Mar 12, 2026

#879 in Artificial intelligence

Used in 4 crates

Apache-2.0

59KB
1K SLoC

mem7

LLM-powered long-term memory engine — Rust core with multi-language bindings.

Deeply inspired by Mem0, mem7 reimplements the core memory pipeline in Rust and goes further with two capabilities Mem0 doesn't have:

Ebbinghaus forgetting curve — stale memories naturally decay over time while frequently recalled facts grow stronger, just like human memory.
Session-aware recall — each memory is typed (factual / preference / procedural / episodic) and each query is auto-classified by task intent, so irrelevant memories (e.g. design preferences during bug-fixing) are demoted before they reach the agent.

mem7 extracts factual statements from conversations, deduplicates them against existing memories, and stores the results in vector + graph databases with full audit history.

Install

pip install mem7          # Python
npm install @mem7ai/mem7  # Node.js / TypeScript
cargo add mem7            # Rust

Architecture

Python / TypeScript / Rust API
    │  PyO3 (sync + async) / napi-rs / native
    ▼
Rust Core (tokio async runtime)
    ├── mem7-llm        — OpenAI-compatible LLM client
    ├── mem7-embedding  — Embedding client (OpenAI-compatible / FastEmbed)
    ├── mem7-vector     — Vector index (FlatIndex / Upstash)
    ├── mem7-graph      — Graph store (FlatGraph / Kuzu / Neo4j)
    ├── mem7-history    — SQLite audit trail
    ├── mem7-dedup      — LLM-driven memory deduplication
    ├── mem7-reranker   — Search reranking (Cohere / LLM-based)
    ├── mem7-telemetry  — OpenTelemetry tracing (OTLP export)
    └── mem7-store      — Pipeline orchestrator (MemoryEngine)

Write Path — `add()`

flowchart LR
    A[Conversation] --> B["LLM: extract facts\n+ memory_type"]
    A --> C["LLM: extract\ngraph relations"]
    B --> D[Embed facts]
    D --> E["Search existing\nmemories"]
    E --> F["LLM: dedup\n(ADD / UPDATE / DELETE)"]
    F --> G[(Vector Index)]
    C --> H[(Graph Store)]
    F --> I[(SQLite History)]

Read Path — `search()`

flowchart LR
    Q[Query] --> E[Embed query]
    Q --> CL["LLM: classify\ntask_type"]
    E --> V["Vector search"]
    E --> G["Graph search"]
    V --> RR["Rerank\n(optional)"]
    RR --> DC["× decay"]
    DC --> CT["× context_coeff\n(memory_type, task_type)"]
    CL -.-> CT
    G --> CT
    CT --> TH["Threshold\nfilter"]
    TH --> R[Ranked results]

Quick Start (Python — Sync)

from mem7 import Memory
from mem7.config import MemoryConfig, LlmConfig, EmbeddingConfig

config = MemoryConfig(
    llm=LlmConfig(
        base_url="http://localhost:11434/v1",
        api_key="ollama",
        model="qwen2.5:7b",
    ),
    embedding=EmbeddingConfig(
        base_url="http://localhost:11434/v1",
        api_key="ollama",
        model="mxbai-embed-large",
        dims=1024,
    ),
)

m = Memory(config=config)
m.add("I love playing tennis and my coach is Sarah.", user_id="alice")
results = m.search("What sports does Alice play?", user_id="alice")

Quick Start (Python — Async)

import asyncio
from mem7 import AsyncMemory
from mem7.config import MemoryConfig, LlmConfig, EmbeddingConfig

async def main():
    config = MemoryConfig(
        llm=LlmConfig(
            base_url="http://localhost:11434/v1",
            api_key="ollama",
            model="qwen2.5:7b",
        ),
        embedding=EmbeddingConfig(
            base_url="http://localhost:11434/v1",
            api_key="ollama",
            model="mxbai-embed-large",
            dims=1024,
        ),
    )

    m = await AsyncMemory.create(config=config)
    await m.add("I love playing tennis and my coach is Sarah.", user_id="alice")
    results = await m.search("What sports does Alice play?", user_id="alice")

asyncio.run(main())

Quick Start (TypeScript)

import { MemoryEngine } from "@mem7ai/mem7";

const engine = await MemoryEngine.create(JSON.stringify({
  llm: { base_url: "http://localhost:11434/v1", api_key: "ollama", model: "qwen2.5:7b" },
  embedding: { base_url: "http://localhost:11434/v1", api_key: "ollama", model: "mxbai-embed-large", dims: 1024 },
}));

await engine.add([{ role: "user", content: "I love playing tennis and my coach is Sarah." }], "alice");
const results = await engine.search("What sports does Alice play?", "alice");

Supported Providers

mem7 uses a single OpenAI-compatible client for both LLM and Embedding, which covers any service that exposes the OpenAI API format. This includes most major providers out of the box.

LLMs

Provider	Status	Notes
OpenAI	✅	Native support
Ollama	✅	Via OpenAI-compatible API
vLLM	✅	Via OpenAI-compatible API
Groq	✅	Via OpenAI-compatible API
Together	✅	Via OpenAI-compatible API
DeepSeek	✅	Via OpenAI-compatible API
xAI (Grok)	✅	Via OpenAI-compatible API
LM Studio	✅	Via OpenAI-compatible API
Azure OpenAI	✅	Via OpenAI-compatible API
Anthropic	❌	Requires native SDK
Gemini	❌	Requires native SDK
Vertex AI	❌	Requires native SDK
AWS Bedrock	❌	Requires native SDK
LiteLLM	❌	Python proxy
Sarvam	❌	Requires native SDK
LangChain	❌	Python framework

Embeddings

Provider	Status	Notes
OpenAI	✅	Native support
Ollama	✅	Via OpenAI-compatible API
Together	✅	Via OpenAI-compatible API
LM Studio	✅	Via OpenAI-compatible API
Azure OpenAI	✅	Via OpenAI-compatible API
FastEmbed	✅	Local ONNX inference (feature flag `fastembed`)
Hugging Face	❌	Requires native SDK
Gemini	❌	Requires native SDK
Vertex AI	❌	Requires native SDK
AWS Bedrock	❌	Requires native SDK
LangChain	❌	Python framework

Vector Stores

Provider	Status	Notes
In-memory (FlatIndex)	✅	Built-in, good for dev
Upstash Vector	✅	REST API, serverless
Qdrant	❌
Chroma	❌
pgvector	❌
Milvus	❌
Pinecone	❌
Redis	❌
Weaviate	❌
Elasticsearch	❌
OpenSearch	❌
FAISS	❌
MongoDB	❌
Supabase	❌
Azure AI Search	❌
Vertex AI Vector Search	❌
Databricks	❌
Cassandra	❌
S3 Vectors	❌
Baidu	❌
Neptune	❌
Valkey	❌
LangChain	❌

Rerankers

Provider	Status	Notes
Cohere	✅	Cohere v2 rerank API
LLM-based	✅	Any OpenAI-compatible LLM
Jina AI	❌	Planned
Cross-encoder	❌	Planned

Graph Stores

Provider	Status	Notes
In-memory (FlatGraph)	✅	Built-in, good for dev/testing
Kuzu (embedded)	✅	Cypher-based, no server needed (feature flag `kuzu`)
Neo4j	✅	Production-grade, Bolt protocol
Memgraph	❌	Planned
Amazon Neptune	❌	Planned

Language Bindings

Language	Status
Python (sync + async)	✅ PyPI: `pip install mem7`
TypeScript / Node.js	✅ npm: `npm install @mem7ai/mem7`
Rust	✅ crates.io: `cargo add mem7`
Go	Planned

Vector Store Backends

Built-in FlatIndex (default) — in-memory brute-force, good for development:

from mem7.config import VectorConfig

VectorConfig(provider="flat", dims=1024)

Upstash Vector — managed cloud vector database:

VectorConfig(
    provider="upstash",
    collection_name="my-namespace",
    dims=1024,
    upstash_url="https://your-index.upstash.io",
    upstash_token="your-token",
)

Local Embedding (FastEmbed)

mem7 supports fully local embedding via FastEmbed (ONNX Runtime). No API calls needed — models are downloaded and run locally.

Requires the fastembed feature flag:

# Cargo.toml
mem7 = { version = "0.3.3", features = ["fastembed"] }

from mem7 import Memory
from mem7.config import MemoryConfig, LlmConfig, EmbeddingConfig

config = MemoryConfig(
    llm=LlmConfig(base_url="http://localhost:11434/v1", api_key="ollama", model="qwen2.5:7b"),
    embedding=EmbeddingConfig(
        provider="fastembed",
        model="AllMiniLML6V2",  # or "BGEBaseENV15", "NomicEmbedTextV15", etc.
        dims=384,
    ),
)

m = Memory(config=config)  # model downloaded on first use

Supported models include AllMiniLML6V2, BGEBaseENV15, BGESmallENV15, NomicEmbedTextV1, MxbaiEmbedLargeV1, GTEBaseENV15, and their quantized variants.

Graph Memory (Dual-Path Recall)

When graph is configured, mem7 runs dual-path recall: vector search and graph search execute concurrently via tokio::join!, returning both factual memories and entity relations.

On add(), the engine extracts entities and relations from conversations using LLM (JSON mode) and stores them in the graph alongside the vector memories.

FlatGraph (in-memory, for development):

from mem7 import Memory
from mem7.config import MemoryConfig, LlmConfig, EmbeddingConfig, GraphConfig

config = MemoryConfig(
    llm=LlmConfig(base_url="http://localhost:11434/v1", api_key="ollama", model="qwen2.5:7b"),
    embedding=EmbeddingConfig(base_url="http://localhost:11434/v1", api_key="ollama", model="mxbai-embed-large", dims=1024),
    graph=GraphConfig(provider="flat"),
)

m = Memory(config=config)
m.add("I love playing tennis and my coach is Sarah.", user_id="alice")

results = m.search("What sports does Alice play?", user_id="alice")
# results["memories"]   -> vector search results
# results["relations"]  -> graph relations (e.g. USER -[loves_playing]-> tennis)

Neo4j (production):

GraphConfig(
    provider="neo4j",
    neo4j_url="bolt://localhost:7687",
    neo4j_username="neo4j",
    neo4j_password="password",
)

Kuzu (embedded, requires kuzu feature flag):

GraphConfig(provider="kuzu", kuzu_db_path="./my_graph.kuzu")

The graph LLM can be configured separately (e.g. use a cheaper model for extraction):

GraphConfig(
    provider="flat",
    llm=LlmConfig(base_url="http://localhost:11434/v1", api_key="ollama", model="qwen2.5:3b"),
)

Memory Decay (Forgetting Curve)

mem7 implements an Ebbinghaus-inspired forgetting curve that deprioritizes stale memories over time while automatically strengthening memories that are frequently recalled — just like human memory.

When enabled, every memory carries two extra metadata fields: last_accessed_at (the last time it was written or retrieved) and access_count (how many times it has been retrieved). These are used to compute a retention score that modulates the raw similarity score during search and dedup:

$$S = S_0 \cdot \bigl(1 + \alpha \cdot \ln(1 + n)\bigr)$$

$$R(t) = \exp!\Bigl(-\Bigl(\frac{t - \tau}{S}\Bigr)^{!\gamma}\Bigr)$$

$$\widetilde{R}(t) = \rho + (1 - \rho) \cdot R(t)$$

$$\text{score}{\text{final}} = \text{sim}{\text{raw}} \times \widetilde{R}(t)$$

where $S_0$ = base half-life, $\alpha$ = rehearsal factor, $n$ = access count, $\tau$ = last accessed time, $\gamma$ = decay shape, $\rho$ = min retention floor.

Decay over time: memories you haven't touched in weeks get deprioritized, but never disappear (the floor parameter ensures a minimum retention of 10% by default).
Rehearsal strengthening: each time a memory is successfully retrieved via search(), its access_count is incremented and last_accessed_at is reset asynchronously — making it harder to forget next time.
Cue-dependent retrieval: a highly relevant query naturally "wakes up" old memories because raw_similarity is high, even if the retention score is low. No separate sigmoid gate is needed — the multiplicative structure handles it.
Write-path aware: decay is also applied during the dedup phase of add(), so stale memories appear less "close" to new facts and are more likely to be updated or replaced.

Enabling Decay

Decay is off by default. Enable it via config:

Python:

from mem7.config import MemoryConfig, DecayConfig

config = MemoryConfig(
    # ... llm, embedding, etc.
    decay=DecayConfig(enabled=True),
)

TypeScript:

const engine = await MemoryEngine.create(JSON.stringify({
  // ... llm, embedding, etc.
  decay: { enabled: true },
}));

Rust:

use mem7_config::{MemoryEngineConfig, DecayConfig};

let config = MemoryEngineConfig {
    decay: Some(DecayConfig { enabled: true, ..Default::default() }),
    ..Default::default()
};

Tuning Parameters

Parameter	Default	Description
`base_half_life_secs`	`604800.0`	Base stability in seconds (7 days) before any rehearsal bonus
`decay_shape`	`0.8`	Stretched-exponential shape (0 < gamma <= 1); lower = slower initial decay
`min_retention`	`0.1`	Floor so no memory fully vanishes
`rehearsal_factor`	`0.5`	How much each retrieval increases stability

Backward Compatibility

Old memories without last_accessed_at or access_count gracefully degrade: age falls back to updated_at then created_at, and access count defaults to 0.
No migration needed — new fields are written on the next add() or update() call.
When decay is disabled (the default), scoring behavior is identical to previous versions.

Context-Aware Scoring (Session-Aware Recall)

Pure embedding similarity can conflate semantic closeness with contextual relevance — for example, a design preference like "always investigate root cause first" may score high when searching "fix Chrome CDP bug" because both relate to debugging. With context-aware scoring, mem7 automatically classifies queries and memories to boost what's relevant and demote what isn't.

How It Works

Write path — each extracted fact is tagged with a memory_type (factual, preference, procedural, episodic) during LLM fact extraction.
Read path — each search query is classified into a task_type (troubleshooting, design, factual_lookup, planning, general) via a lightweight LLM call that runs in parallel with embedding, adding zero sequential latency.
A context coefficient is looked up from a (memory_type, task_type) weight matrix and multiplied into the score:

$$\text{score}_{\text{final}} = \text{similarity} \times \text{decay} \times \text{context coeff}$$

Default Weight Matrix

	troubleshooting	design	factual_lookup	planning	general
factual	1.0	0.5	1.0	0.7	1.0
preference	0.3	1.0	0.3	0.8	0.8
procedural	0.8	0.5	0.5	1.0	0.7
episodic	0.5	0.5	0.5	0.5	0.7

Enabling Context-Aware Scoring

Context scoring is off by default. Enable it via config:

Python:

from mem7.config import MemoryConfig, ContextConfig

config = MemoryConfig(
    # ... llm, embedding, etc.
    context=ContextConfig(enabled=True),
)

TypeScript:

const engine = await MemoryEngine.create(JSON.stringify({
  // ... llm, embedding, etc.
  context: { enabled: true },
}));

Rust:

use mem7_config::{MemoryEngineConfig, ContextConfig};

let config = MemoryEngineConfig {
    context: Some(ContextConfig { enabled: true, ..Default::default() }),
    ..Default::default()
};

You can also provide custom weights to override the defaults:

ContextConfig(
    enabled=True,
    weights={
        "preference": {"troubleshooting": 0.1, "design": 1.0},
    },
)

Overriding Task Type

If the caller already knows the task context, it can pass task_type directly to skip the LLM classification call:

results = m.search("fix Chrome CDP timeout", user_id="alice", task_type="troubleshooting")

Backward Compatibility

Context scoring defaults to disabled — zero impact on existing users.
Old memories without memory_type are treated as "factual" (safe default).
When context is disabled, the scoring pipeline is identical to previous versions.

OpenClaw Plugin

mem7 ships an official OpenClaw memory plugin that replaces the built-in memory backend with LLM-powered fact extraction, graph relations, dedup, and the forgetting curve — all driven by mem7's Rust core.

Install

openclaw plugins install @mem7ai/openclaw-mem7

Activate

In ~/.openclaw/openclaw.json:

{
  "plugins": {
    "slots": { "memory": "openclaw-mem7" },
    "entries": {
      "openclaw-mem7": {
        "enabled": true,
        "config": {
          "llm": { "base_url": "http://localhost:11434/v1", "api_key": "ollama", "model": "qwen2.5:7b" },
          "embedding": { "base_url": "http://localhost:11434/v1", "api_key": "ollama", "model": "mxbai-embed-large", "dims": 1024 },
          "graph": { "provider": "flat" },
          "decay": { "enabled": true }
        }
      }
    }
  }
}

What it does

Auto-recall (before_prompt_build / before_agent_start): before each agent turn, the plugin searches both session and long-term scopes, merges the results, and injects them into the system prompt.
Auto-capture (agent_end): after each turn, the user + assistant messages are sent through mem7's fact extraction pipeline, automatically storing new facts and deduplicating against existing ones.
Tools: the plugin registers memory_search, memory_get, memory_list, memory_store, and memory_forget for explicit memory operations.
Scope model: tools support session, long-term, and merged all reads, with sessionKey automatically mapped onto runId and optional agentId.
Forgetting curve: decay is enabled by default so stale facts naturally fade, while frequently recalled memories stay strong.

See packages/openclaw-mem7/ for full documentation.

Observability (OpenTelemetry)

mem7 integrates with OpenTelemetry via tracing-opentelemetry. When enabled, every add(), search(), get(), update(), delete() call emits a trace span that is exported via OTLP/gRPC to any compatible collector (Jaeger, Grafana Tempo, Datadog, etc.).

Python:

from mem7 import Memory, init_telemetry, shutdown_telemetry

init_telemetry(otlp_endpoint="http://localhost:4317", service_name="my-app")

m = Memory(config=config)
m.add("I love playing tennis.", user_id="alice")
# spans are exported automatically

shutdown_telemetry()  # flush before exit

TypeScript:

import { MemoryEngine, initTelemetry, shutdownTelemetry } from "@mem7ai/mem7";

initTelemetry(JSON.stringify({ otlp_endpoint: "http://localhost:4317", service_name: "my-app" }));

const engine = await MemoryEngine.create(configJson);
await engine.add([{ role: "user", content: "I love tennis." }], "alice");

shutdownTelemetry();

Rust (requires otel feature):

// Cargo.toml: mem7 = { version = "0.3.3", features = ["otel"] }
use mem7::{TelemetryConfig, telemetry};

telemetry::init(&TelemetryConfig::default())?;
// ... use MemoryEngine as usual ...
telemetry::shutdown();

Examples

See the examples/ directory:

mem7_demo.ipynb — Python notebook demo
mem7_demo.ts — TypeScript demo

Development

Prerequisites

Rust 1.85+ (stable)
Python 3.10+
Node.js 22+
just
maturin

Build

python -m venv .venv && source .venv/bin/activate
pip install maturin pydantic

# Development build (debug, fast iteration)
just dev

# Release build
just build

# OpenClaw plugin build
just openclaw-build

Test

# Full validation suite
just check

# Common individual tasks
just fmt
just fmt-check
just clippy
just lint
just typecheck
just test

License

Apache-2.0

Dependencies

~41MB
~678K SLoC

10 releases

mem7

Install

Architecture

Write Path — add()

Read Path — search()

Quick Start (Python — Sync)

Quick Start (Python — Async)

Quick Start (TypeScript)

Supported Providers

LLMs

Embeddings

Vector Stores

Rerankers

Graph Stores

Language Bindings

Vector Store Backends

Local Embedding (FastEmbed)

Graph Memory (Dual-Path Recall)

Memory Decay (Forgetting Curve)

Enabling Decay

Tuning Parameters

Backward Compatibility

Context-Aware Scoring (Session-Aware Recall)

How It Works

Default Weight Matrix

Enabling Context-Aware Scoring

Overriding Task Type

Backward Compatibility

OpenClaw Plugin

Install

Activate

What it does

Observability (OpenTelemetry)

Examples

Development

Prerequisites

Build

Test

License

Dependencies

Write Path — `add()`

Read Path — `search()`