10 releases

Uses new Rust 2024

0.3.3 Mar 26, 2026
0.3.2 Mar 17, 2026
0.2.2 Mar 14, 2026
0.1.5 Mar 12, 2026


Used in 4 crates

Apache-2.0

59KB
1K SLoC

mem7

LLM-powered long-term memory engine — Rust core with multi-language bindings.

Deeply inspired by Mem0, mem7 reimplements the core memory pipeline in Rust and goes further with two capabilities Mem0 doesn't have:

  • Ebbinghaus forgetting curve — stale memories naturally decay over time while frequently recalled facts grow stronger, just like human memory.
  • Session-aware recall — each memory is typed (factual / preference / procedural / episodic) and each query is auto-classified by task intent, so irrelevant memories (e.g. design preferences during bug-fixing) are demoted before they reach the agent.

mem7 extracts factual statements from conversations, deduplicates them against existing memories, and stores the results in vector + graph databases with full audit history.

Install

pip install mem7          # Python
npm install @mem7ai/mem7  # Node.js / TypeScript
cargo add mem7            # Rust

Architecture

Python / TypeScript / Rust API
    │  PyO3 (sync + async) / napi-rs / native
    ▼
Rust Core (tokio async runtime)
    ├── mem7-llm        — OpenAI-compatible LLM client
    ├── mem7-embedding  — Embedding client (OpenAI-compatible / FastEmbed)
    ├── mem7-vector     — Vector index (FlatIndex / Upstash)
    ├── mem7-graph      — Graph store (FlatGraph / Kuzu / Neo4j)
    ├── mem7-history    — SQLite audit trail
    ├── mem7-dedup      — LLM-driven memory deduplication
    ├── mem7-reranker   — Search reranking (Cohere / LLM-based)
    ├── mem7-telemetry  — OpenTelemetry tracing (OTLP export)
    └── mem7-store      — Pipeline orchestrator (MemoryEngine)

Write Path — add()

flowchart LR
    A[Conversation] --> B["LLM: extract facts\n+ memory_type"]
    A --> C["LLM: extract\ngraph relations"]
    B --> D[Embed facts]
    D --> E["Search existing\nmemories"]
    E --> F["LLM: dedup\n(ADD / UPDATE / DELETE)"]
    F --> G[(Vector Index)]
    C --> H[(Graph Store)]
    F --> I[(SQLite History)]
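
As a rough illustration of the last step of the write path, the sketch below applies a list of dedup decisions (ADD / UPDATE / DELETE) to an in-memory store and records each change in an audit list. It is not mem7's implementation: the `apply_actions` helper, the action dictionary shape, and the history record fields are all hypothetical, and plain Python containers stand in for the vector index and the SQLite history.

```python
from datetime import datetime, timezone


def apply_actions(store: dict, history: list, actions: list) -> None:
    """Apply dedup decisions to `store` and log each change in `history`.

    Hypothetical shapes: each action is {"event", "id", "text"}; `store`
    maps memory id -> text; `history` is an append-only audit list.
    """
    for action in actions:
        event, mem_id = action["event"], action["id"]
        old = store.get(mem_id)
        if event in ("ADD", "UPDATE"):
            store[mem_id] = action["text"]
        elif event == "DELETE":
            store.pop(mem_id, None)
        else:
            continue  # unknown events change nothing and are not logged
        history.append({
            "id": mem_id,
            "event": event,
            "old": old,
            "new": store.get(mem_id),
            "at": datetime.now(timezone.utc).isoformat(),
        })


store = {"m1": "Plays tennis"}
history = []
apply_actions(store, history, [
    {"event": "UPDATE", "id": "m1", "text": "Plays tennis; coach is Sarah"},
    {"event": "ADD", "id": "m2", "text": "Prefers morning sessions"},
])
```

Keeping the old and new values side by side in each history record is what makes an audit trail useful for undoing a bad dedup decision later.
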

Read Path: search()

flowchart LR
    Q[Query] --> E[Embed query]
    Q --> CL["LLM: classify\ntask_type"]
    E --> V["Vector search"]
    E --> G["Graph search"]
    V --> RR["Rerank\n(optional)"]
    RR --> DC["× decay"]
    DC --> CT["× context_coeff\n(memory_type, task_type)"]
    CL -.-> CT
    G --> CT
    CT --> TH["Threshold\nfilter"]
    TH --> R[Ranked results]
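
The tail of the read path (multiply by decay, multiply by the context coefficient, filter by threshold, rank) can be sketched in a few lines. This is an illustration only and covers just the vector branch of the diagram: the `Candidate` type, the `rank` helper, and the threshold value of 0.3 are assumptions, not mem7's API or defaults.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    similarity: float           # raw (optionally reranked) similarity
    retention: float = 1.0      # decay factor; 1.0 when decay is disabled
    context_coeff: float = 1.0  # 1.0 when context scoring is disabled


def rank(candidates, threshold=0.3):
    """Score each candidate multiplicatively, drop weak ones, sort descending."""
    scored = [(c.similarity * c.retention * c.context_coeff, c) for c in candidates]
    kept = [(score, c) for score, c in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [(c.text, round(score, 3)) for score, c in kept]


results = rank([
    Candidate("Coach is Sarah", similarity=0.80, retention=0.9),
    Candidate("Prefers dark mode", similarity=0.85, context_coeff=0.3),
    Candidate("Played tennis last week", similarity=0.70, retention=0.9, context_coeff=0.7),
])
```

In this made-up example the preference has the highest raw similarity but falls below the threshold once its context coefficient is applied, which is the demotion behavior the diagram describes.
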

Quick Start (Python — Sync)

from mem7 import Memory
from mem7.config import MemoryConfig, LlmConfig, EmbeddingConfig

config = MemoryConfig(
    llm=LlmConfig(
        base_url="http://localhost:11434/v1",
        api_key="ollama",
        model="qwen2.5:7b",
    ),
    embedding=EmbeddingConfig(
        base_url="http://localhost:11434/v1",
        api_key="ollama",
        model="mxbai-embed-large",
        dims=1024,
    ),
)

m = Memory(config=config)
m.add("I love playing tennis and my coach is Sarah.", user_id="alice")
results = m.search("What sports does Alice play?", user_id="alice")

Quick Start (Python — Async)

import asyncio
from mem7 import AsyncMemory
from mem7.config import MemoryConfig, LlmConfig, EmbeddingConfig

async def main():
    config = MemoryConfig(
        llm=LlmConfig(
            base_url="http://localhost:11434/v1",
            api_key="ollama",
            model="qwen2.5:7b",
        ),
        embedding=EmbeddingConfig(
            base_url="http://localhost:11434/v1",
            api_key="ollama",
            model="mxbai-embed-large",
            dims=1024,
        ),
    )

    m = await AsyncMemory.create(config=config)
    await m.add("I love playing tennis and my coach is Sarah.", user_id="alice")
    results = await m.search("What sports does Alice play?", user_id="alice")

asyncio.run(main())

Quick Start (TypeScript)

import { MemoryEngine } from "@mem7ai/mem7";

const engine = await MemoryEngine.create(JSON.stringify({
  llm: { base_url: "http://localhost:11434/v1", api_key: "ollama", model: "qwen2.5:7b" },
  embedding: { base_url: "http://localhost:11434/v1", api_key: "ollama", model: "mxbai-embed-large", dims: 1024 },
}));

await engine.add([{ role: "user", content: "I love playing tennis and my coach is Sarah." }], "alice");
const results = await engine.search("What sports does Alice play?", "alice");

Supported Providers

mem7 uses a single OpenAI-compatible client for both LLM and embedding calls, so any service that exposes the OpenAI API format works out of the box. That covers most major providers.

LLMs

Provider      Status / Notes
OpenAI        Native support
Ollama        Via OpenAI-compatible API
vLLM          Via OpenAI-compatible API
Groq          Via OpenAI-compatible API
Together      Via OpenAI-compatible API
DeepSeek      Via OpenAI-compatible API
xAI (Grok)    Via OpenAI-compatible API
LM Studio     Via OpenAI-compatible API
Azure OpenAI  Via OpenAI-compatible API
Anthropic     Requires native SDK
Gemini        Requires native SDK
Vertex AI     Requires native SDK
AWS Bedrock   Requires native SDK
LiteLLM       Python proxy
Sarvam        Requires native SDK
LangChain     Python framework

Embeddings

Provider      Status / Notes
OpenAI        Native support
Ollama        Via OpenAI-compatible API
Together      Via OpenAI-compatible API
LM Studio     Via OpenAI-compatible API
Azure OpenAI  Via OpenAI-compatible API
FastEmbed     Local ONNX inference (feature flag fastembed)
Hugging Face  Requires native SDK
Gemini        Requires native SDK
Vertex AI     Requires native SDK
AWS Bedrock   Requires native SDK
LangChain     Python framework

Vector Stores

Provider               Status / Notes
In-memory (FlatIndex)  Built-in, good for dev
Upstash Vector         REST API, serverless
Qdrant
Chroma
pgvector
Milvus
Pinecone
Redis
Weaviate
Elasticsearch
OpenSearch
FAISS
MongoDB
Supabase
Azure AI Search
Vertex AI Vector Search
Databricks
Cassandra
S3 Vectors
Baidu
Neptune
Valkey
LangChain

Rerankers

Provider       Status / Notes
Cohere         Cohere v2 rerank API
LLM-based      Any OpenAI-compatible LLM
Jina AI        Planned
Cross-encoder  Planned

Graph Stores

Provider               Status / Notes
In-memory (FlatGraph)  Built-in, good for dev/testing
Kuzu (embedded)        Cypher-based, no server needed (feature flag kuzu)
Neo4j                  Production-grade, Bolt protocol
Memgraph               Planned
Amazon Neptune         Planned

Language Bindings

Language               Status
Python (sync + async)  ✅ PyPI: pip install mem7
TypeScript / Node.js   ✅ npm: npm install @mem7ai/mem7
Rust                   ✅ crates.io: cargo add mem7
Go                     Planned

Vector Store Backends

Built-in FlatIndex (default) — in-memory brute-force, good for development:

from mem7.config import VectorConfig

VectorConfig(provider="flat", dims=1024)

Upstash Vector — managed cloud vector database:

VectorConfig(
    provider="upstash",
    collection_name="my-namespace",
    dims=1024,
    upstash_url="https://your-index.upstash.io",
    upstash_token="your-token",
)

Local Embedding (FastEmbed)

mem7 supports fully local embedding via FastEmbed (ONNX Runtime). No API calls needed — models are downloaded and run locally.

Requires the fastembed feature flag:

# Cargo.toml
[dependencies]
mem7 = { version = "0.3.3", features = ["fastembed"] }

Python:

from mem7 import Memory
from mem7.config import MemoryConfig, LlmConfig, EmbeddingConfig

config = MemoryConfig(
    llm=LlmConfig(base_url="http://localhost:11434/v1", api_key="ollama", model="qwen2.5:7b"),
    embedding=EmbeddingConfig(
        provider="fastembed",
        model="AllMiniLML6V2",  # or "BGEBaseENV15", "NomicEmbedTextV15", etc.
        dims=384,
    ),
)

m = Memory(config=config)  # model downloaded on first use

Supported models include AllMiniLML6V2, BGEBaseENV15, BGESmallENV15, NomicEmbedTextV1, MxbaiEmbedLargeV1, GTEBaseENV15, and their quantized variants.

Graph Memory (Dual-Path Recall)

When graph is configured, mem7 runs dual-path recall: vector search and graph search execute concurrently via tokio::join!, returning both factual memories and entity relations.
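
mem7 does this in Rust with tokio::join!; for readers more familiar with Python, asyncio.gather is a rough analogue of the same pattern. In the sketch below, `vector_search`, `graph_search`, and `dual_path_recall` are hypothetical stand-ins that return canned data, not mem7 functions.

```python
import asyncio


async def vector_search(query: str) -> list:
    # Hypothetical stub: a real implementation would query the vector index.
    await asyncio.sleep(0.01)
    return [{"memory": "Loves playing tennis"}]


async def graph_search(query: str) -> list:
    # Hypothetical stub: a real implementation would query the graph store.
    await asyncio.sleep(0.01)
    return [("USER", "loves_playing", "tennis")]


async def dual_path_recall(query: str) -> dict:
    # Both coroutines are scheduled together; total wait is roughly the
    # slower of the two rather than their sum.
    memories, relations = await asyncio.gather(
        vector_search(query),
        graph_search(query),
    )
    return {"memories": memories, "relations": relations}


results = asyncio.run(dual_path_recall("What sports does Alice play?"))
```
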

On add(), the engine uses an LLM in JSON mode to extract entities and relations from the conversation and stores them in the graph alongside the vector memories.

FlatGraph (in-memory, for development):

from mem7 import Memory
from mem7.config import MemoryConfig, LlmConfig, EmbeddingConfig, GraphConfig

config = MemoryConfig(
    llm=LlmConfig(base_url="http://localhost:11434/v1", api_key="ollama", model="qwen2.5:7b"),
    embedding=EmbeddingConfig(base_url="http://localhost:11434/v1", api_key="ollama", model="mxbai-embed-large", dims=1024),
    graph=GraphConfig(provider="flat"),
)

m = Memory(config=config)
m.add("I love playing tennis and my coach is Sarah.", user_id="alice")

results = m.search("What sports does Alice play?", user_id="alice")
# results["memories"]   -> vector search results
# results["relations"]  -> graph relations (e.g. USER -[loves_playing]-> tennis)

Neo4j (production):

GraphConfig(
    provider="neo4j",
    neo4j_url="bolt://localhost:7687",
    neo4j_username="neo4j",
    neo4j_password="password",
)

Kuzu (embedded, requires kuzu feature flag):

GraphConfig(provider="kuzu", kuzu_db_path="./my_graph.kuzu")

The graph LLM can be configured separately (e.g. use a cheaper model for extraction):

GraphConfig(
    provider="flat",
    llm=LlmConfig(base_url="http://localhost:11434/v1", api_key="ollama", model="qwen2.5:3b"),
)

Memory Decay (Forgetting Curve)

mem7 implements an Ebbinghaus-inspired forgetting curve that deprioritizes stale memories over time while automatically strengthening memories that are frequently recalled — just like human memory.

When enabled, every memory carries two extra metadata fields: last_accessed_at (the last time it was written or retrieved) and access_count (how many times it has been retrieved). These are used to compute a retention score that modulates the raw similarity score during search and dedup:

$$S = S_0 \cdot \bigl(1 + \alpha \cdot \ln(1 + n)\bigr)$$

$$R(t) = \exp\!\Bigl(-\Bigl(\frac{t - \tau}{S}\Bigr)^{\!\gamma}\Bigr)$$

$$\widetilde{R}(t) = \rho + (1 - \rho) \cdot R(t)$$

$$\text{score}_{\text{final}} = \text{sim}_{\text{raw}} \times \widetilde{R}(t)$$

where $t$ = current time, $\tau$ = last accessed time (so $t - \tau$ is the memory's age), $S_0$ = base half-life, $\alpha$ = rehearsal factor, $n$ = access count, $\gamma$ = decay shape, and $\rho$ = minimum retention floor.

  • Decay over time: memories you haven't touched in weeks get deprioritized, but never disappear (the floor parameter ensures a minimum retention of 10% by default).
  • Rehearsal strengthening: each time a memory is successfully retrieved via search(), its access_count is incremented and last_accessed_at is reset asynchronously — making it harder to forget next time.
  • Cue-dependent retrieval: a highly relevant query naturally "wakes up" old memories because raw_similarity is high, even if the retention score is low. No separate sigmoid gate is needed — the multiplicative structure handles it.
  • Write-path aware: decay is also applied during the dedup phase of add(), so stale memories appear less "close" to new facts and are more likely to be updated or replaced.
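
To make the formulas concrete, here is a small sketch that evaluates them directly. It is a transcription of the equations above, not mem7's source: the function names `retention` and `decayed_score` are made up for illustration, and the default arguments are the values listed under Tuning Parameters.

```python
import math


def retention(age_secs: float, access_count: int = 0,
              base_half_life_secs: float = 604800.0,
              rehearsal_factor: float = 0.5,
              decay_shape: float = 0.8,
              min_retention: float = 0.1) -> float:
    """Floored retention for a memory last accessed `age_secs` ago."""
    # S = S0 * (1 + alpha * ln(1 + n)): stability grows with each rehearsal.
    stability = base_half_life_secs * (1.0 + rehearsal_factor * math.log1p(access_count))
    # R = exp(-((t - tau) / S) ** gamma): stretched-exponential decay.
    raw = math.exp(-((max(age_secs, 0.0) / stability) ** decay_shape))
    # R~ = rho + (1 - rho) * R: retention never drops below the floor.
    return min_retention + (1.0 - min_retention) * raw


def decayed_score(raw_similarity: float, age_secs: float, access_count: int = 0) -> float:
    """score_final = sim_raw * R~(t)."""
    return raw_similarity * retention(age_secs, access_count)


week = 7 * 24 * 3600
untouched = retention(week)                   # about 0.43 with the defaults
rehearsed = retention(week, access_count=5)   # higher: rehearsal slows decay
```

With the defaults, a memory left untouched for exactly one base period retains 0.1 + 0.9 / e of its raw similarity, and the same memory with a few recorded retrievals retains more.
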

Enabling Decay

Decay is off by default. Enable it via config:

Python:

from mem7.config import MemoryConfig, DecayConfig

config = MemoryConfig(
    # ... llm, embedding, etc.
    decay=DecayConfig(enabled=True),
)

TypeScript:

const engine = await MemoryEngine.create(JSON.stringify({
  // ... llm, embedding, etc.
  decay: { enabled: true },
}));

Rust:

use mem7_config::{MemoryEngineConfig, DecayConfig};

let config = MemoryEngineConfig {
    decay: Some(DecayConfig { enabled: true, ..Default::default() }),
    ..Default::default()
};

Tuning Parameters

Parameter            Default   Description
base_half_life_secs  604800.0  Base stability in seconds (7 days) before any rehearsal bonus
decay_shape          0.8       Stretched-exponential shape (0 < gamma <= 1); lower values decay faster at first but leave a longer tail
min_retention        0.1       Floor so no memory fully vanishes
rehearsal_factor     0.5       How much each retrieval increases stability

Backward Compatibility

  • Old memories without last_accessed_at or access_count gracefully degrade: age falls back to updated_at then created_at, and access count defaults to 0.
  • No migration needed — new fields are written on the next add() or update() call.
  • When decay is disabled (the default), scoring behavior is identical to previous versions.
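
The fallback order described above can be sketched as a small helper. This is only an illustration: `decay_inputs` is a hypothetical name, timestamps are assumed to be Unix seconds, and treating a memory with no timestamps at all as age 0 is an assumption rather than documented behavior.

```python
def decay_inputs(metadata: dict, now: float) -> tuple[float, int]:
    """Return (age_secs, access_count) for a memory's metadata."""
    reference = None
    # Prefer the newest field; fall back for memories written before decay existed.
    for key in ("last_accessed_at", "updated_at", "created_at"):
        if metadata.get(key) is not None:
            reference = metadata[key]
            break
    # Assumption: with no timestamp at all, treat the memory as fresh.
    age_secs = max(now - reference, 0.0) if reference is not None else 0.0
    access_count = int(metadata.get("access_count") or 0)
    return age_secs, access_count


legacy = decay_inputs({"created_at": 100.0, "updated_at": 400.0}, now=1000.0)
current = decay_inputs({"last_accessed_at": 900.0, "access_count": 3}, now=1000.0)
```
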

Context-Aware Scoring (Session-Aware Recall)

Pure embedding similarity can conflate semantic closeness with contextual relevance — for example, a design preference like "always investigate root cause first" may score high when searching "fix Chrome CDP bug" because both relate to debugging. With context-aware scoring, mem7 automatically classifies queries and memories to boost what's relevant and demote what isn't.

How It Works

  1. Write path: each extracted fact is tagged with a memory_type (factual, preference, procedural, episodic) during LLM fact extraction.
  2. Read path: each search query is classified into a task_type (troubleshooting, design, factual_lookup, planning, general) via a lightweight LLM call that runs in parallel with embedding, so classification is not an extra sequential step.
  3. A context coefficient is looked up from a (memory_type, task_type) weight matrix and multiplied into the score:

$$\text{score}_{\text{final}} = \text{similarity} \times \text{decay} \times \text{context coeff}$$

Default Weight Matrix

memory_type \ task_type  troubleshooting  design  factual_lookup  planning  general
factual                  1.0              0.5     1.0             0.7       1.0
preference               0.3              1.0     0.3             0.8       0.8
procedural               0.8              0.5     0.5             1.0       0.7
episodic                 0.5              0.5     0.5             0.5       0.7
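
As a worked example of the lookup, the sketch below encodes the default matrix as a nested dictionary and reads one coefficient from it. The `context_coeff` helper is hypothetical, and returning a neutral 1.0 for an unrecognized task type is an assumption; falling back to "factual" for untyped memories follows the backward-compatibility behavior described in this section.

```python
DEFAULT_WEIGHTS = {
    "factual":    {"troubleshooting": 1.0, "design": 0.5, "factual_lookup": 1.0, "planning": 0.7, "general": 1.0},
    "preference": {"troubleshooting": 0.3, "design": 1.0, "factual_lookup": 0.3, "planning": 0.8, "general": 0.8},
    "procedural": {"troubleshooting": 0.8, "design": 0.5, "factual_lookup": 0.5, "planning": 1.0, "general": 0.7},
    "episodic":   {"troubleshooting": 0.5, "design": 0.5, "factual_lookup": 0.5, "planning": 0.5, "general": 0.7},
}


def context_coeff(memory_type, task_type, weights=DEFAULT_WEIGHTS):
    """Look up the (memory_type, task_type) coefficient."""
    # Memories without a memory_type are treated as "factual".
    row = weights.get(memory_type or "factual", weights["factual"])
    # Assumption: an unknown task type leaves the score unchanged.
    return row.get(task_type, 1.0)


# A design preference with high raw similarity, queried while troubleshooting:
demoted = 0.82 * context_coeff("preference", "troubleshooting")
# A factual memory with lower raw similarity, same query:
kept = 0.70 * context_coeff("factual", "troubleshooting")
```

With these illustrative similarity values, the preference ends up scoring below the factual memory even though its raw similarity was higher.
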

Enabling Context-Aware Scoring

Context scoring is off by default. Enable it via config:

Python:

from mem7.config import MemoryConfig, ContextConfig

config = MemoryConfig(
    # ... llm, embedding, etc.
    context=ContextConfig(enabled=True),
)

TypeScript:

const engine = await MemoryEngine.create(JSON.stringify({
  // ... llm, embedding, etc.
  context: { enabled: true },
}));

Rust:

use mem7_config::{MemoryEngineConfig, ContextConfig};

let config = MemoryEngineConfig {
    context: Some(ContextConfig { enabled: true, ..Default::default() }),
    ..Default::default()
};

You can also provide custom weights to override the defaults:

ContextConfig(
    enabled=True,
    weights={
        "preference": {"troubleshooting": 0.1, "design": 1.0},
    },
)
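
The source does not spell out how a partial override combines with the defaults. One plausible reading, shown below purely as an assumption, is a cell-by-cell merge in which unspecified cells keep their default values. `merge_weights` is a hypothetical helper, and `DEFAULTS_EXCERPT` is a shortened copy of the default matrix kept small for the example.

```python
import copy

# Excerpt of the default matrix (two rows, three columns) for illustration.
DEFAULTS_EXCERPT = {
    "factual":    {"troubleshooting": 1.0, "design": 0.5, "planning": 0.7},
    "preference": {"troubleshooting": 0.3, "design": 1.0, "planning": 0.8},
}


def merge_weights(defaults: dict, overrides: dict) -> dict:
    """Return a new matrix where override cells replace default cells."""
    merged = copy.deepcopy(defaults)  # never mutate the shared defaults
    for memory_type, row in overrides.items():
        merged.setdefault(memory_type, {}).update(row)
    return merged


merged = merge_weights(
    DEFAULTS_EXCERPT,
    {"preference": {"troubleshooting": 0.1, "design": 1.0}},
)
```
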

Overriding Task Type

If the caller already knows the task context, it can pass task_type directly to skip the LLM classification call:

results = m.search("fix Chrome CDP timeout", user_id="alice", task_type="troubleshooting")

Backward Compatibility

  • Context scoring defaults to disabled — zero impact on existing users.
  • Old memories without memory_type are treated as "factual" (safe default).
  • When context is disabled, the scoring pipeline is identical to previous versions.

OpenClaw Plugin

mem7 ships an official OpenClaw memory plugin that replaces the built-in memory backend with LLM-powered fact extraction, graph relations, dedup, and the forgetting curve — all driven by mem7's Rust core.

Install

openclaw plugins install @mem7ai/openclaw-mem7

Activate

In ~/.openclaw/openclaw.json:

{
  "plugins": {
    "slots": { "memory": "openclaw-mem7" },
    "entries": {
      "openclaw-mem7": {
        "enabled": true,
        "config": {
          "llm": { "base_url": "http://localhost:11434/v1", "api_key": "ollama", "model": "qwen2.5:7b" },
          "embedding": { "base_url": "http://localhost:11434/v1", "api_key": "ollama", "model": "mxbai-embed-large", "dims": 1024 },
          "graph": { "provider": "flat" },
          "decay": { "enabled": true }
        }
      }
    }
  }
}

What it does

  • Auto-recall (before_prompt_build / before_agent_start): before each agent turn, the plugin searches both session and long-term scopes, merges the results, and injects them into the system prompt.
  • Auto-capture (agent_end): after each turn, the user + assistant messages are sent through mem7's fact extraction pipeline, automatically storing new facts and deduplicating against existing ones.
  • Tools: the plugin registers memory_search, memory_get, memory_list, memory_store, and memory_forget for explicit memory operations.
  • Scope model: tools support session, long-term, and merged ("all") reads, with sessionKey automatically mapped onto runId and optional agentId.
  • Forgetting curve: the plugin enables decay by default (the core library leaves it off), so stale facts naturally fade while frequently recalled memories stay strong.

See packages/openclaw-mem7/ for full documentation.

Observability (OpenTelemetry)

mem7 integrates with OpenTelemetry via tracing-opentelemetry. When enabled, every add(), search(), get(), update(), delete() call emits a trace span that is exported via OTLP/gRPC to any compatible collector (Jaeger, Grafana Tempo, Datadog, etc.).

Python:

from mem7 import Memory, init_telemetry, shutdown_telemetry

init_telemetry(otlp_endpoint="http://localhost:4317", service_name="my-app")

m = Memory(config=config)
m.add("I love playing tennis.", user_id="alice")
# spans are exported automatically

shutdown_telemetry()  # flush before exit

TypeScript:

import { MemoryEngine, initTelemetry, shutdownTelemetry } from "@mem7ai/mem7";

initTelemetry(JSON.stringify({ otlp_endpoint: "http://localhost:4317", service_name: "my-app" }));

const engine = await MemoryEngine.create(configJson);
await engine.add([{ role: "user", content: "I love tennis." }], "alice");

shutdownTelemetry();

Rust (requires otel feature):

// Cargo.toml: mem7 = { version = "0.3.3", features = ["otel"] }
use mem7::{TelemetryConfig, telemetry};

telemetry::init(&TelemetryConfig::default())?;
// ... use MemoryEngine as usual ...
telemetry::shutdown();

Examples

See the examples/ directory.

Development

Prerequisites

  • Rust 1.85+ (stable)
  • Python 3.10+
  • Node.js 22+
  • just
  • maturin

Build

python -m venv .venv && source .venv/bin/activate
pip install maturin pydantic

# Development build (debug, fast iteration)
just dev

# Release build
just build

# OpenClaw plugin build
just openclaw-build

Test

# Full validation suite
just check

# Common individual tasks
just fmt
just fmt-check
just clippy
just lint
just typecheck
just test

License

Apache-2.0

Dependencies

~41MB
~678K SLoC