Bitemporal memory × empirical tuning: the first self-improvement ledger for AI agents. Your agent's accountable past, in plain Markdown.
LongMemEval 98.0%: #1 among open-source agent long-term memory systems (surpasses MemPalace at 96.6% and MEMENTO by Microsoft at 90.8%; LLM-as-judge · oracle 50 · 3-run semantic majority)
v2.7.0 · Zero-dependency compound knowledge system for AI agents. Auto-extract, classify, search, tune, and time-travel, all in plain Markdown. Debugging is memory. Time travel is memory. Multi-agent handoffs are memory. Facts have bitemporal validity. Memories decay reversibly. Wiki links build graphs. Tuning iterations leave an audit trail.
Plain Markdown source of truth · zero deps · zero keys · zero LLM calls inside MemKraft. In 30 seconds:
pipx install memkraft && memkraft init && memkraft agents-hint claude-code
- Quickstart (30s)
- Why MemKraft?
- Features
- Real-world Recipes
- Specialized Features
- API Reference
- CLI Reference
- Architecture
- Comparison
- Reproducing LongMemEval
- Staying Up To Date
- Changelog
- Contributing
- License
- Appendix: Inspirations & Credits
## Quickstart (30s)

pip install memkraft
memkraft init # → creates ./memory/ with RESOLVER, TEMPLATES, entities/, ...
memkraft agents-hint claude-code >> AGENTS.md # your agent is now memory-aware

memkraft init --template claude-code # CLAUDE.md + memory/ + examples
memkraft init --template cursor # .cursorrules + memory/
memkraft init --template mcp # claude_desktop_config snippet + memory/
memkraft init --template rag # retrieval-focused structure
memkraft init --template minimal # just memory/entities/
memkraft templates list # see all presets

Templates are idempotent: re-running init --template X never overwrites your edits.
Or in Python:
from memkraft import MemKraft
mk = MemKraft("./memory"); mk.init()
mk.track("Simon Kim", entity_type="person", source="news")
mk.update("Simon Kim", info="Launched MemKraft 0.8.1", source="PyPI")
mk.search("MemKraft")

That's it. Your agent now has persistent memory as plain Markdown files.
No API keys. No database. No config. Just .md files you own.
pip install 'memkraft[mcp]' # + MCP server (`python -m memkraft.mcp`)
pip install 'memkraft[watch]' # + auto-reindex on save (`memkraft watch`)
pip install 'memkraft[all]' # everything

memkraft agents-hint <target> prints copy-paste-ready integration snippets:

memkraft agents-hint claude-code # → CLAUDE.md / AGENTS.md block
memkraft agents-hint openclaw # → AGENTS.md block for OpenClaw
memkraft agents-hint cursor # → .cursorrules block
memkraft agents-hint openai # → Custom GPT / function-calling schema
memkraft agents-hint mcp # → claude_desktop_config.json snippet
memkraft agents-hint langchain # → LangChain StructuredTool wrappers

Paste the output and you're done. Or pipe it straight into your config:

memkraft agents-hint claude-code >> AGENTS.md

See examples/ for runnable variants.

| | MemKraft | Mem0 | Letta |
|---|---|---|---|
| Dependencies | 0 | many | many |
| API key required | No | Yes | Yes |
| Source of truth | Plain .md | Cloud/DB | DB |
| Local-first | ✅ | ❌ | ❌ |
| Git-friendly | ✅ | ❌ | ❌ |

| API | Since | Role |
|---|---|---|
| `track` | 0.5 | Start tracking an entity |
| `update` | 0.5 | Append information to an entity |
| `search` | 0.5 | Hybrid search (exact + IDF + fuzzy + BM25) |
| `tier_set` | 0.8 | Set tier: core / recall / archival |
| `fact_add` | 0.8 | Record a bitemporal fact (`fact_type`: episodic / semantic / procedural since 2.6) |
| `log_event` | 0.8 | Log a timestamped event |
| `decision_record` | 0.9 | Capture a decision with rationale |
| `evidence_first` | 0.9 | Retrieve evidence before acting |
| `prompt_register` | 1.0 | Register a prompt/skill as an entity |
| `prompt_eval` | 1.0 | Record one tuning iteration |
| `prompt_evidence` | 1.0 | Cite past tuning results |
| `convergence_check` | 1.0 | Auto-judge convergence |
| `auto_tier` | 2.6 | Recommend core / recall / archival from (recency, frequency, importance); `dry_run=True` by default |
| `cache_stats` | 2.7 | Inspect search cache hit/miss/eviction counters and current generation |
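
"Bitemporal" means each fact carries two time axes: when it was true in the world (valid time) and when it was recorded. The sketch below illustrates that idea with a stdlib-only as-of query. The `Fact` class, the `as_of` helper, and the "Acme" data are hypothetical; this is not MemKraft's storage format or the `fact_add` implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """Hypothetical bitemporal fact: when it was true, and when it was recorded."""
    entity: str
    key: str
    value: str
    valid_from: date
    valid_to: Optional[date]  # None means "still valid"
    recorded_at: date


def as_of(facts, entity, key, valid_on, known_on):
    """What was true on `valid_on`, according to what had been recorded by `known_on`?"""
    candidates = [
        f for f in facts
        if f.entity == entity and f.key == key
        and f.recorded_at <= known_on  # record-time filter
        and f.valid_from <= valid_on  # valid-time filter
        and (f.valid_to is None or valid_on < f.valid_to)
    ]
    if not candidates:
        return None
    # If several recorded beliefs cover that moment, trust the latest recording.
    return max(candidates, key=lambda f: f.recorded_at).value


facts = [
    Fact("Acme", "hq", "Busan", date(2015, 1, 1), date(2019, 1, 1), date(2024, 1, 10)),
    Fact("Acme", "hq", "Seoul", date(2019, 1, 1), None, date(2024, 1, 10)),
]
print(as_of(facts, "Acme", "hq", valid_on=date(2016, 6, 1), known_on=date(2024, 2, 1)))  # Busan
```

Asking the same question with `known_on` before 2024-01-10 returns `None`, because neither fact had been recorded yet: that second axis is what lets you answer "what did I know, and when?"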
Also new in 2.6: silent contradiction detection on fact_add, plus 1-hop graph neighbor expansion for counting-style queries (how many, list all).
New in 2.7: in-process search result caching for search_v2() and search_smart(), backed by a thread-safe LRU + TTL cache (default capacity 256, TTL 300s). Mutations (update, track, fact_add, log_event, consolidate, decision_record, dream_cycle) auto-invalidate the cache via a generation counter, so callers never need to think about cache coherence. Opt out per call with cache=False. Measured speedups: 6.14x on a hot-path workload (152 → 931 qps) and 1.65x on a 50/50 mixed workload; raw numbers are in benchmarks/v2.7.0-bench-result.json. Zero breaking changes.
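
To make the generation-counter idea concrete, here is a minimal stdlib-only sketch of an LRU + TTL cache where one `invalidate()` call makes every cached entry stale at once. `GenerationalLRUCache` and its methods are hypothetical names; MemKraft's actual cache may be structured differently.

```python
import threading
import time
from collections import OrderedDict


class GenerationalLRUCache:
    """Hypothetical thread-safe LRU + TTL cache invalidated by a generation counter."""

    def __init__(self, capacity=256, ttl=300.0, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl
        self.clock = clock
        self.generation = 0
        self._data = OrderedDict()  # key -> (generation, stored_at, value)
        self._lock = threading.Lock()
        self.hits = self.misses = self.evictions = 0

    def invalidate(self):
        """Call after any mutation; older entries no longer match the generation."""
        with self._lock:
            self.generation += 1

    def get(self, key):
        """Return the cached value, or None on a miss (None values are not supported)."""
        with self._lock:
            entry = self._data.get(key)
            if entry is not None:
                gen, stored_at, value = entry
                if gen == self.generation and self.clock() - stored_at < self.ttl:
                    self._data.move_to_end(key)  # mark as most recently used
                    self.hits += 1
                    return value
                del self._data[key]  # stale: old generation or expired TTL
            self.misses += 1
            return None

    def put(self, key, value):
        with self._lock:
            self._data[key] = (self.generation, self.clock(), value)
            self._data.move_to_end(key)
            while len(self._data) > self.capacity:
                self._data.popitem(last=False)  # evict least recently used
                self.evictions += 1
```

Bumping a counter is O(1) regardless of cache size, which is why it suits write paths: stale entries are discarded lazily on the next lookup instead of being scanned eagerly.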
Self-improvement loop: register → tune → recall → decide, every step auditable and time-travelable. See MIGRATION.md for upgrading from 0.9.x (zero breaking changes).
Register a prompt/skill, record iterations, cite past evidence, and let MemKraft auto-judge when to stop tuning, all in plain Markdown with no LLM calls inside MemKraft:
from memkraft import MemKraft
mk = MemKraft("./memory")
# 1. register a prompt/skill as a first-class entity
mk.prompt_register(
"my-skill",
path="skills/my-skill/SKILL.md",
owner="zeon",
tags=["tuning"],
)
# 2. record each empirical iteration (the host agent dispatches the run;
#    MemKraft only persists the report)
mk.prompt_eval(
"my-skill",
iteration=1,
scenarios=[{
"name": "parallel-dispatch",
"description": "3 subagents at once",
"requirements": [{"item": "all return", "critical": True}],
}],
results=[{
"scenario": "parallel-dispatch",
"success": True, "accuracy": 85,
"tool_uses": 5, "duration_ms": 2000,
"unclear_points": ["schema missing"],
"discretion": [],
}],
)
# 3. cite past iterations before the next run
mk.prompt_evidence("my-skill", "accuracy regression")
# 4. stop when the last N iterations stabilise
verdict = mk.convergence_check("my-skill", window=2)
# -> {"converged": False, "reason": "insufficient-iters",
# "iterations_checked": [1],
# "suggested_next": "patch-and-iterate", ...}

Each call leaves an auditable trail on disk: a decision record per iteration, an incident when unclear points pile up, and wiki-links between iterations. Upgrading from 0.9.x is zero-breaking; see MIGRATION.md.
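
The exact stopping rule behind convergence_check is not spelled out here. As a rough illustration only, this sketch implements one plausible window-based rule: stop once the last `window` iterations all succeeded and their accuracy varies by at most `tolerance` points. The function name, the tolerance parameter, and every reason string except "insufficient-iters" are hypothetical.

```python
def check_convergence(iterations, window=2, tolerance=2.0):
    """Hypothetical rule: converged when the last `window` runs all succeed
    and their accuracy spread is at most `tolerance` points."""
    checked = [it["iteration"] for it in iterations[-window:]]
    if len(iterations) < window:
        return {"converged": False, "reason": "insufficient-iters", "iterations_checked": checked}
    recent = iterations[-window:]
    if not all(it["success"] for it in recent):
        return {"converged": False, "reason": "recent-failure", "iterations_checked": checked}
    accuracies = [it["accuracy"] for it in recent]
    if max(accuracies) - min(accuracies) > tolerance:
        return {"converged": False, "reason": "accuracy-unstable", "iterations_checked": checked}
    return {"converged": True, "reason": "stable", "iterations_checked": checked}


history = [{"iteration": 1, "success": True, "accuracy": 85}]
print(check_convergence(history)["reason"])  # insufficient-iters
```

A window of 2 mirrors the `window=2` call above: a single iteration can never converge on its own, which matches the sample verdict.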
## Features
| Feature | Description |
|---|---|
| Auto-extract | Pipe any text in, get entities + facts out. Regex-based NER for EN, KR, CN, JP - no LLM calls. |
| CJK detection | 806 stopwords, 100 Chinese surnames, 85 Japanese surnames, Korean particle stripping. |
| Cognify pipeline | Routes inbox/ items to the right directory. Recommend-only by default; pass --apply to move files. |
| Fact registry | Extracts currencies, percentages, dates, quantities into a cross-domain index. |
| Originals capture | Save raw text verbatim - no paraphrasing. |
| Confidence levels | Tag facts as verified / experimental / hypothesis. Dream Cycle warns about untagged facts. |
| Applicability conditions | --when "condition" --when-not "condition" - facts get When: / When NOT: metadata. |
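
The fact registry works without an LLM, so pattern matching is the natural tool. The sketch below shows the general idea for three of the listed fact kinds (percentages, currency amounts, ISO dates). The patterns and the `extract_facts` name are hypothetical; MemKraft's real extractor may use different rules and also covers quantities.

```python
import re

# Hypothetical patterns for illustration only.
PATTERNS = {
    "percentage": re.compile(r"[-+]?\d+(?:\.\d+)?%"),
    "currency": re.compile(r"\$\d[\d,]*(?:\.\d+)?(?:\s?(?:[KMB]|million|billion))?"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def extract_facts(text):
    """Return (kind, matched_text) pairs in the order they appear in `text`."""
    found = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), kind, m.group()))
    return [(kind, value) for _, kind, value in sorted(found)]


print(extract_facts("Revenue grew 85% YoY to $1.2 million on 2026-04-15."))
```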
| Feature | Description |
|---|---|
| Fuzzy search | difflib.SequenceMatcher-based. Works offline, zero setup. |
| Brain-first lookup | Searches entities → notes → decisions → meetings. Stops after sufficient high-relevance results. |
| Agentic search | Multi-hop: decompose query → search → traverse [[wiki-links]] → re-rank by tier/recency/confidence/applicability. |
| Goal-weighted re-ranking | Conway SMS: same query with different --context produces different rankings. |
| Feedback loop | --file-back: search results auto-filed back to entity timelines (compound interest for memory). |
| Progressive disclosure | 3-level query: L1 index (~50 tokens) → L2 section headers → L3 full file. |
| Backlinks | [[entity-name]] cross-references. See every page that references an entity. |
| Link suggestions | Auto-suggest missing [[wiki-links]] based on known entity names. |
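
Fuzzy search is described as difflib.SequenceMatcher-based. Here is a minimal sketch of that approach: compare the query against every run of words of the same length in each line and keep lines whose best ratio clears a threshold. The windowing strategy, the `fuzzy_search` name, and the 0.6 threshold are assumptions for illustration, not MemKraft's actual scoring.

```python
from difflib import SequenceMatcher


def fuzzy_search(query, lines, threshold=0.6):
    """Return (score, line_number, line) for lines that fuzzily contain `query`."""
    q = query.lower()
    n = max(1, len(q.split()))
    results = []
    for lineno, line in enumerate(lines, start=1):
        words = line.lower().split()
        best = 0.0
        # Compare the query against every run of n consecutive words.
        for i in range(max(1, len(words) - n + 1)):
            window = " ".join(words[i:i + n])
            best = max(best, SequenceMatcher(None, q, window).ratio())
        if best >= threshold:
            results.append((round(best, 2), lineno, line))
    return sorted(results, key=lambda r: (-r[0], r[1]))


notes = ["Hashed is a venture capitol firm in Seoul.", "Lunch notes"]
print(fuzzy_search("venture capital", notes))
```

The typo "capitol" still matches "capital" because SequenceMatcher scores shared character blocks, which is what makes this style of search work offline with zero setup.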
| Feature | Description |
|---|---|
| Compiled Truth + Timeline | Dual-layer entity model: mutable current state + append-only audit trail with [Source:] tags. |
| Memory tiers | Core / Recall / Archival: explicit context-window priority. Use promote to reclassify. |
| Memory type classification | 8 types: identity, belief, preference, relationship, skill, episodic, routine, transient. |
| Type-aware decay | Identity memories decay 10x slower than routine memories. Differential decay multipliers. |
| RESOLVER.md | MECE classification tree - every piece of knowledge has exactly one destination. |
| Source attribution | Every fact tagged with [Source: who, when, how]. Enforced by Dream Cycle. |
| Dialectic synthesis | Auto-detect contradictory facts during extract, tag [CONFLICT], generate CONFLICTS.md. |
| Conflict resolution | `resolve-conflicts --strategy newest\|confidence\|keep-both\|prompt` |
| Live Notes | Persistent tracking for people and companies. Auto-incrementing updates + timeline. |
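
Type-aware decay can be pictured as exponential decay with a per-type half-life. In the sketch below only the 10x identity-vs-routine ratio comes from this README; the specific half-lives, the 0.25 staleness threshold, and the function names are hypothetical. Staleness is only a flag, so nothing is deleted and the decay stays reversible.

```python
# Hypothetical half-lives in days; only the 10x identity/routine ratio is from the docs.
HALF_LIFE_DAYS = {
    "transient": 7,
    "routine": 30,
    "preference": 90,
    "identity": 300,  # 10x slower than routine
}


def decay_weight(days_since_access, memory_type):
    """Weight in (0, 1] that halves every half-life; unknown types decay like 'routine'."""
    half_life = HALF_LIFE_DAYS.get(memory_type, HALF_LIFE_DAYS["routine"])
    return 0.5 ** (days_since_access / half_life)


def is_stale(days_since_access, memory_type, threshold=0.25):
    """Flag (never delete) a memory whose weight has fallen below `threshold`."""
    return decay_weight(days_since_access, memory_type) < threshold


for kind in ("routine", "identity"):
    print(kind, round(decay_weight(90, kind), 3), is_stale(90, kind))
```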
| Feature | Description |
|---|---|
| Dream Cycle | Nightly auto-maintenance: missing sources, thin pages, duplicates, inbox age, bloated pages, daily notes. |
| Debug Hypothesis Tracking | OBSERVE → HYPOTHESIZE → EXPERIMENT → CONCLUDE flow. Track hypotheses, evidence, and rejections. Warns you to switch approach after 2 failed hypotheses. Search past sessions to avoid repeating failed approaches. |
| Health Check | 5 self-diagnostic assertions: source attribution, orphan facts, duplicates, inbox freshness, unresolved conflicts. Pass rate % + health score (A/B/C/D). |
| Memory decay | Older, unaccessed memories naturally decay - type-aware differential curves. |
| Fact dedup | Detects and merges duplicate facts across entities. |
| Auto-summarize | Condenses bloated pages while preserving key information. |
| Diff tracking | See exactly what changed since the last Dream Cycle. |
| Open loop tracking | Finds all pending / TODO / FIXME items across memory. |
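
The Health Check reduces 5 pass/fail assertions to a pass rate and a letter grade. The grade cut-offs are not documented here, so the thresholds in this sketch are hypothetical, chosen only so that 80% maps to "A" as in the sample output `{"pass_rate": 80.0, "health_score": "A", ...}` shown further down. The assertion names are illustrative too.

```python
def health_report(checks):
    """`checks` maps assertion name -> passed. Grade thresholds are hypothetical."""
    passed = sum(1 for ok in checks.values() if ok)
    pass_rate = round(100.0 * passed / len(checks), 1) if checks else 0.0
    if pass_rate >= 80:
        grade = "A"
    elif pass_rate >= 60:
        grade = "B"
    elif pass_rate >= 40:
        grade = "C"
    else:
        grade = "D"
    failed = [name for name, ok in checks.items() if not ok]
    return {"pass_rate": pass_rate, "health_score": grade, "failed": failed}


print(health_report({
    "source_attribution": True,
    "no_orphan_facts": True,
    "no_duplicates": True,
    "inbox_fresh": True,
    "no_unresolved_conflicts": False,
}))
```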
| Feature | Description |
|---|---|
| Session logging | JSONL event trail with tags, importance, entity, task, and decision fields. |
| Daily retrospective | Auto-generated Well / Bad / Next from session events + file changes. |
| Decision distillation | Scans events and notes for decision candidates. EN + KR keyword matching. |
| Meeting briefs | One command compiles entity info, timeline, open threads, and a pre-meeting checklist. |
| Feature | Description |
|---|---|
| Debug Hypothesis Tracking | OBSERVE → HYPOTHESIZE → EXPERIMENT → CONCLUDE loop with persistent failure memory. |
memkraft init
memkraft extract "Simon Kim is the CEO of Hashed in Seoul." --source "news"
memkraft brief "Simon Kim"
memkraft doctor # 🟢/🟡/🔴 health check with fix hints
memkraft doctor --fix --yes # auto-repair missing structure (create-only, never deletes)
memkraft stats --export json # workspace stats for CI dashboards
memkraft mcp doctor # validate MCP server readiness
memkraft mcp test # remember → search → recall smoke test

MCP (Claude Desktop / Cursor) setup: see docs/mcp-setup.md.
from memkraft import MemKraft
mk = MemKraft("/path/to/memory")
mk.init() # returns {"created": [...], "exists": [...], "base_dir": "..."}
# Extract entities & facts from text
mk.extract_conversations("Simon Kim is the CEO of Hashed.", source="news")
# Track an entity
mk.track("Simon Kim", entity_type="person", source="news")
mk.update("Simon Kim", info="Launched MemKraft", source="X/@simonkim_nft")
# Search with fuzzy matching
results = mk.search("venture capital", fuzzy=True)
# Agentic multi-hop search with context-aware re-ranking
results = mk.agentic_search(
"who is the CEO of Hashed",
context="crypto investment research", # Conway SMS: same query, different context → different ranking
file_back=True, # feedback loop: results auto-filed back to entity timelines
)
# Run health check (5 self-diagnostic assertions)
report = mk.health_check()
# → {"pass_rate": 80.0, "health_score": "A", ...}
# Dream Cycle - nightly maintenance
mk.dream(dry_run=True)

More CLI examples: seven daily patterns that cover 90% of use cases.
# 1. Extract & Track - auto-detect entities from any text
memkraft extract "Simon Kim is the CEO of Hashed in Seoul." --source "news"
memkraft extract "Revenue grew 85% YoY" --confidence verified --when "bull market"
memkraft track "Simon Kim" --type person --source "X/@simonkim_nft"
memkraft update "Simon Kim" --info "Launched MemKraft" --source "X/@simonkim_nft"
# 2. Search & Recall - find anything in your memory
memkraft search "venture capital" --fuzzy
memkraft search "Seoul VC" --file-back # feedback loop: auto-file to timelines
memkraft lookup "Simon" --brain-first
memkraft agentic-search "who is the CEO of Hashed" --context "meeting prep"
# 3. Meeting Prep - compile all context before a meeting
memkraft brief "Simon Kim"
memkraft brief "Simon Kim" --file-back # record brief generation in timeline
memkraft links "Simon Kim"
# 4. Ingest & Classify - inbox β structured pages (safe by default)
memkraft cognify # recommend-only; add --apply to move files
memkraft detect "Jack Ma and 马化腾 discussed AI" --dry-run
# 5. Log & Reflect - structured audit trail
memkraft log --event "Deployed v0.3" --tags deploy --importance high
memkraft retro # daily Well / Bad / Next retrospective
# 6. Maintain & Heal - Dream Cycle keeps memory healthy
memkraft health-check # 5 assertions β pass rate + health score (A/B/C/D)
memkraft dream --dry-run # nightly: sources, duplicates, bloated pages
memkraft resolve-conflicts --strategy confidence # resolve contradictory facts
memkraft diff # what changed since last maintenance?
memkraft open-loops # find all unresolved items
# 7. Debug Hypothesis Tracking - "Debugging is Memory"
memkraft debug start "API returns 500 on POST /users"
memkraft debug hypothesis "Database connection timeout"
memkraft debug evidence "DB pool healthy" --result contradicts
memkraft debug reject --reason "DB is fine"
memkraft debug hypothesis "Request validation missing"
memkraft debug evidence "Empty POST triggers 500" --result supports
memkraft debug confirm
memkraft debug end "Added request body validation"
memkraft debug search-rejected "timeout" # avoid past mistakes

## Specialized Features
This section gathers features that deserve a closer look: time-travel snapshots, multi-agent context plumbing, autonomous memory lifecycle, and scientific debugging.
| Feature | Description |
|---|---|
| Snapshot | Create a point-in-time manifest of all memory files (hash, size, summary, sections, fact count, link count). Optionally embed full content. |
| Snapshot List | List all saved snapshots, newest first, with labels and metadata. |
| Snapshot Diff | Compare two snapshots (or snapshot vs live state). Shows added, removed, modified, unchanged files with byte deltas. |
| Time Travel | Search memory as it was at a past snapshot. Answer "what did I know about X on March 1st?" |
| Entity Timeline | Track how a specific entity evolved across all snapshots: new, modified, unchanged, or deleted. |
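
A snapshot manifest of hashes and sizes is enough to compute a diff without storing full content. This minimal sketch shows that idea: hash every `.md` file, then classify paths as added, removed, or modified, with byte deltas for modified files. The function names and manifest layout are hypothetical and do not describe MemKraft's on-disk snapshot format.

```python
import hashlib
from pathlib import Path


def snapshot_manifest(root):
    """Map relative path -> (sha256, size_in_bytes) for every .md file under `root`."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*.md")):
        data = path.read_bytes()
        manifest[path.relative_to(root).as_posix()] = (hashlib.sha256(data).hexdigest(), len(data))
    return manifest


def manifest_diff(old, new):
    """Classify files as added / removed / modified (with byte deltas)."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = {
        name: new[name][1] - old[name][1]
        for name in sorted(old.keys() & new.keys())
        if old[name][0] != new[name][0]  # content hash changed
    }
    unchanged = len(old.keys() & new.keys()) - len(modified)
    return {"added": added, "removed": removed, "modified": modified, "unchanged_count": unchanged}
```

Comparing hashes rather than content means a diff costs one pass over the manifests, and "snapshot vs live state" is just `manifest_diff(saved, snapshot_manifest(root))`.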
from memkraft import MemKraft
mk = MemKraft("/path/to/memory")
# Take a snapshot before a big operation
snap = mk.snapshot(label="before-migration", include_content=True)
# ... time passes, memory changes ...
# What changed?
diff = mk.snapshot_diff(snap["snapshot_id"]) # vs live state
# → {added: [...], removed: [...], modified: [...], unchanged_count: 42}
# Search memory as it was at that snapshot
results = mk.time_travel("venture capital", snapshot_id=snap["snapshot_id"])
# How did an entity evolve over time?
timeline = mk.snapshot_entity("Simon Kim")
# → [{snapshot_id, timestamp, fact_count, size, change_type: "new"}, ...]

| Feature | Description |
|---|---|
| Channel Context Memory | Per-channel context persistence. Save/load/update context keyed by channel ID (e.g. telegram-46291309). Stored in .memkraft/channels/{channel_id}.json. |
| Task Continuity Register | Task lifecycle tracking with full history. task_start → task_update → task_complete + task_history + task_list. Each update stores timestamp + status + note. Stored in .memkraft/tasks/{task_id}.json. |
| Agent Working Memory | Per-agent persistent context. Save and load any working-memory dict with agent_save / agent_load. Stored in .memkraft/agents/{agent_id}.json. |
| `agent_inject()` | The key feature. Merges agent working memory + channel context + task history into a single ready-to-inject prompt block. Use it to give sub-agents full situational awareness. |
from memkraft import MemKraft
mk = MemKraft("/path/to/memory")
# Save channel context
mk.channel_save("telegram-46291309", {
"summary": "DM with Simon",
"recent_tasks": ["vibekai deploy", "memkraft v0.5.4"],
"preferences": {"language": "ko"},
})
# Register a task
mk.task_start("deploy-001", "Deploy vibekai to production",
channel_id="telegram-46291309", agent="zeon")
mk.task_update("deploy-001", "active", "vercel build passed")
# Save agent working memory
mk.agent_save("zeon", {
"key_context": "Simon's AI assistant",
"active_tasks": ["deploy-001"],
"learned": ["always report completion", "no silence"],
})
# Inject merged context block into a sub-agent instruction
block = mk.agent_inject("zeon",
channel_id="telegram-46291309",
task_id="deploy-001")
print(block)
# ## Agent Working Memory
# - **key_context:** Simon's AI assistant
# - **active_tasks:** deploy-001
# ...
# ## Channel Context
# - **summary:** DM with Simon
# ...
# ## Task Context
# - **Task:** Deploy vibekai to production
# - **Status:** active
# - **History:**
# - [2026-04-15T...] active: vercel build passed

"Memory should manage itself."
Memory tends to grow without limit: agents add entries but rarely clean up. MemKraft 1.1.0 solves this with a self-managing lifecycle.
- Add-only pattern: agents append to MEMORY.md every session, never prune
- Silent maintenance failures: nightly cleanup crons fail without notice
- No lifecycle: every memory entry treated equally, forever
from memkraft import MemKraft
mk = MemKraft(base_dir="memory/")
# 1. Import existing MEMORY.md → structured MemKraft data
mk.flush("MEMORY.md")
# 2. Auto-archive old/low-priority items
result = mk.compact(max_chars=15000)
# → {"moved": 47, "freed_chars": 89400, ...}
# 3. Re-render MEMORY.md, always ≤ 15KB
mk.digest("MEMORY.md")
# → {"chars": 11700, "truncated": False}
# 4. Check memory health
health = mk.health()
# → {"status": "healthy", "total_chars": 11700, "recommendations": [...]}

Our MEMORY.md grew to 153KB (1,862 lines) over weeks of agent sessions.
After flush → compact → digest: 11.7KB (170 lines), a 92% reduction.
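
One simple way to enforce a character budget like `max_chars=15000` is greedy archival: sort entries from least to most valuable and move them out until the remainder fits. The sketch below assumes a hypothetical `Entry` with a numeric priority and a last-used timestamp; MemKraft's real compaction criteria ("old/low-priority items") may weigh things differently.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    text: str
    priority: int     # hypothetical scale: higher = more important
    last_used: float  # e.g. a Unix timestamp


def compact(entries, max_chars):
    """Archive the lowest-priority, least recently used entries until the rest fit."""
    keep = list(entries)
    archived = []
    total = sum(len(e.text) for e in keep)
    # Archival order: lowest priority first, then oldest first.
    for entry in sorted(entries, key=lambda e: (e.priority, e.last_used)):
        if total <= max_chars:
            break
        keep.remove(entry)
        archived.append(entry)
        total -= len(entry.text)
    return keep, archived, {"moved": len(archived), "chars": total}
```

Archived entries are returned rather than discarded, so a caller can write them to an archival tier instead of losing them.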
# Watch for real-time sync
mk.watch("memory/", on_change="flush", interval=300)
# Or set a nightly schedule (requires: pip install 'memkraft[schedule]')
mk.schedule([
lambda: mk.compact(max_chars=15000),
lambda: mk.digest("MEMORY.md"),
], cron_expr="0 23 * * *")

Debugging insights are too valuable to lose in scrollback. MemKraft treats the entire debug process as first-class memory.
The debug-hypothesis loop - inspired by Shen Huang's scientific debugging method:
OBSERVE → HYPOTHESIZE → EXPERIMENT → CONCLUDE
              ↑              |
              |  rejected?   |
              +─── next hypothesis ───+
                             |
          all rejected? → back to OBSERVE
- mk.start_debug("bug description") - begin a tracked session
- mk.log_hypothesis(bug_id, "theory", "evidence") - record each theory
- mk.log_evidence(bug_id, hyp_id, "test result", "supports|contradicts") - track proof
- mk.reject_hypothesis(bug_id, hyp_id, "reason") - mark failed approaches
- mk.confirm_hypothesis(bug_id, hyp_id) - lock in the root cause
- mk.end_debug(bug_id, "resolution") - close the session and feed it back to memory
Why it matters: rejected hypotheses are permanent memory. Next time you hit a similar bug, MemKraft surfaces what you already tried - no more repeating the same failed approaches.
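
The loop above is essentially a small state machine plus a searchable log of rejections. This stdlib-only sketch models it to show why keeping rejected hypotheses pays off; `DebugSession`, `Hypothesis`, and `search_rejected` are hypothetical names and do not reflect how MemKraft stores sessions on disk.

```python
from dataclasses import dataclass, field

VALID_RESULTS = ("supports", "contradicts", "neutral")


@dataclass
class Hypothesis:
    hid: str
    text: str
    status: str = "testing"  # testing -> rejected | confirmed
    evidence: list = field(default_factory=list)
    reason: str = ""


@dataclass
class DebugSession:
    bug: str
    hypotheses: list = field(default_factory=list)

    def hypothesize(self, text):
        h = Hypothesis(f"H{len(self.hypotheses) + 1}", text)  # H1, H2, ...
        self.hypotheses.append(h)
        return h

    def add_evidence(self, h, note, result="neutral"):
        if result not in VALID_RESULTS:
            raise ValueError(f"result must be one of {VALID_RESULTS}")
        h.evidence.append((note, result))

    def reject(self, h, reason):
        h.status, h.reason = "rejected", reason  # kept, never deleted

    def confirm(self, h):
        h.status = "confirmed"

    def back_to_observe(self):
        """True when every hypothesis so far has been rejected."""
        return bool(self.hypotheses) and all(h.status == "rejected" for h in self.hypotheses)


def search_rejected(sessions, query):
    """Anti-pattern lookup: rejected hypotheses whose text or reason mentions `query`."""
    q = query.lower()
    return [
        (s.bug, h.text, h.reason)
        for s in sessions
        for h in s.hypotheses
        if h.status == "rejected" and (q in h.text.lower() or q in h.reason.lower())
    ]
```

Replaying the CLI session from earlier: reject "Database connection timeout", confirm "Request validation missing", and a later `search_rejected(sessions, "timeout")` surfaces the dead end before you try it again.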
## API Reference
MemKraft(base_dir) initializes the memory system. If base_dir is not provided, it uses $MEMKRAFT_DIR or ./memory.
from memkraft import MemKraft
mk = MemKraft("/path/to/memory")

| Method | Description |
|---|---|
| `init(path="")` | Create the memory directory structure with all subdirectories and templates. |
| `track(name, entity_type="person", source="")` | Start tracking an entity. Creates a live note in live-notes/. |
| `update(name, info, source="manual")` | Append new information to a tracked entity's timeline. |
| `brief(name, save=False, file_back=False)` | Generate a meeting brief for an entity. `file_back=True` records the brief generation in the entity timeline. |
| `promote(name, tier="core")` | Change memory tier: core / recall / archival. |
| `list_entities()` | List all tracked entities with their types. |

| Method | Description |
|---|---|
| `extract_conversations(input_text, source="", dry_run=False, confidence="experimental", applicability="")` | Extract entities and facts from text. `confidence`: verified / experimental / hypothesis. `applicability`: `"When: X \| When NOT: Y"`. |
| `detect(text, source="", dry_run=False)` | Detect entities in text (EN/KR/CN/JP). |
| `cognify(dry_run=False, apply=False)` | Route inbox items to structured directories. Recommend-only by default. |
| `extract_facts_registry(text="")` | Extract numeric/date facts into a cross-domain index. |
| `detect_conflicts(entity_name, new_fact, source="")` | Check for contradictory facts and tag them with [CONFLICT]. |
| `resolve_conflicts(strategy="newest", dry_run=False)` | Resolve conflicts. Strategies: newest, confidence, keep-both, prompt. |
| `classify_memory_type(text)` | Classify text into one of 8 memory types. |
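
To illustrate how the documented resolve_conflicts strategies differ, here is a hypothetical resolver over plain dicts. Only the strategy names and the confidence levels (verified / experimental / hypothesis) come from this README; the fact shape, the ranking, and the tie-break rule are assumptions, not MemKraft's logic.

```python
CONFIDENCE_RANK = {"verified": 3, "experimental": 2, "hypothesis": 1}


def resolve(conflicting, strategy="newest"):
    """Pick surviving facts from dicts with 'value', 'recorded_at' (ISO date), 'confidence'."""
    if strategy == "newest":
        return [max(conflicting, key=lambda f: f["recorded_at"])]
    if strategy == "confidence":
        # Highest confidence wins; ties go to the more recently recorded fact.
        return [max(conflicting, key=lambda f: (CONFIDENCE_RANK.get(f["confidence"], 0), f["recorded_at"]))]
    if strategy == "keep-both":
        return list(conflicting)
    if strategy == "prompt":
        return None  # leave the [CONFLICT] tag for a human or the host agent
    raise ValueError(f"unknown strategy: {strategy}")


facts = [
    {"value": "HQ in Busan", "recorded_at": "2026-03-01", "confidence": "verified"},
    {"value": "HQ in Seoul", "recorded_at": "2026-04-01", "confidence": "hypothesis"},
]
print(resolve(facts, "newest")[0]["value"], "|", resolve(facts, "confidence")[0]["value"])
```

The two strategies disagree on the same input: "newest" trusts recency, while "confidence" keeps the older but verified fact.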

| Method | Description |
|---|---|
| `search(query, fuzzy=False)` | Search memory files. Returns a list of `{file, score, context, line}`. |
| `agentic_search(query, max_hops=2, json_output=False, context="", file_back=False)` | Multi-hop search with query decomposition, link traversal, and goal-weighted re-ranking. `context` enables Conway SMS reconstructive ranking. `file_back` enables the feedback loop. |
| `lookup(query, json_output=False, brain_first=False, full=False)` | Brain-first lookup: stops early on high-relevance hits unless `full=True`. |
| `query(query="", level=1, recent=0, tag="", date="")` | Progressive disclosure: L1 = index, L2 = sections, L3 = full file. |
| `links(name)` | Show backlinks to an entity ([[wiki-links]]). |

| Method | Description |
|---|---|
| `dream(date=None, dry_run=False, resolve_conflicts=False)` | Run the Dream Cycle: 6 health checks plus optional conflict resolution. |
| `health_check()` | Run 5 self-diagnostic assertions. Returns `{pass_rate, health_score, assertions}`. |
| `decay(days=90, dry_run=False)` | Flag stale facts. Type-aware: identity decays 10x slower than routine. |
| `dedup(dry_run=False)` | Find and merge duplicate facts. |
| `summarize(name=None, max_length=500)` | Auto-summarize bloated entity pages. |
| `diff()` | Show changes since the last Dream Cycle. |
| `open_loops(dry_run=False)` | Find unresolved items (TODO/FIXME/pending). |
| `build_index()` | Build the memory index at .memkraft/index.json. |
| `suggest_links()` | Suggest missing [[wiki-links]]. |

| Method | Description |
|---|---|
| `log_event(event, tags="", importance="normal", entity="", task="", decision="")` | Log a session event to JSONL. |
| `log_read(date=None)` | Read session events for a date. |
| `retro(dry_run=False)` | Generate a daily retrospective (Well / Bad / Next). |
| `distill_decisions()` | Scan events and notes for decision candidates. |

| Method | Description |
|---|---|
| `start_debug(bug_description)` | Start a new debug session. Returns `{bug_id, file, status}`. |
| `log_hypothesis(bug_id, hypothesis, evidence="", status="testing")` | Log a hypothesis. Auto-increments the ID (H1, H2, ...). |
| `get_hypotheses(bug_id)` | Get all hypotheses for a debug session. |
| `reject_hypothesis(bug_id, hypothesis_id, reason="")` | Reject a hypothesis. Preserved permanently for future reference. |
| `confirm_hypothesis(bug_id, hypothesis_id)` | Confirm a hypothesis. Feeds back into memory. |
| `log_evidence(bug_id, hypothesis_id, evidence_text, result="neutral")` | Log evidence. Result: supports / contradicts / neutral. |
| `get_evidence(bug_id, hypothesis_id="")` | Get evidence entries, optionally filtered by hypothesis. |
| `end_debug(bug_id, resolution)` | End the session with a resolution. Auto-feeds to memory. |
| `get_debug_status(bug_id)` | Get the current session status and hypothesis counts. |
| `debug_history(limit=10)` | List past debug sessions. |
| `search_debug_sessions(query)` | Search past sessions by description/hypothesis/resolution. |
| `search_rejected_hypotheses(query)` | Search rejected hypotheses (anti-pattern detector). |

| Method | Description |
|---|---|
| `snapshot(label="", include_content=False)` | Create a point-in-time snapshot of all memory files. Returns `{snapshot_id, timestamp, label, file_count, total_bytes, path}`. |
| `snapshot_list()` | List all saved snapshots, newest first. |
| `snapshot_diff(snapshot_a, snapshot_b="")` | Compare two snapshots, or a snapshot vs live state. Returns `{added, removed, modified, unchanged_count}`. |
| `time_travel(query, snapshot_id="", date="")` | Search memory as it was at a past snapshot, selected by snapshot ID or date. |
| `snapshot_entity(name)` | Track how a specific entity evolved across all snapshots (new/modified/unchanged/deleted). |
## CLI Reference

memkraft <command> [options]

| Command | Description |
|---|---|
| `init [--path DIR]` | Initialize memory structure |
| `extract TEXT [--source S] [--dry-run] [--confidence C] [--when W] [--when-not W]` | Auto-extract entities and facts |
| `detect TEXT [--source S] [--dry-run]` | Detect entities in text (EN/KR/CN/JP) |
| `track NAME [--type T] [--source S]` | Start tracking an entity |
| `update NAME --info INFO [--source S]` | Update a tracked entity |
| `list` | List all tracked entities |
| `brief NAME [--save] [--file-back]` | Generate meeting brief |
| `promote NAME [--tier T]` | Change memory tier (core/recall/archival) |
| `search QUERY [--fuzzy] [--file-back]` | Search memory files |
| `agentic-search QUERY [--max-hops N] [--json] [--context C] [--file-back]` | Multi-hop agentic search |
| `lookup QUERY [--json] [--brain-first] [--full]` | Brain-first lookup |
| `query [QUERY] [--level 1\|2\|3] [--recent N] [--tag T] [--date D]` | Progressive disclosure query |
| `links NAME` | Show backlinks to an entity |
| `cognify [--dry-run] [--apply]` | Process inbox into structured pages |
| `log --event E [--tags T] [--importance I] [--entity E] [--task T] [--decision D]` | Log session event |
| `log --read [--date D]` | Read session events |
| `retro [--dry-run]` | Daily retrospective |
| `distill-decisions` | Scan for decision candidates |
| `health-check` | Run 5 self-diagnostic assertions → health score |
| `dream [--date D] [--dry-run] [--resolve-conflicts]` | Run Dream Cycle (nightly maintenance) |
| `resolve-conflicts [--strategy S] [--dry-run]` | Resolve fact conflicts |
| `decay [--days N] [--dry-run]` | Flag stale facts |
| `dedup [--dry-run]` | Find and merge duplicates |
| `summarize [NAME] [--max-length N]` | Auto-summarize bloated pages |
| `diff` | Show changes since last Dream Cycle |
| `open-loops [--dry-run]` | Find unresolved items |
| `index` | Build memory index |
| `suggest-links` | Suggest missing wiki-links |
| `extract-facts [TEXT]` | Extract numeric/date facts |
| `debug start DESC` | Start a new debug session (OBSERVE) |
| `debug hypothesis TEXT [--bug-id ID] [--evidence E]` | Log a hypothesis (HYPOTHESIZE) |
| `debug evidence TEXT [--bug-id ID] [--hypothesis-id H] [--result R]` | Log evidence (supports/contradicts/neutral) |
| `debug reject [--bug-id ID] [--hypothesis-id H] [--reason R]` | Reject current hypothesis |
| `debug confirm [--bug-id ID] [--hypothesis-id H]` | Confirm current hypothesis |
| `debug status [--bug-id ID]` | Show debug session status |
| `debug history [--limit N]` | List past debug sessions |
| `debug end RESOLUTION [--bug-id ID]` | End debug session (CONCLUDE) |
| `debug search QUERY` | Search past debug sessions |
| `debug search-rejected QUERY` | Search rejected hypotheses (anti-patterns) |
| `snapshot [--label L] [--include-content]` | Create a point-in-time memory snapshot |
| `snapshot-list` | List all saved snapshots (newest first) |
| `snapshot-diff SNAP_A [SNAP_B]` | Compare two snapshots or snapshot vs live state |
| `time-travel QUERY [--snapshot ID] [--date YYYY-MM-DD]` | Search memory as it was at a past snapshot |
| `snapshot-entity NAME` | Show how an entity evolved across snapshots |
| `selfupdate [--dry-run]` | Self-upgrade MemKraft via pip when a newer version is on PyPI |
| `doctor [--check-updates]` | Health check; with `--check-updates` also reports PyPI version status |
## Architecture
Raw Input ──▶ Extract ──▶ Classify ──▶ Forge ──▶ Compound Knowledge
    ▲                        │                         │
    │                   Confidence                     │
    │                  Applicability                   │
    │                        │                         │
    └──── Feedback Loop ◀── Brain-first recall ◀───────┘
             maintained by Dream Cycle + Health Check
Zero dependencies. Built entirely from Python stdlib: re for NER, difflib for fuzzy search, json for structured data, pathlib for file ops. No vector DB, no LLM calls at runtime, no framework lock-in.
Compiled Truth + Timeline. Every entity has two layers: a mutable Compiled Truth (current state) and an append-only Timeline with [Source:] tags. You get both "what we know now" and "how we got here."
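
The two-layer model can be sketched as a mutable dict plus an append-only list. In this hypothetical `EntityPage`, every update overwrites the Compiled Truth but also appends a `[Source: ...]`-tagged Timeline line recording the old and new values. The class, the Markdown layout it renders, and the "Acme" data are illustrative assumptions, not MemKraft's page template.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class EntityPage:
    """Hypothetical entity page: mutable current state + append-only timeline."""
    name: str
    compiled_truth: dict = field(default_factory=dict)
    timeline: list = field(default_factory=list)

    def update(self, key, value, source):
        day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        old = self.compiled_truth.get(key)
        self.compiled_truth[key] = value  # "what we know now"
        self.timeline.append(  # "how we got here": appended, never edited
            f"- {day}: {key}: {old!r} -> {value!r} [Source: {source}]"
        )

    def to_markdown(self):
        truth = "\n".join(f"- **{k}:** {v}" for k, v in self.compiled_truth.items())
        return f"# {self.name}\n\n## Compiled Truth\n{truth}\n\n## Timeline\n" + "\n".join(self.timeline)


page = EntityPage("Acme")
page.update("hq", "Busan", "company filing")
page.update("hq", "Seoul", "press release")
print(page.to_markdown())
```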
Auto-Extract pipeline. Multi-stage NER: English Title Case → Korean particle stripping → Chinese surname detection (100 surnames) → Japanese surname detection (85 surnames) → fact extraction (X is/was/leads Y) → stopword filtering (806 KR/CN/JP stopwords).
Goal-weighted re-ranking (Conway SMS). agentic_search("X", context="meeting prep") and agentic_search("X", context="investment analysis") return different rankings from the same data. Memory type, confidence, and applicability conditions all factor into scoring.
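
A toy version of goal-weighted re-ranking: keep each result's base relevance fixed, then add a bonus for overlap between the caller's context terms and the result's tags, plus a confidence bonus. The weights, the tag-overlap formula, and the result shape are all hypothetical; they only demonstrate how one dataset can yield different orderings for different contexts.

```python
def rerank(results, context_terms, weights=None):
    """Re-rank results ({"id", "base_score", "tags", "confidence"}) for a given context."""
    weights = weights or {"context": 0.5, "confidence": 0.2}
    confidence_bonus = {"verified": 1.0, "experimental": 0.5, "hypothesis": 0.0}
    ctx = {t.lower() for t in context_terms}

    def score(r):
        overlap = len(ctx & {t.lower() for t in r["tags"]}) / max(1, len(ctx))
        return (r["base_score"]
                + weights["context"] * overlap
                + weights["confidence"] * confidence_bonus.get(r["confidence"], 0.0))

    return [r["id"] for r in sorted(results, key=score, reverse=True)]


results = [
    {"id": "bio", "base_score": 1.0, "tags": ["meeting", "people"], "confidence": "verified"},
    {"id": "fund", "base_score": 1.0, "tags": ["investment", "portfolio"], "confidence": "verified"},
]
print(rerank(results, ["meeting", "prep"]), rerank(results, ["investment", "analysis"]))
```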
Feedback loop. --file-back files search results back into entity timelines. Each query makes future queries richer - compound interest for memory.
Health Check. 5 assertions: (1) source attribution, (2) no orphan facts, (3) no duplicates, (4) inbox freshness, (5) no unresolved conflicts. Returns a pass rate and letter grade (A/B/C/D).
memory/
├── .memkraft/ # Internal state (index.json, timestamps)
├── sessions/ # Structured event logs (YYYY-MM-DD.jsonl)
├── RESOLVER.md # Classification decision tree (MECE)
├── TEMPLATES.md # Page templates with tier labels
├── CONFLICTS.md # Auto-generated conflict report
├── open-loops.md # Unresolved items hub
├── fact-registry.md # Cross-domain numeric/date facts
├── YYYY-MM-DD.md # Daily notes
├── entities/ # People, companies, concepts (Tier: recall)
├── live-notes/ # Persistent tracking targets (Tier: core)
├── decisions/ # Decision records with rationale
├── originals/ # Captured verbatim - no paraphrasing
├── inbox/ # Quick capture before classification
├── tasks/ # Work-in-progress context
├── meetings/ # Briefs and notes
└── debug/ # Debug sessions (DEBUG-YYYYMMDD-HHMMSS.md)
## Comparison

| | MemKraft | Mem0 | Letta |
|---|---|---|---|
| Storage | Plain Markdown | Vector + Graph DB | DB-backed |
| Dependencies | Zero | Vector DB + API | DB + runtime |
| Offline / git-friendly | ✅ | ❌ | ❌ |
| Auto-extract (EN/KR/CN/JP) | ✅ | ✅ (LLM) | - |
| Agentic search | ✅ | - | - |
| Goal-weighted re-ranking | ✅ | - | - |
| Feedback loop | ✅ | - | - |
| Confidence levels | ✅ | - | - |
| Health check | ✅ | - | - |
| Conflict detection & resolution | ✅ | - | - |
| Source attribution | Required | - | - |
| Dream Cycle | ✅ | - | - |
| Memory tiers | ✅ | - | ✅ |
| Type-aware decay | ✅ | - | - |
| Debug hypothesis tracking | ✅ | - | - |
| Memory snapshots & time travel | ✅ | ❌ | ❌ |
| Entity evolution timeline | ✅ | ❌ | ❌ |
| Snapshot diff | ✅ | ❌ | ❌ |
| Semantic search | ❌ | ✅ | - |
| Graph memory | ❌ | ✅ | - |
| Self-editing memory | ✅ | - | ✅ |
| Cost | Free | Free tier + paid | Free |
Choose MemKraft when: you want portable, git-friendly, zero-dependency memory that works with any agent framework, offline, forever.
Choose something else when: you need semantic/vector search, graph traversal, or a full agent runtime with virtual context management.
## Reproducing LongMemEval

MemKraft achieves 98.0% on LongMemEval (LLM-as-judge, oracle subset, 3-run semantic majority vote). Single-run performance is 96-98%; the run-to-run variance comes from LLM inference sampling, not from memory.
Score measured on v1.0.2; v2.6.0 is regression-free with 1168 tests passing and is API-compatible with the benchmark harness.
Comparison vs prior SOTA:
- MemKraft (v1.0.2 measurement): 98.0% (LLM-judge, oracle 50, 3-run majority)
- MemPalace: 96.6%
- MEMENTO/MS: 90.8%
git clone https://github.com/seojoonkim/memkraft
cd memkraft
pip install -e ".[bench]"
cd benchmarks/longmemeval
# Single run (96% typical)
MODEL="claude-sonnet-4-6" \
ANTHROPIC_API_KEY="your-key" \
TAG="myrun" \
python3 run.py 50 oracle
# LLM-as-judge scoring
MODEL="claude-sonnet-4-6" \
ANTHROPIC_API_KEY="your-key" \
python3 llm_judge.py
# 3-run majority vote (98% typical)
MODEL="claude-sonnet-4-6" \
ANTHROPIC_API_KEY="your-key" \
python3 run_majority_vote.py

- Dataset: LongMemEval oracle subset (50 questions)
- Judge: LLM-as-judge (claude-sonnet-4-6), using semantic matching rather than exact string matching
- 98%: the 3-run semantic majority vote result
- Single run: 96–100%, depending on inference sampling
- Reproducibility note: Variance comes from LLM inference sampling, not from MemKraft itself. Memory storage and retrieval are deterministic.
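On a 50-question subset every question is worth 2 percentage points, so the gap between a typical 96% single run and the 98% majority result is exactly one question. The sketch below shows how a per-question majority over three runs can recover a question that only one run got wrong. It assumes each run yields one judge verdict (correct or incorrect) per question; that is a simplification, and `run_majority_vote.py` may instead vote over the answers themselves before judging. The verdicts in the example are toy data, not benchmark output.

```python
from __future__ import annotations

from typing import Sequence


def accuracy(verdicts: Sequence[bool]) -> float:
    """Percentage of questions the judge marked correct."""
    return 100.0 * sum(verdicts) / len(verdicts)


def majority_vote(runs: Sequence[Sequence[bool]]) -> list[bool]:
    """Per-question majority over an odd number of runs.

    runs[r][q] is the judge's verdict for question q in run r.
    """
    if len(runs) % 2 == 0:
        raise ValueError("use an odd number of runs so there are no ties")
    n_questions = len(runs[0])
    if any(len(run) != n_questions for run in runs):
        raise ValueError("every run must cover the same questions")
    # A question counts as correct when more than half of the runs got it right.
    return [2 * sum(run[q] for run in runs) > len(runs) for q in range(n_questions)]


if __name__ == "__main__":
    # Toy verdicts for 50 questions (not real benchmark output):
    # each run misses two questions, and only question 7 is missed twice.
    misses = [{3, 7}, {7, 19}, {25, 40}]
    runs = [[q not in missed for q in range(50)] for missed in misses]
    for i, run in enumerate(runs, start=1):
        print(f"run {i}: {accuracy(run):.1f}%")  # 96.0% for each run
    print(f"majority: {accuracy(majority_vote(runs)):.1f}%")  # 98.0%
```

Majority voting only helps when different runs fail on different questions; a question that most runs miss stays wrong, which is why the reported variance is attributed to sampling rather than to retrieval.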
MemKraft ships an opt-in self-upgrade flow so agents (and humans) never silently drift behind PyPI:
memkraft doctor --check-updates # 🟢 up to date / 🟡 update available / 🔴 PyPI unreachable
memkraft selfupdate # runs pip install -U memkraft when a newer release exists
memkraft selfupdate --dry-run # check only

The classic command still works:

pip install -U memkraft

For agents: add `memkraft doctor --check-updates` to your weekly skill or heartbeat. If it reports 🟡 (update available), ask the human before running `memkraft selfupdate`. Never auto-upgrade without explicit consent.
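A hypothetical sketch of the kind of check `memkraft doctor --check-updates` performs, plus the "ask the human first" consent gate described for agents. It is not MemKraft's implementation: it assumes plain `X.Y.Z` version strings and queries PyPI's public JSON endpoint (`https://pypi.org/pypi/<name>/json`) using only the standard library. The function names are illustrative.

```python
from __future__ import annotations

import json
import urllib.request

# PyPI's public JSON API; "info" -> "version" holds the latest release.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def parse_version(v: str) -> tuple[int, ...]:
    """Parse a plain X.Y.Z string. Pre-releases like 2.7.0rc1 are NOT handled."""
    return tuple(int(part) for part in v.split("."))


def latest_on_pypi(name: str, timeout: float = 5.0) -> str | None:
    """Return the newest version published on PyPI, or None if unreachable."""
    try:
        with urllib.request.urlopen(PYPI_JSON_URL.format(name=name), timeout=timeout) as resp:
            return json.load(resp)["info"]["version"]
    except (OSError, KeyError, ValueError):
        # OSError covers URLError/HTTPError and timeouts; ValueError covers bad JSON.
        return None


def update_status(installed: str, latest: str | None) -> str:
    """Map a version comparison onto the three states the doctor reports."""
    if latest is None:
        return "PyPI unreachable"
    if parse_version(latest) > parse_version(installed):
        return "update available"
    return "up to date"


def may_self_update(status: str, human_approved: bool) -> bool:
    """Consent gate: upgrade only if an update exists AND a human said yes."""
    return status == "update available" and human_approved
```

To try it against a live install, pass `importlib.metadata.version("memkraft")` and `latest_on_pypi("memkraft")` to `update_status`. Comparing integer tuples avoids the classic string-comparison bug where "2.10.0" sorts before "2.9.1"; a production check would use `packaging.version.Version` to cover pre-releases as well.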
For maintainers: pushing a vX.Y.Z git tag triggers .github/workflows/release.yml, which builds the package, verifies it (twine check), publishes to PyPI, and cuts a GitHub Release. This requires a PYPI_API_TOKEN repository secret; add it under Settings → Secrets and variables → Actions.
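Because the release workflow keys off the tag name, a quick local check before `git push` can catch a malformed tag or one that disagrees with the version you meant to ship. This is a hypothetical helper, not part of the repository; `version_from_tag` and `tag_matches_version` are illustrative names, and the version to compare against is whatever your package metadata declares.

```python
import re

# Release tags are expected to look like v2.7.0 (the vX.Y.Z pattern).
TAG_PATTERN = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def version_from_tag(tag: str) -> str:
    """Strip the leading 'v' from a vX.Y.Z tag, rejecting anything else."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"{tag!r} is not a vX.Y.Z release tag")
    return ".".join(match.groups())


def tag_matches_version(tag: str, package_version: str) -> bool:
    """True when the tag names exactly the version being released."""
    return version_from_tag(tag) == package_version
```

Using `fullmatch` rather than `match` rejects tags with trailing text such as `v2.7.0-test`, which the workflow's trigger pattern may or may not accept.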
## π Changelog
Highlights from recent releases. Full history: CHANGELOG.md.
- `auto_tier`: recommends a `core`, `recall`, or `archival` tier from recency, frequency, and importance; `dry_run=True` by default.
- `fact_type` on `fact_add`: `episodic` / `semantic` / `procedural` taxonomy with silent contradiction detection.
- 1-hop graph neighbor expansion for counting-style queries (`how many`, `list all`).
- 1168 tests passing; zero breaking changes from 2.5.x.
- Hybrid search upgraded to exact + IDF + fuzzy + BM25.
- Smarter ranking on multi-token queries; better recall for short answers.
- Cross-entity link graph hardened; faster `link_scan` and a more reliable backlinks index.
- Performance improvements on large memory directories.
- Watchdog-based `memkraft watch` stability fixes.
- Doctor health hints; richer `--check-updates` output.
- Major API consolidation around the register → tune → recall → decide loop.
- Bitemporal facts, tier labels, reversible decay, and the link graph become first-class.
- Zero breaking changes from 0.9.x; see MIGRATION.md.
`flush` → `compact` → `digest` self-managing lifecycle. See the Autonomous Memory Management section above for details.
prompt_register / prompt_eval / prompt_evidence / convergence_check make tuning a first-class, auditable workflow.
Earlier releases (v0.x, one-line summaries)
- v0.8.1 (2026-04-17): `agents-hint` CLI, `examples/`, `python -m memkraft.mcp`, `memkraft watch`, `memkraft doctor`. 515 tests.
- v0.8.0 (2026-04-17): Bitemporal Fact Layer + Memory Tier Labels + Reversible Decay/Tombstone + Cross-Entity Link Graph. 492 tests.
- v0.7.0 (2026-04-15): Multi-agent: `channel_update` modes, task delegation, `agent_handoff`, channel task listing, task cleanup. 409 tests.
- v0.5.4 (2026-04-15): Channel Context Memory + Task Continuity Register + Agent Working Memory + `agent_inject()`. 377 tests.
- v0.5.1 (2026-04-14): Memory Snapshots & Time Travel: `snapshot` / `snapshot_list` / `snapshot_diff` / `time_travel` / `snapshot_entity`. 328 tests.
- v0.4.1 (2026-04-13): README: Debugging is Memory section + Appendix (Inspirations & Credits).
- v0.4.0 (2026-04-13): Debug Hypothesis Tracking: full OBSERVE → HYPOTHESIZE → EXPERIMENT → CONCLUDE loop, 2-fail auto-switch warning, `search_rejected_hypotheses()`. 277 tests.
- v0.3.0 (2026-04-13): Query-to-Memory Feedback Loop (`--file-back`), Confidence Levels, Memory Health Assertions, Applicability Conditions. 198 tests.
- v0.2.0 (2026-04-12): Goal-Weighted Reconstructive Memory (Conway SMS), Dialectic Synthesis, Memory Type Classification (8 types), Type-Aware Decay. 158 tests.
- v0.1.0 (2026-04-12): Initial release: extract, detect, decay, dedup, summarize, agentic search, entity tracking, Dream Cycle, hybrid search. Zero dependencies.
Full details for every release: CHANGELOG.md.
PRs welcome. See CONTRIBUTING.md.
MIT - use it however you want.
MemKraft stands on the shoulders of giants. These projects and ideas shaped our approach:
| Project | Inspiration | Link |
|---|---|---|
| Karpathy auto-research | Evidence-based autonomous research methodology | Tweet |
| Shen Huang debug-hypothesis | Scientific debugging: hypothesis-driven, max 5-line experiments | GitHub · Tweet |
| Letta (MemGPT) | Tiered memory architecture (core / archival / recall) | GitHub |
| mem0 | Agent memory extraction and retrieval patterns | GitHub |
| Zep | Temporal memory decay and entity extraction | GitHub |
| MemoryWeaver | Dialectic synthesis and memory reconstruction | GitHub |
| Shubham Saboo's 6-agent system | OpenClaw-based multi-agent + SOUL.md / MEMORY.md pattern | Article |
| Karpathy llm-wiki | Wiki-style structured knowledge for LLMs | Tweet |
"If I have seen further, it is by standing on the shoulders of giants."
Thank you to all these creators for sharing their work openly. MemKraft exists because of you.