GitHub - echonoshy/replica: Memory layer for AI.

Memory layer for AI.
Give your AI the ability to remember.

What is Replica?

Your AI has the memory of a goldfish. Every conversation? Fresh start. That thing you mentioned yesterday? Gone. Your preferences? Vanished into the void.

Replica fixes this. It's a memory layer that gives AI the ability to actually remember things. Not just for 5 minutes. Not just within a single chat. But across conversations, sessions, and time.

Think of it as RAM for your AI's brain. Except it doesn't forget when you close the tab.

💬 "Remember when I told you about my trip to Tokyo last month?"

→ Replica searches 10,000+ memories
→ Finds: "User visited Tokyo in March 2026"
→ Returns relevant context in 50ms

The Problem

AI amnesia is real. Without memory, your AI is like that friend who asks "wait, what were we talking about?" every 30 seconds.

Every conversation starts from scratch
Context windows are expensive (and finite)
RAG alone doesn't cut it - you need structured memory, not just keyword matching
Facts, events, plans, preferences... they all need different handling

Replica solves this. Automatically. No prompt engineering gymnastics required.

✨ Features

🎯 Smart Memory Extraction

Replica doesn't just store text. It understands conversations and extracts structured memories:

Episodes - "User discussed Python async programming best practices"
Events - "User has a meeting tomorrow at 3 PM"
Foresights - "User plans to learn Rust next week"
User Profiles - Interests, skills, preferences, goals

🔎 Hybrid Search That Actually Works

Simple vector search? That's so 2023. Replica uses:

Vector Search - Semantic similarity via pgvector
Full-Text Search - PostgreSQL's battle-tested text search
RRF Fusion - Reciprocal Rank Fusion (fancy way of saying "best of both")
Temporal Decay - Recent stuff matters more (just like real memory)
MMR Reranking - Diverse results, not 10 variations of the same thing

🗜️ Automatic Context Compression

Long conversations? No problem. Replica automatically:

Tracks token counts in real-time
Compresses old messages when hitting limits
Keeps recent context fresh and relevant
Extracts important info before compression

🎨 Beautiful Web UI

Chat with your AI and watch memories being created in real-time:

Real-time streaming chat with memory context

Database explorer for debugging and inspection

🚀 Quick Start

Prerequisites

Component	Requirement
Python	≥ 3.13
PostgreSQL	17 + pgvector
Package Manager	uv
Node Runtime	Bun (recommended) or Node.js
LLM / Embedding	vLLM or any OpenAI-compatible API

1. Start Database

docker run -d --name pgvector \
  -e POSTGRES_PASSWORD=password \
  -p 5432:5432 \
  pgvector/pgvector:pg17

docker exec -it pgvector psql -U postgres -c "CREATE DATABASE replica;"
docker exec -it pgvector psql -U postgres -d replica -c "CREATE EXTENSION IF NOT EXISTS vector;"

2. Install & Migrate

uv sync
uv run alembic upgrade head

3. Configure

Edit config/settings.yaml with your model endpoints:

llm:
  provider: "vllm"
  base_url: "http://localhost:19000/v1"
  model: "Qwen3.5-122B-A10B-FP8"

embedding:
  provider: "vllm"
  base_url: "http://localhost:19001/v1"
  model: "Qwen3-Embedding-4B"
  dimensions: 2560

💡 Full config reference: config/settings.yaml | Detailed guide: docs/guide.md

4. Launch

Backend API (port 8790):

uv run uvicorn replica.main:app --host 0.0.0.0 --port 8790 --reload

Frontend UI (port 8780):

cd web
bun install
bun run dev

Then visit:

URL	Description
`http://localhost:8780`	🎨 Web UI
`http://localhost:8790/docs`	📚 Swagger API Docs
`http://localhost:8790/health`	❤️ Health Check

🎮 How It Works

Memory Lifecycle

1. User chats with AI
   ↓
2. Replica stores messages
   ↓
3. When conversation reaches a natural boundary...
   ↓
4. Extract structured memories:
   • Episodes (what happened)
   • Events (specific facts)
   • Foresights (future plans)
   • User profile updates
   ↓
5. Generate embeddings
   ↓
6. Store in knowledge base
   ↓
7. Next time user asks something...
   ↓
8. Hybrid search retrieves relevant memories
   ↓
9. Inject into AI context
   ↓
10. AI responds with full memory context

Memory Types Explained

Type	What It Stores	Example
Episode	Conversation summaries	"User asked about async/await patterns in Python and discussed event loops"
Event	Concrete facts	"User's birthday is March 15"
Foresight	Future intentions	"User wants to build a web scraper next month"
Evergreen	Long-term facts	"User is a software engineer living in Shanghai"

💻 API Examples

Create a User & Session

import httpx

async with httpx.AsyncClient() as client:
    # Create user
    user = await client.post(
        "http://localhost:8790/v1/users",
        json={"external_id": "alice", "name": "Alice"}
    )
    user_id = user.json()["id"]
    
    # Create session
    session = await client.post(
        f"http://localhost:8790/v1/users/{user_id}/sessions",
        json={}
    )
    session_id = session.json()["id"]

Stream Chat with Memory

# Stream chat (Server-Sent Events)
async with client.stream(
    "POST",
    f"http://localhost:8790/v1/sessions/{session_id}/chat",
    json={"content": "What did I tell you about my trip?", "use_memory": True}
) as response:
    async for line in response.aiter_lines():
        if line.startswith("data: "):
            data = json.loads(line[6:])
            if "token" in data:
                print(data["token"], end="", flush=True)
            elif "context" in data:
                print("\n\n📚 Retrieved memories:", data["context"])

Extract Memories from Raw Data

# Batch memory extraction
response = await client.post(
    "http://localhost:8790/v1/memories",
    json={
        "new_raw_data_list": [
            {"role": "user", "content": "I'm planning a trip to Tokyo next month"},
            {"role": "assistant", "content": "That sounds exciting! Have you been before?"},
            {"role": "user", "content": "No, first time. I want to visit Shibuya and try real ramen."}
        ],
        "user_id_list": ["alice"]
    }
)
print(f"Extracted {response.json()['memory_count']} memories")

Search Knowledge Base

# Semantic search
results = await client.post(
    "http://localhost:8790/v1/knowledge/search",
    json={
        "user_id": user_id,
        "query": "travel plans",
        "top_k": 5
    }
)

for memory in results.json():
    print(f"[{memory['entry_type']}] {memory['content']} (score: {memory['score']:.2f})")

🏗️ Architecture

Frontend → React 19 web interface (:8780)

Backend → FastAPI server (:8790)

User/Session/Message APIs
Memory extraction & knowledge search
Context compression & embedding generation

LLM Services

Main LLM (:19000) - Chat completion & memory extraction
Embedding model (:19001) - Vector generation

Storage → PostgreSQL 17 + pgvector (:5432)

🛠️ Development

Backend (Python):

# Format code
uv run ruff format

# Lint & fix
uv run ruff check --fix

# Run tests
uv run pytest

# Run tests with coverage
uv run pytest --cov=replica

Frontend (TypeScript/React):

cd web

# One-command check & fix (lint + format + import sorting)
bun run check

# Lint only
bun run lint

# Format only
bun run format

📚 Documentation

Complete Guide - Configuration, concepts, and usage
API Reference - Full API documentation
Swagger UI - Interactive API explorer

📄 License

MIT License - see LICENSE for details.

Built by developers tired of explaining the same thing to AI twice

⭐ Star on GitHub

Name		Name	Last commit message	Last commit date
Latest commit History 67 Commits
.github		.github
alembic		alembic
assets		assets
config		config
docs		docs
src/replica		src/replica
tests		tests
web		web
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
README_zh.md		README_zh.md
alembic.ini		alembic.ini
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

What is Replica?

The Problem

✨ Features

🎯 Smart Memory Extraction

🔎 Hybrid Search That Actually Works

🗜️ Automatic Context Compression

🎨 Beautiful Web UI

🚀 Quick Start

Prerequisites

1. Start Database

2. Install & Migrate

3. Configure

4. Launch

🎮 How It Works

Memory Lifecycle

Memory Types Explained

💻 API Examples

Create a User & Session

Stream Chat with Memory

Extract Memories from Raw Data

Search Knowledge Base

🏗️ Architecture

🛠️ Development

📚 Documentation

📄 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

What is Replica?

The Problem

✨ Features

🎯 Smart Memory Extraction

🔎 Hybrid Search That Actually Works

🗜️ Automatic Context Compression

🎨 Beautiful Web UI

🚀 Quick Start

Prerequisites

1. Start Database

2. Install & Migrate

3. Configure

4. Launch

🎮 How It Works

Memory Lifecycle

Memory Types Explained

💻 API Examples

Create a User & Session

Stream Chat with Memory

Extract Memories from Raw Data

Search Knowledge Base

🏗️ Architecture

🛠️ Development

📚 Documentation

📄 License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages