StorySphere

Intelligent Novel Analysis System (Agent-Driven Architecture). An agent-driven system that automatically parses, understands, and explores the content of novels.

Overview

StorySphere ingests novels (PDF / DOCX), runs a multi-stage ETL pipeline to extract entities, relations, events, and keywords, then exposes the results through a streaming REST + WebSocket API and a React frontend. An LLM-powered chat agent lets readers have natural-language conversations with any book.

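Key capabilities:

Automatic parsing of PDF / DOCX novels, with chapter detection and paragraph splitting
Knowledge graph: automatic extraction of characters, locations, items, and their relations
Vector semantic search: paragraph-level embeddings (Qdrant)
Deep analysis: character CEP, archetype classification, growth arcs; event causality analysis
Conversational exploration: LangGraph ReAct chat agent with streaming replies
Visualization: knowledge graph, event timeline, analysis panels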

Tech Stack

| Layer | Technology |
| --- | --- |
| LLM Orchestration | LangChain · LangGraph · Gemini 2.0 Flash (primary) · GPT-4o-mini · Claude Haiku · Local LLM (Ollama / llama.cpp) |
| Backend API | FastAPI · Uvicorn · WebSocket |
| Knowledge Graph | NetworkX (default) · Neo4j (optional, large-scale) |
| Vector DB | Qdrant |
| Embeddings | sentence-transformers `all-MiniLM-L6-v2` |
| Storage | SQLite (aiosqlite · SQLAlchemy) |
| Keyword Extraction | YAKE · TF-IDF · LLM · Composite |
| Frontend | React 18 · TypeScript · Vite · React Router |
| Package Manager (Python) | uv |

Architecture

┌─────────────────────────────────────────────────┐
│                  React Frontend                 │
│  Library · Reader · Graph · Timeline · Analysis │
└────────────────────┬────────────────────────────┘
                     │  HTTP / WebSocket
┌────────────────────▼────────────────────────────┐
│              FastAPI  (src/api/)                │
│  /api/v1/books  entities  relations  analysis   │
│  WS /ws/chat  /ws/chat-deep  /ws/tasks/{id}     │
└──┬─────────────────┬─────────────────┬──────────┘
   │                 │                 │
   ▼                 ▼                 ▼
Chat Agent     Analysis Agent    Ingestion Workflow
(LangGraph     (cache-first,     (ETL Pipelines)
 ReAct)         async, SQLite)
   │                 │                 │
   └────────┬────────┘        ┌────────┘
            ▼                 ▼
         Tools (18)        Services
     graph / retrieval /   KG · Document
     analysis / composite  Vector · Summary
                           Extraction · Analysis
            │
   ┌────────┴────────┐
   ▼                 ▼
NetworkX / Neo4j   Qdrant
(Knowledge Graph)  (Vector DB)

Three Query Paths

| Path | Latency | Implementation |
| --- | --- | --- |
| Map / Card Query | < 100 ms | Sync REST, pure data lookup |
| Chat | Streaming 2–5 s | LangGraph ReAct agent, WebSocket |
| Deep Analysis | 2–5 s (cache hit < 100 ms) | Async, 7-day SQLite cache, WebSocket push |

Project Structure

storysphere/
├── src/
│   ├── api/               # FastAPI routers, schemas, WebSocket managers
│   ├── agents/
│   │   ├── chat_agent.py       # LangGraph streaming chat agent
│   │   ├── analysis_agent.py   # Cache-first deep analysis orchestrator
│   │   ├── timeline_agent.py   # Timeline event agent
│   │   └── states.py           # ChatState (Pydantic, 8 fields)
│   ├── services/          # Business logic (KG, Document, Vector, Summary, Analysis…)
│   ├── tools/
│   │   ├── graph_tools/        # 6 tools: entity/relation/subgraph queries
│   │   ├── retrieval_tools/    # 5 tools: vector search, summary, keywords, paragraphs
│   │   ├── analysis_tools/     # 3 tools: insight, character analysis, event analysis
│   │   └── composite_tools/    # 4 tools: entity profile, relationship, character arc, event profile
│   ├── pipelines/         # ETL — document processing, feature extraction, KG building
│   ├── workflows/         # High-level business orchestration (ingestion)
│   ├── domain/            # Entity, Relation, Event, Document Pydantic models
│   ├── core/              # LLM client factory, metrics, tracing, utilities
│   └── config/            # Settings (pydantic-settings), archetype JSON configs
├── frontend/
│   ├── src/
│   │   ├── pages/         # LibraryPage, ReaderPage, GraphPage, TimelinePage, AnalysisPage…
│   │   ├── components/    # layout / chat / graph / reader / timeline / analysis / ui
│   │   └── contexts/      # ThemeContext, ChatContext
│   └── package.json
├── docs/
│   ├── CORE.md            # Master design document (always read first)
│   └── appendix/          # ADR-001 to ADR-009, tools catalog, parallel impl notes
├── tests/                 # 331+ unit tests (pytest)
├── pyproject.toml
└── .env.example

Quick Start

Prerequisites

Python ≥ 3.11 (managing versions with pyenv is recommended)
Node.js ≥ 18
uv — Python package manager
A Gemini API key (primary LLM) — or OpenAI / Anthropic / local LLM as alternative

Backend

# 1. Clone and enter the project
git clone <repo-url> && cd StorySphere

# 2. Copy and fill in environment variables
cp .env.example .env
# Edit .env — at minimum set GEMINI_API_KEY

# 3. Install Python dependencies
uv sync

# 4. Start the API server
uv run uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload

API docs are available at http://localhost:8000/docs once the server is running.

Frontend

cd frontend
npm install
npm run dev
# Opens at http://localhost:5173

Configuration

All settings are loaded from .env (see .env.example). Key variables:

| Variable | Default | Description |
| --- | --- | --- |
| `GEMINI_API_KEY` | — | Google Gemini API key (primary LLM) |
| `OPENAI_API_KEY` | — | OpenAI fallback |
| `ANTHROPIC_API_KEY` | — | Anthropic fallback |
| `LOCAL_LLM_MODEL` | `""` | Local model name (e.g. `llama3.2`). Empty = disabled |
| `LOCAL_LLM_BASE_URL` | `http://localhost:11434/v1` | Ollama / llama.cpp endpoint |
| `KG_MODE` | `networkx` | Knowledge graph backend: `networkx` \| `neo4j` |
| `KG_PERSISTENCE_PATH` | `./data/knowledge_graph.json` | Local KG snapshot path |
| `QDRANT_URL` | `http://localhost:6333` | Qdrant vector DB endpoint |
| `DATABASE_URL` | `sqlite+aiosqlite:///./storysphere.db` | Main SQLite DB |
| `KEYWORD_EXTRACTOR_TYPE` | `yake` | `yake` \| `llm` \| `tfidf` \| `composite` \| `none` |
| `LLM_THINKING_ENABLED` | `false` | Enable extended reasoning (extra tokens) |
| `CHAT_AGENT_MAX_ITERATIONS` | `10` | ReAct loop cap |
| `ANALYSIS_CACHE_DB_PATH` | `./data/analysis_cache.db` | Deep analysis SQLite cache |
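
To illustrate how a few of these variables and their documented defaults might be read, here is a minimal stdlib-only sketch. The project itself loads settings with pydantic-settings; the `Settings` dataclass and `load_settings` function below are hypothetical names used only for illustration, and only a subset of the variables is covered.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the documented variables."""

    gemini_api_key: str | None
    kg_mode: str
    qdrant_url: str
    database_url: str
    keyword_extractor_type: str
    chat_agent_max_iterations: int


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    kg_mode = env.get("KG_MODE", "networkx")
    if kg_mode not in {"networkx", "neo4j"}:
        raise ValueError(f"KG_MODE must be 'networkx' or 'neo4j', got {kg_mode!r}")
    return Settings(
        gemini_api_key=env.get("GEMINI_API_KEY") or None,  # empty string counts as unset
        kg_mode=kg_mode,
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        database_url=env.get("DATABASE_URL", "sqlite+aiosqlite:///./storysphere.db"),
        keyword_extractor_type=env.get("KEYWORD_EXTRACTOR_TYPE", "yake"),
        chat_agent_max_iterations=int(env.get("CHAT_AGENT_MAX_ITERATIONS", "10")),
    )
```

Passing an explicit mapping instead of relying on `os.environ` keeps the function easy to exercise in tests without touching the real environment.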

API Overview

Base path: /api/v1

| Endpoint | Method | Description |
| --- | --- | --- |
| `/books` | GET / POST | List books, ingest a new book |
| `/books/{id}` | GET / DELETE | Book detail / delete |
| `/entities` | GET | Query entities (filter by book, type, name) |
| `/relations` | GET | Query relations |
| `/search` | GET | Semantic vector search |
| `/analysis/{book_id}/character/{name}` | POST | Trigger deep character analysis |
| `/analysis/{book_id}/event/{event_id}` | POST | Trigger deep event analysis |
| `/tasks/{task_id}` | GET | Async task status |
| `/metrics` | GET | In-process performance metrics |
| `/token-usage` | GET | LLM token usage statistics |
| WS `/ws/chat` | WebSocket | Streaming chat (LangGraph ReAct) |
| WS `/ws/chat-deep` | WebSocket | Deep-analysis chat |
| WS `/ws/tasks/{task_id}` | WebSocket | Real-time task progress push |
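
As a rough illustration of calling two of the REST endpoints listed above, the sketch below builds requests with the standard library. The host and port come from the Quick Start section. The helper names are hypothetical, and the assumption that responses are JSON bodies is not confirmed by this README (see http://localhost:8000/docs for the actual schemas).

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # dev server address from Quick Start


def character_analysis_request(book_id: str, name: str) -> urllib.request.Request:
    """Build (without sending) the POST that triggers deep character analysis."""
    book = urllib.parse.quote(book_id, safe="")
    char = urllib.parse.quote(name, safe="")  # names may contain spaces or non-ASCII
    return urllib.request.Request(
        f"{BASE_URL}/analysis/{book}/character/{char}", method="POST"
    )


def task_status_request(task_id: str) -> urllib.request.Request:
    """Build the GET that polls an async task's status."""
    task = urllib.parse.quote(task_id, safe="")
    return urllib.request.Request(f"{BASE_URL}/tasks/{task}", method="GET")


def send(req: urllib.request.Request) -> object:
    """Send a request and decode the body, assuming (unverified) a JSON response."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes the URL construction checkable without a running server; `send(character_analysis_request("book-1", "Elizabeth Bennet"))` would perform the actual call.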

Ingestion Pipeline

Upload PDF / DOCX
      │
      ▼
DocumentProcessingPipeline
  ├── Loader (PDF / DOCX → raw text)
  ├── ChapterDetector
  └── Chunker (paragraph-level)
      │
      ▼
FeatureExtractionPipeline
  ├── EmbeddingGenerator → Qdrant
  └── KeywordExtractor (YAKE / LLM / TF-IDF / Composite)
      │
      ▼
KnowledgeGraphPipeline
  ├── EntityExtractor (LLM, tenacity retry)
  ├── RelationExtractor (LLM)
  ├── EntityLinker (dedup by normalised name + alias)
  └── ParagraphEntityLinker
      │
      ▼
SummarizationPipeline
  └── ChapterSummarizer (LLM)
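
The EntityLinker step in the diagram above deduplicates entities by normalised name and alias. One way that idea could look is sketched below; it is a simplified illustration, not the project's implementation. The `Entity` dataclass, `normalise`, and `link_entities` are hypothetical names, and the normalisation rule (NFKC, case-folding, whitespace collapsing) is an assumption.

```python
import unicodedata
from dataclasses import dataclass, field


def normalise(name: str) -> str:
    """Unicode-normalise, case-fold, and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", name).casefold().split())


@dataclass
class Entity:
    name: str
    type: str
    aliases: set[str] = field(default_factory=set)


def link_entities(mentions: list[Entity]) -> list[Entity]:
    """Merge mentions whose normalised name or alias matches an earlier entity of the same type.

    Simplification: a mention that bridges two already-distinct entities is merged
    into only one of them; full transitive merging would need union-find.
    """
    merged: list[Entity] = []
    index: dict[tuple[str, str], Entity] = {}  # (type, normalised key) -> canonical entity
    for m in mentions:
        keys = {normalise(m.name)} | {normalise(a) for a in m.aliases}
        target = next((index[(m.type, k)] for k in keys if (m.type, k) in index), None)
        if target is None:
            target = Entity(m.name, m.type, set(m.aliases))
            merged.append(target)
        else:
            # Keep every surface form seen, except the canonical name itself.
            target.aliases |= ({m.name} | m.aliases) - {target.name}
        for k in keys:
            index[(m.type, k)] = target
    return merged
```

Keying the index on the entity type as well as the name keeps a character and a location that share a name from being merged.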

Tools (18 tools)

| Category | Tools |
| --- | --- |
| Graph (6) | GetEntityAttrs, GetEntityRelations, GetRelationPaths, GetSubgraph, GetRelationStats, GetEntityTimeline |
| Retrieval (5) | VectorSearch, GetSummary, GenSummary, GetParagraphs, GetKeywords |
| Analysis (3) | GenerateInsight, AnalyzeCharacter, AnalyzeEvent |
| Composite (4) | GetEntityProfile, GetEntityRelationship, GetCharacterArc, GetEventProfile |

Deep Analysis

Character Analysis

CEP Extraction — gathers KG data, vector evidence, and keywords in parallel
Archetype Classification — Jung (12) + Schmidt (45) archetype JSONs
Character Arc — timeline-segmented growth curve
Profile Summary — natural-language synthesis
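
The parallel gathering in the CEP Extraction step can be sketched with `asyncio.gather`, which the Development Status section names as the parallelisation mechanism. The three fetch coroutines below are hypothetical stand-ins that sleep briefly instead of querying the knowledge graph, Qdrant, or a keyword store, and their return shapes are assumptions.

```python
import asyncio


async def fetch_kg_data(name: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for a knowledge-graph query
    return {"entity": name, "relations": []}


async def fetch_vector_evidence(name: str) -> list[str]:
    await asyncio.sleep(0.01)  # stands in for a vector search
    return [f"paragraph mentioning {name}"]


async def fetch_keywords(name: str) -> list[str]:
    await asyncio.sleep(0.01)  # stands in for a keyword lookup
    return ["example-keyword"]


async def gather_cep_inputs(name: str) -> dict:
    """Run the three independent lookups concurrently and bundle the results."""
    kg, evidence, keywords = await asyncio.gather(
        fetch_kg_data(name),
        fetch_vector_evidence(name),
        fetch_keywords(name),
    )
    return {"kg": kg, "evidence": evidence, "keywords": keywords}


if __name__ == "__main__":
    print(asyncio.run(gather_cep_inputs("Darcy")))
```

Because the lookups do not depend on one another, total latency is roughly that of the slowest call rather than the sum of all three.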

Event Analysis

EEP Extraction — event evidence from KG + vector search
Causality Analysis — cause-effect chain reasoning
Impact Analysis — short/long-term effects on characters and plot

Results are cached in SQLite for 7 days; cache hits return in < 100 ms.
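
A cache-first lookup with a 7-day expiry could look roughly like the sketch below. It uses the synchronous stdlib `sqlite3` module for brevity, whereas the project lists aiosqlite; the table layout, the key format, and the `AnalysisCache` and `get_or_compute` names are all assumptions made for illustration.

```python
from __future__ import annotations

import json
import sqlite3
import time
from typing import Any, Callable

TTL_SECONDS = 7 * 24 * 60 * 60  # 7 days, as stated above


class AnalysisCache:
    """Hypothetical key/value cache backed by a single SQLite table."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, created_at REAL NOT NULL)"
        )

    def get(self, key: str, now: float | None = None) -> Any | None:
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT value, created_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None or now - row[1] > TTL_SECONDS:
            return None  # miss, or entry older than the TTL
        return json.loads(row[0])

    def put(self, key: str, value: Any, now: float | None = None) -> None:
        now = time.time() if now is None else now
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (key, value, created_at) VALUES (?, ?, ?)",
            (key, json.dumps(value), now),
        )
        self.conn.commit()


def get_or_compute(
    cache: AnalysisCache, key: str, compute: Callable[[], Any], now: float | None = None
) -> Any:
    """Return the cached result if fresh; otherwise compute, store, and return it."""
    cached = cache.get(key, now)
    if cached is not None:
        return cached
    result = compute()
    cache.put(key, result, now)
    return result
```

The optional `now` parameter exists only so expiry can be exercised deterministically; in normal use it is left unset and the wall clock is used.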

Monitoring

src/core/metrics.py — MetricsCollector singleton (stdlib-only, thread-safe)

Records: tool selection, tool execution, cache events, agent queries, LLM calls
Exposes P50 / P95 / P99 latency, success rate, cache hit rate
JSON-line logs emitted to storysphere.metrics logger
HTTP endpoint: GET /api/v1/metrics
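
For readers unfamiliar with percentile latencies, the sketch below shows one stdlib-only, thread-safe way to record samples and report P50 / P95 / P99 using the nearest-rank method. It is not the actual `MetricsCollector`; the `LatencyRecorder` class and its methods are hypothetical, and the percentile method the project uses is not stated in this README.

```python
import threading
from collections import defaultdict


class LatencyRecorder:
    """Hypothetical thread-safe store of latency samples, keyed by metric name."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._samples: dict[str, list[float]] = defaultdict(list)

    def record(self, name: str, latency_ms: float) -> None:
        with self._lock:
            self._samples[name].append(latency_ms)

    def percentile(self, name: str, p: int) -> float:
        """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
        with self._lock:
            data = sorted(self._samples.get(name, []))
        if not data:
            raise ValueError(f"no samples recorded for {name!r}")
        rank = max(1, -(-p * len(data) // 100))  # integer ceil(p * n / 100)
        return data[rank - 1]

    def snapshot(self, name: str) -> dict[str, float]:
        return {f"p{p}": self.percentile(name, p) for p in (50, 95, 99)}
```

The rank is computed with integer arithmetic to avoid floating-point rounding pushing a result into the next bucket.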

Testing

# Run all unit tests
uv run pytest

# Run with coverage
uv run pytest --cov=src --cov-report=term-missing

# Skip integration tests (no API key required)
uv run pytest -m "not integration"

Current test count: 331+ passing across agents, services, tools, pipelines, and core utilities.

Development Status

| Phase | Status | Description |
| --- | --- | --- |
| Phase 1 | ✅ Done | Base layer: config, domain, LLM client |
| Phase 2 | ✅ Done | ETL pipelines (document, embedding, KG, summarization) |
| Phase 2b | ✅ Done | Keyword extraction (YAKE / LLM / TF-IDF / Composite) |
| Phase 3 | ✅ Done | 15 base tools |
| Phase 4 | ✅ Done | Composite tools + LangGraph Chat Agent |
| Phase 5 | ✅ Done | Deep Analysis: character (CEP, archetypes, arc) + event |
| Phase 6 | ✅ Done | Parallel optimization (`asyncio.gather`) |
| Phase 7 | ✅ Done | Monitoring: `MetricsCollector`, token usage tracking |

Docs

docs/CORE.md — Master design document (start here)
docs/appendix/ — Full ADR-001 to ADR-009, tools catalog, parallel implementation notes

License

MIT
