ZEN_AI_RAG — Zena AI with RAG & Local LLM

Full-stack AI assistant with Retrieval-Augmented Generation, local LLM inference via llama-cpp, and a NiceGUI web UI — designed to run entirely offline.

Features

Local LLM inference — llama-cpp server with auto-model discovery and GPU/CPU selection
RAG pipeline — BGE embeddings, semantic chunking, Qdrant vector search, layout-aware PDF parsing
Enhanced RAG Pipeline — 22+ SOTA modules with adaptive query routing (see Core Modules)
Multi-format ingestion — PDF, DOCX, XLSX, web pages, code files (30+ types)
NiceGUI UI — dark mode, RAG badges, chat interface, settings sidebar, pipeline selector
Pre-flight validation — auto-checks binary, models, and dependencies on startup
Multi-model support — auto-discovers GGUF models, switch models live
Zena AI personality — customizable AI assistant with enhanced capabilities
RAG Test Bench — lightweight Flask app for A/B testing RAG pipelines side-by-side

Quick Start

# Install dependencies
pip install -r requirements.txt

# Launch (auto-validates and starts llama-cpp server + NiceGUI UI)
python start_llm.py

Or use the batch/shell scripts:

# Windows
Run_Me.bat

# Linux/macOS
./Run_Me.sh

Architecture

start_llm.py                     # Entry point — delegates to zena_mode/server.py
zena.py                          # NiceGUI chat application (main UI)
zena_mode/
├── server.py                    # Orchestrator: model engines, health checks, HTTP endpoints
├── rag_manager.py               # Thread-safe RAG state proxy (uses RLock)
config.py                        # Configuration management (ports, model paths, dirs)
model_manager.py                 # Model discovery and hot-swap

Core Modules (Core/)

The enhanced RAG pipeline consists of 22+ industry-best modules orchestrated by EnhancedRAGService:

Module	File	Description
Query Router	`Core/query_router.py`	Intent classification → adaptive pipeline selection (simple/analytical/multi-hop/aggregate/temporal)
Query Rewriter	`Core/query_rewriter.py`	Multi-query expansion via LLM or templates
HyDE Retrieval	`Core/hyde_retrieval.py`	Hypothetical document embeddings for improved recall
Contextual Retrieval	`Core/contextual_retrieval.py`	Anthropic-style chunk enrichment with document context
Parent Document Retrieval	`Core/parent_document_retrieval.py`	Hierarchical small-to-big context expansion
Knowledge Graph	`Core/knowledge_graph.py`	Entity-relation graph with SQLite backend
Graph RAG	`Core/graph_rag.py`	Community detection and global/local Q&A strategies
Advanced Reranker	`Core/reranker_advanced.py`	5-signal reranking: semantic, position, density, answer-type, source credibility
Smart Deduplicator	`Core/smart_deduplicator.py`	Multi-strategy dedup: exact, boilerplate, structural, semantic, shingle
Contextual Compressor	`Core/contextual_compressor.py`	Query-focused chunk compression to reduce noise
Corrective RAG	`Core/corrective_rag.py`	Self-healing retrieval with quality grading (CORRECT/AMBIGUOUS/INCORRECT)
FLARE Retrieval	`Core/flare_retrieval.py`	Forward-looking active retrieval for uncertain answers
Conflict Detector	`Core/conflict_detector.py`	Cross-source numerical, temporal, and factual conflict detection
Hallucination Detector	`Core/hallucination_detector_v2.py`	Multi-signal detection: ungrounded, NLI contradiction, numerical, causal
Confidence Scorer	`Core/confidence_scorer.py`	Multi-factor quality assessment with risk levels
Follow-Up Generator	`Core/follow_up_generator.py`	Context-aware follow-up question generation
Answer Refinement	`Core/answer_refinement.py`	Post-generation quality pipeline (hallucination fix, completeness, consistency)
Metrics Tracker	`Core/metrics_tracker.py`	Query/index event tracking with latency percentiles and cache hit rates
Zero-Waste Cache	`Core/zero_waste_cache.py`	Two-tier semantic cache: answer-level (T1) + context-level (T2) with fingerprint validation
Inference Guard	`Core/inference_guard.py`	Crash diagnostics, memory snapshots, performance profiling
Prompt Templates	`Core/prompt_focus.py`	12+ built-in templates (Medical, Legal, Business, Technical, Research) with 7 focus modes
Evaluation	`Core/evaluation.py`	Answer quality (faithfulness, relevance, completeness, conciseness) + retrieval metrics (Precision@K, Recall@K, MRR, F1)

Pipeline Flow

Query → QueryRewriter → QueryRouter → [Strategy Selection]
                                         │
                    ┌────────────────────┤
                    ▼                    ▼
              Simple Path          Analytical Path
              (fast)               (full SOTA)
                    │                    │
                    ▼                    ▼
              Retrieve            HyDE + Retrieve
                    │                    │
                    └────────┬───────────┘
                             ▼
                    Parent Document Expand
                             ▼
                    Smart Deduplicator
                             ▼
                    Advanced Reranker
                             ▼
                    Conflict Detector
                             ▼
                    Contextual Compressor
                             ▼
                    Corrective RAG (CRAG)
                             ▼
                    FLARE (if uncertain)
                             ▼
                    LLM Generate Answer
                             ▼
                    Post-Processing:
                    ├── Hallucination Detection
                    ├── Confidence Scoring
                    ├── Follow-Up Generation
                    └── Metrics Recording

UI Features

Pipeline selector: Choose between Standard RAG and Enhanced (SOTA) pipeline
Live metadata display: Routing intent, pipeline stages, latency, confidence score, hallucination probability
CRAG feedback: Retrieval grade and corrections applied
FLARE status: Iteration count and sub-queries generated
Conflict alerts: Cross-source contradictions detected
Follow-up suggestions: Context-aware next questions

Running Tests

# All tests (uses the test runner with server lifecycle management)
python run_tests.py

# Core module tests only (94 tests, fast)
python -m pytest tests/test_core_modules.py -v

# Enhanced RAG integration tests (38 tests)
python -m pytest tests/test_enhanced_rag.py -v

# Deep audit tests — KnowledgeGraph, Pydantic models, pipeline integration (67 tests)
python -m pytest tests/test_deep_audit.py -v

# All three test suites (199 tests)
python -m pytest tests/test_core_modules.py tests/test_enhanced_rag.py tests/test_deep_audit.py -v

# With coverage
python run_tests.py --coverage

Dependencies

llama-cpp-python — local GGUF model inference
nicegui — web UI
sentence-transformers — BGE embeddings + cross-encoder reranking
qdrant-client — vector database
beautifulsoup4, PyMuPDF — document parsing
pydantic — data validation for RAG models
psutil — process/memory monitoring for inference guard

Configuration

config.py / config.json — Ports (default engine: 8001), model paths, directories
settings.json — User preferences (theme, voice, RAG settings)
data/custom_prompts.json — User-created prompt templates

Code Quality — X-Ray Scan (March 2026)

Metric	Value
Files scanned	507
Rules checked	42
HIGH severity	32 (down from 36 after fixes)
MEDIUM severity	514
Fixes applied	SEC-003: subprocess `shell=True` → `shell=False` in utils.py, utils_hardware.py, dependency_check.py
Fixes applied	PORT-002: Hardcoded `C:/AI/Models` → `Path.home() / "AI" / "Models"` in server.py
Fixes applied	SEC-009: pickle.load annotated with integrity check comment
Remaining	PY-005 (json.load), SEC-007 false positives (function names containing 'eval'/'exec')
Status	Security-hardened

Run locally: python -m xray . --dry-run

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 69 Commits
.claude		.claude
.github		.github
.streamlit		.streamlit
API		API
Core		Core
Rustified		Rustified
ZenaBot		ZenaBot
adapters		adapters
bench		bench
conversation_cache		conversation_cache
data		data
dist_build		dist_build
examples		examples
medical_scenarios		medical_scenarios
qdrant_storage		qdrant_storage
rag-test-bench		rag-test-bench
rag_storage		rag_storage
rag_verification_storage		rag_verification_storage
rust_output		rust_output
scripts		scripts
server		server
src		src
startup		startup
static		static
tests		tests
tools		tools
ui		ui
ui_components		ui_components
ui_flet		ui_flet
zena_mode		zena_mode
.env.example		.env.example
.gitignore		.gitignore
.xrayignore		.xrayignore
AntiGrav_to_do.json		AntiGrav_to_do.json
AntiGrav_to_do.md		AntiGrav_to_do.md
Install_llama_cpp.bat		Install_llama_cpp.bat
LICENSE		LICENSE
README.md		README.md
Rebuild_Prompt.md		Rebuild_Prompt.md
Run_Me.bat		Run_Me.bat
Run_Me.sh		Run_Me.sh
Start_ZenAI.bat		Start_ZenAI.bat
_backend_stub.py		_backend_stub.py
_run_xray_audit.py		_run_xray_audit.py
_xray_connections.json		_xray_connections.json
_xray_dead_functions.json		_xray_dead_functions.json
_zena_analisis		_zena_analisis
adapter_factory.py		adapter_factory.py
agent_performance.db		agent_performance.db
app.py		app.py
async_backend.py		async_backend.py
backend_impl.py		backend_impl.py
backend_protocol.py		backend_protocol.py
config.json		config.json
config.py		config.py
config_system.py		config_system.py
conftest.py		conftest.py
download_model.py		download_model.py
import_graph.html		import_graph.html
install_llama_cpp_windows.ps1		install_llama_cpp_windows.ps1
knowledge_base.json		knowledge_base.json
llm_adapters.py		llm_adapters.py
mock_backend.py		mock_backend.py
model_manager.py		model_manager.py
pyproject.toml		pyproject.toml
pyrightconfig.json		pyrightconfig.json
pytest.ini		pytest.ini
rag_dir_fail.png		rag_dir_fail.png
rag_integration.py		rag_integration.py
rag_website_fail.png		rag_website_fail.png
requirements.txt		requirements.txt
requirements.txt.patched		requirements.txt.patched
scan_after_fix.json		scan_after_fix.json
scan_v3.json		scan_v3.json
scan_v4.json		scan_v4.json
settings.json		settings.json
settings.py		settings.py
settings_test.json		settings_test.json
start_llm.py		start_llm.py
state_management.py		state_management.py
test_audio_injection.wav		test_audio_injection.wav
test_history.json		test_history.json
test_payload.json		test_payload.json
ui_fatal_crash.txt		ui_fatal_crash.txt
ui_state.py		ui_state.py
utils.py		utils.py
voice_service.py		voice_service.py
voice_trace.txt		voice_trace.txt
x_ray_claude.py		x_ray_claude.py
xray_before.json		xray_before.json
xray_scan_latest.json		xray_scan_latest.json
yolo11n.pt		yolo11n.pt
yolo26n.pt		yolo26n.pt
zena.py		zena.py
zena_flet.py		zena_flet.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ZEN_AI_RAG — Zena AI with RAG & Local LLM

Features

Quick Start

Architecture

Core Modules (Core/)

Pipeline Flow

UI Features

Running Tests

Dependencies

Configuration

Code Quality — X-Ray Scan (March 2026)

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ZEN_AI_RAG — Zena AI with RAG & Local LLM

Features

Quick Start

Architecture

Core Modules (Core/)

Pipeline Flow

UI Features

Running Tests

Dependencies

Configuration

Code Quality — X-Ray Scan (March 2026)

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages