# High-Throughput RAG Ingestion Engine with Concurrent Worker Pools
Project XLR8 is a high-performance, concurrent ingestion engine designed to process massive datasets for RAG (Retrieval-Augmented Generation) applications.
Unlike simple scripts that process documents sequentially, XLR8 implements a Bounded Worker Pool architecture. It reads raw text, generates vector embeddings using a local LLM (Ollama), and indexes them into a Vector Database (Weaviate)—all while strictly adhering to API rate limits and memory constraints. It features a real-time TUI (Terminal User Interface) to visualize throughput and latency.
I chose a Fan-Out / Fan-In pattern with a Token Bucket Rate Limiter. This ensures the system utilizes available concurrency without overwhelming the downstream Embedding API or the Vector Database.
- Bounded Concurrency: Prevents memory exhaustion and CPU thrashing by capping the number of active goroutines.
- Backpressure Handling: The `Jobs` channel acts as a buffer. If the database slows down, workers naturally slow down, preventing a system crash.
- Decoupled Components: The UI is an observer; it doesn't block the ingestion pipeline.
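This backpressure behaviour falls out of Go's buffered channels for free: a full buffer blocks the producer until a consumer catches up. A minimal stdlib sketch (the `produce`/`consume` helpers are illustrative, not XLR8's actual API):

```go
package main

import "fmt"

// produce pushes documents into the jobs channel. The send blocks as
// soon as the buffer is full, so a slow consumer throttles the producer.
func produce(jobs chan<- string, docs []string) {
	for _, d := range docs {
		jobs <- d // blocks when the buffer is full (backpressure)
	}
	close(jobs)
}

// consume drains the channel; it stands in for the slow
// embed-and-index step in the real pipeline.
func consume(jobs <-chan string) int {
	n := 0
	for range jobs {
		n++
	}
	return n
}

func main() {
	jobs := make(chan string, 4) // small buffer = bounded queue
	docs := []string{"a", "b", "c", "d", "e", "f"}
	go produce(jobs, docs)
	fmt.Println("processed:", consume(jobs))
}
```

Because the buffer is bounded, memory usage stays constant no matter how many documents the producer has queued up.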
Imagine an airport with thousands of passengers (Documents) trying to catch a flight (Get into the Database).
- Bad Approach: Everyone rushes the gate at once. The scanners break, security is overwhelmed, and chaos ensues.
- XLR8 Approach:
- The Line (Channel): Passengers wait in an orderly queue.
- The Security Agents (Workers): 50 agents process passengers in parallel.
- The Officer (Rate Limiter): A guard ensures only 10 people pass per second, even if agents are fast, to comply with regulations.
- The Shuttle Bus (Batcher): Once through security, passengers wait on a bus. The bus only leaves when it's full (e.g., 50 people), reducing the number of trips to the plane.
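The "Officer" in the analogy is a token bucket. XLR8 itself uses `x/time/rate` for this; the sketch below reimplements the idea in plain Go with the clock injected manually so the behaviour is deterministic (all names here are illustrative):

```go
package main

import "fmt"

// bucket is a tiny token-bucket sketch: up to `capacity` tokens, refilled
// at `rate` tokens per second; a request passes only if a token is available.
type bucket struct {
	tokens, capacity float64
	rate             float64 // tokens added per second
}

// allow consumes a token if one is available. `elapsed` is the number of
// seconds since the last call, injected so this example is deterministic.
func (b *bucket) allow(elapsed float64) bool {
	b.tokens += elapsed * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity // never exceed the burst capacity
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	// "10 people per second": rate 10, small burst capacity of 2.
	b := &bucket{tokens: 2, capacity: 2, rate: 10}
	fmt.Println(b.allow(0), b.allow(0), b.allow(0)) // burst of 2, then denied
	fmt.Println(b.allow(0.1))                       // 0.1s later: one token refilled
}
```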
XLR8 is a CLI tool written in Go that implements the Producer-Consumer pattern. It spins up a fixed pool of goroutines (workers) that consume jobs from a buffered channel. Each worker acquires a token from a x/time/rate limiter before calling an external Embedding API. Results are aggregated by a Batcher which flushes to Weaviate in optimal chunk sizes to maximize I/O throughput.
Raw Text → Job Queue → Worker Pool (Rate Limited) → Ollama Embedding → Batcher → Weaviate Index
- Producer (Generator): Pushes raw documents into a buffered `Jobs` channel.
- Token Bucket Rate Limiter: Regulates the speed at which workers consume jobs, ensuring that the upstream API quota (500 RPM) is never exceeded.
- Worker Pool: A fixed set of Goroutines (N=50) that:
- Acquire a token from the Limiter.
- Generate Embeddings (Interface-driven).
- Push processed vectors to a `Results` channel.
- Consumer (Aggregator): Batches results for the Vector DB and updates the real-time TUI.
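The fan-out/fan-in core of the stages above can be sketched in a few lines of stdlib Go. This is a simplified model, not XLR8's actual code: `embed` stands in for the Ollama call, and the limiter acquisition is marked by a comment rather than implemented:

```go
package main

import (
	"fmt"
	"sync"
)

// Vector is an illustrative stand-in for an embedding result.
type Vector struct {
	Doc string
	Dim int
}

// embed is a placeholder for the real Embedding API call.
func embed(doc string) Vector { return Vector{Doc: doc, Dim: len(doc)} }

// runPool fans `workers` goroutines out over the jobs channel and
// fans their results back into a single results channel.
func runPool(workers int, docs []string) []Vector {
	jobs := make(chan string, len(docs))
	results := make(chan Vector, len(docs))

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for d := range jobs {
				// In XLR8, a token is acquired from the rate
				// limiter here before calling the Embedding API.
				results <- embed(d)
			}
		}()
	}

	for _, d := range docs { // producer
		jobs <- d
	}
	close(jobs) // no more jobs: workers exit when the channel drains

	wg.Wait()
	close(results)

	var out []Vector
	for v := range results {
		out = append(out, v)
	}
	return out
}

func main() {
	vecs := runPool(4, []string{"alpha", "beta", "gamma"})
	fmt.Println("embedded:", len(vecs))
}
```

Closing `jobs` after the producer finishes is what lets every worker's `for range` loop terminate cleanly, and the `WaitGroup` guarantees all results are in before the channel is closed.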
- Race Condition Free: Validated via `go run -race`.
- Graceful Shutdown: Handles `SIGINT` (Ctrl+C) by stopping the producer, waiting for in-flight jobs to drain, and closing connections safely.
- Interface Driven: Modular design allows hot-swapping of LLM providers and Vector Databases.
- True Concurrency: Uses goroutines and channels for non-blocking processing.
- Rate Limiting: Token-bucket algorithm prevents 429 (Too Many Requests) errors from LLM providers.
- Smart Batching: Buffers embeddings in memory and flushes to DB based on size (e.g., 50 docs) or time (e.g., every 1s).
- Real-Time TUI: Interactive dashboard built with Bubble Tea showing Docs/Sec, Progress, and Error rates.
- Interface Driven: `Embedder` and `VectorStore` are interfaces, allowing hot-swapping between Ollama/OpenAI or Weaviate/Pinecone.
- Graceful Shutdown: Handles `SIGINT` (Ctrl+C) by draining in-flight jobs before exiting to prevent data corruption.
Quality assurance is built in to ensure thread safety in the highly concurrent worker pool.
- Race Condition Detection:
  ```bash
  go run -race cmd/xlr8/main.go
  ```
  Ensures no memory is shared unsafely between the 50+ concurrent goroutines.
- Unit Testing:
  ```bash
  go test ./... -v
  ```
| Component | Technology | Why I chose it |
|---|---|---|
| Language | Golang (1.23) | Best-in-class concurrency primitives (Channels/Goroutines) and raw performance. |
| Vector DB | Weaviate | Open-source, Go-native client, and excellent batching capabilities. |
| AI Inference | Ollama | Allows local, offline embedding generation (zero cost for dev/test). |
| CLI UI | Bubble Tea | Follows "The Elm Architecture" for robust, state-driven terminal UIs. |
| Rate Limiting | x/time/rate | Robust token-bucket implementation from the official `golang.org/x` modules. |
- Go 1.23+ installed.
- Docker & Docker Compose (for Weaviate).
- Ollama (installed locally for embeddings).
```bash
git clone https://github.com/AnubhavMadhav/XLR8.git
cd xlr8
go mod tidy
```
- Start Vector DB:
  ```bash
  docker-compose up -d
  ```
- Start AI Model:
  ```bash
  ollama pull nomic-embed-text
  ollama serve
  ```
```bash
go run cmd/xlr8/main.go
```
Once ingestion is complete, query your data:
```bash
go run cmd/search/main.go "fast animal"
# Output: Found 'Peregrine Falcon' (Score: 0.89)
```
Visualizing the concurrent worker pool processing documents in real-time.
| Running (Workers Active) | Completed (100% Processed) |
|---|---|
| *(screenshot)* | *(screenshot)* |
Confirming that data was successfully batched and persisted to the Vector Database.
1. Docker Infrastructure Logs:
Weaviate logs confirming batch ingestion from the XLR8 pipeline.
2. Data Verification:
Go script verifying vector dimensions and document count.
Demonstrating the engine's ability to understand context over keywords.
Test A: Concept "Fast Animals" (Matches Peregrine Falcon)
Test B: Concept "Programming" (Matches Golang, Rust, Kubernetes)
Standard Go Project Layout for maintainability.
```
xlr8/
├── cmd/
│   ├── xlr8/        # Main entry point (Ingestion)
│   └── search/      # CLI tool for querying Weaviate
├── internal/
│   ├── core/        # Domain types and Interfaces (Embedder/Store)
│   ├── pipeline/    # Orchestrator and Batching logic
│   ├── worker/      # Worker Pool implementation
│   └── ui/          # Bubble Tea TUI models
├── docker-compose.yml
└── go.mod
```
- Bounded vs. Unbounded Concurrency:
  - Decision: I chose a fixed pool (e.g., 50 workers).
  - Trade-off: Unbounded goroutines could theoretically process faster but would crash the system by exhausting memory or file descriptors under load.
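One common way to enforce such a bound is a semaphore channel. This sketch (illustrative names, not XLR8's code) caps concurrency at a fixed limit and records the peak actually observed:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// boundedRun launches `tasks` goroutines but lets at most `limit` of
// them run concurrently, using a buffered channel as a semaphore.
// It returns the peak concurrency observed.
func boundedRun(tasks, limit int) int64 {
	sem := make(chan struct{}, limit) // at most `limit` tokens available
	var cur, peak int64
	var wg sync.WaitGroup
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{} // acquire: blocks once `limit` are running
			n := atomic.AddInt64(&cur, 1)
			for { // record peak concurrency (CAS loop)
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			atomic.AddInt64(&cur, -1)
			<-sem // release the token
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak within limit:", boundedRun(100, 3) <= 3)
}
```

However many tasks arrive, no more than `limit` goroutines ever hold a token at once, which is what keeps memory and file-descriptor usage flat under load.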
- Strict Interface Decoupling:
  - Decision: `Embedder` is an interface.
  - Benefit: Allows us to switch from `Ollama` (local) to `OpenAI` (cloud) by changing just one line of code in `main.go`, without touching the worker logic.
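A minimal sketch of that decoupling, with stub providers standing in for the real Ollama/OpenAI clients (the concrete types and vector dimensions here are illustrative assumptions, not XLR8's implementations):

```go
package main

import "fmt"

// Embedder mirrors the interface described above.
type Embedder interface {
	Embed(text string) ([]float32, error)
}

// OllamaEmbedder would call the local Ollama server in the real system.
type OllamaEmbedder struct{}

func (OllamaEmbedder) Embed(text string) ([]float32, error) {
	return make([]float32, 768), nil // e.g., nomic-embed-text is 768-dim
}

// OpenAIEmbedder would call the OpenAI API in the real system.
type OpenAIEmbedder struct{}

func (OpenAIEmbedder) Embed(text string) ([]float32, error) {
	return make([]float32, 1536), nil
}

// newEmbedder is the single construction site: workers only see the
// interface, so swapping providers is a one-line change here.
func newEmbedder(provider string) Embedder {
	if provider == "openai" {
		return OpenAIEmbedder{}
	}
	return OllamaEmbedder{}
}

func main() {
	vec, _ := newEmbedder("ollama").Embed("hello")
	fmt.Println("dims:", len(vec))
}
```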
- Batching Strategy:
  - Decision: The `Batcher` uses a `select` statement to flush on either a size limit OR a time interval.
  - Reason: Prevents "stranded data" where the last few documents sit in the buffer forever if the queue dries up.
Built with ❤️ and Golang by Anubhav Madhav.