
Project XLR8 (Accelerate) 🚀

High-Throughput RAG Ingestion Engine with Concurrent Worker Pools

1. Project Overview

Project XLR8 is a high-performance, concurrent ingestion engine designed to process massive datasets for RAG (Retrieval-Augmented Generation) applications.

Unlike simple scripts that process documents sequentially, XLR8 implements a Bounded Worker Pool architecture. It reads raw text, generates vector embeddings using a local LLM (Ollama), and indexes them into a Vector Database (Weaviate)—all while strictly adhering to API rate limits and memory constraints. It features a real-time TUI (Terminal User Interface) to visualize throughput and latency.


2. Architecture

I chose a Fan-Out / Fan-In pattern with a Token Bucket Rate Limiter. This ensures the system utilizes available concurrency without overwhelming the downstream Embedding API or the Vector Database.

[Architecture diagram: xlr8-architecture]

Why this Architecture?

  • Bounded Concurrency: Prevents memory leaks and CPU thrashing by capping the number of active goroutines.
  • Backpressure Handling: The job channel acts as a buffer; if the database slows down, workers naturally slow down too, preventing a system crash.
  • Decoupled Components: The UI is an observer; it doesn't block the ingestion pipeline.

3. What is Project XLR8?

Layman's Explanation (The Airport Analogy)

Imagine an airport with thousands of passengers (Documents) trying to catch a flight (Get into the Database).

  • Bad Approach: Everyone rushes the gate at once. The scanners break, security is overwhelmed, and chaos ensues.
  • XLR8 Approach:
    • The Line (Channel): Passengers wait in an orderly queue.
    • The Security Agents (Workers): 50 agents process passengers in parallel.
    • The Officer (Rate Limiter): A guard ensures only 10 people pass per second, even if agents are fast, to comply with regulations.
    • The Shuttle Bus (Batcher): Once through security, passengers wait on a bus. The bus only leaves when it's full (e.g., 50 people), reducing the number of trips to the plane.

Technical Explanation

XLR8 is a CLI tool written in Go that implements the Producer-Consumer pattern. It spins up a fixed pool of goroutines (workers) that consume jobs from a buffered channel. Each worker acquires a token from a x/time/rate limiter before calling an external Embedding API. Results are aggregated by a Batcher which flushes to Weaviate in optimal chunk sizes to maximize I/O throughput.


4. The Flow

Raw Text → Job Queue → Worker Pool (Rate Limited) → Ollama Embedding → Batcher → Weaviate Index


The Pipeline Flow

  1. Producer (Generator): Pushes raw documents into a buffered Jobs channel.
  2. Token Bucket Rate Limiter: Regulates the speed at which workers can consume jobs, ensuring that upstream API quotas (500 RPM) are never exceeded.
  3. Worker Pool: A fixed set of Goroutines (N=50) that:
    • Acquire a token from the Limiter.
    • Generate Embeddings (Interface-driven).
    • Push processed vectors to a Results channel.
  4. Consumer (Aggregator): Batches results for the Vector DB and updates the real-time TUI.

Reliability & Constraints

  • Race Condition Free: Validated via go run -race.
  • Graceful Shutdown: Handles SIGINT (Ctrl+C) by stopping the producer, waiting for in-flight jobs to drain, and closing connections safely.
  • Interface Driven: Modular design allows hot-swapping of LLM providers and Vector Databases.

5. Key Features

  • True Concurrency: Utilizes goroutines and channels for non-blocking processing.
  • Rate Limiting: Token-bucket algorithm prevents 429 (Too Many Requests) errors from LLM providers.
  • Smart Batching: Buffers embeddings in memory and flushes to DB based on size (e.g., 50 docs) or time (e.g., every 1s).
  • Real-Time TUI: Interactive dashboard built with Bubble Tea showing Docs/Sec, Progress, and Error rates.
  • Interface Driven: The Embedder and VectorStore are interfaces, allowing hot-swapping between Ollama/OpenAI or Weaviate/Pinecone.
  • Graceful Shutdown: Handles SIGINT (Ctrl+C) by draining in-flight jobs before exiting to prevent data corruption.

6. Build & Test Pipeline

Quality assurance is built in to ensure thread safety in the highly concurrent worker pool.

  • Race Condition Detection:

    go run -race cmd/xlr8/main.go

    Ensures no memory is shared unsafely between the 50+ concurrent goroutines.

  • Unit Testing:

    go test ./... -v

7. Tech Stack & Rationale

| Component | Technology | Why I chose it |
| --- | --- | --- |
| Language | Golang (1.23) | Best-in-class concurrency primitives (channels/goroutines) and raw performance. |
| Vector DB | Weaviate | Open-source, Go-native client, and excellent batching capabilities. |
| AI Inference | Ollama | Allows local, offline embedding generation (zero cost for dev/test). |
| CLI UI | Bubble Tea | Follows "The Elm Architecture" for robust, state-driven terminal UIs. |
| Rate Limiting | x/time/rate | Robust token-bucket implementation from the official golang.org/x extended packages. |

8. Setup & Installation

Prerequisites

  • Go 1.23+ installed.
  • Docker & Docker Compose (for Weaviate).
  • Ollama (installed locally for embeddings).

Step 1: Clone & Initialize

git clone https://github.com/AnubhavMadhav/XLR8.git
cd XLR8
go mod tidy

Step 2: Start Infrastructure

  1. Start Vector DB:
    docker-compose up -d
  2. Start AI Model:
    ollama pull nomic-embed-text
    ollama serve

Step 3: Run Ingestion

go run cmd/xlr8/main.go

Step 4: Semantic Search

Once ingestion is complete, query your data:

go run cmd/search/main.go "fast animal"
# Output: Found 'Peregrine Falcon' (Score: 0.89)

9. Demo: XLR8 in Action

Phase 1: High-Throughput Ingestion (The TUI)

Visualizing the concurrent worker pool processing documents in real-time.

Running (Workers Active)

Ingestion Running

Completed (100% Processed)

Ingestion Completed

Phase 2: Verification & Infrastructure

Confirming that data was successfully batched and persisted to the Vector Database.

1. Docker Infrastructure Logs: Weaviate logs confirming batch ingestion from the XLR8 pipeline.
Weaviate Docker Logs

2. Data Verification: Go script verifying vector dimensions and document count.
Verification Script Output


Phase 3: Semantic Search (RAG)

Demonstrating the engine's ability to understand context over keywords.

Test A: Concept "Fast Animals" (Matches Peregrine Falcon)
Search Result: Fast Animal

Test B: Concept "Programming" (Matches Golang, Rust, Kubernetes)
Search Result: Coding

Test C: Concept "Food" (Matches Pizza)
Search Result: Pizza

10. Project Structure

Standard Go Project Layout for maintainability.

xlr8/
├── cmd/
│   ├── xlr8/        # Main entry point (Ingestion)
│   └── search/      # CLI tool for querying Weaviate
├── internal/
│   ├── core/        # Domain types and Interfaces (Embedder/Store)
│   ├── pipeline/    # Orchestrator and Batching logic
│   ├── worker/      # Worker Pool implementation
│   └── ui/          # Bubble Tea TUI models
├── docker-compose.yml
└── go.mod

11. Design Decisions & Trade-offs

  1. Bounded vs. Unbounded Concurrency:

    • Decision: I chose a fixed pool (e.g., 50 workers).
    • Trade-off: Unbounded goroutines could theoretically process faster but would crash the system by exhausting memory or file descriptors under load.
  2. Strict Interface Decoupling:

    • Decision: Embedder is an interface.
    • Benefit: Allows us to switch from Ollama (local) to OpenAI (cloud) by changing just one line of code in main.go, without touching the worker logic.
  3. Batching Strategy:

    • Decision: The Batcher uses a select statement to flush on either size limit OR time interval.
    • Reason: Prevents "stranded data" where the last few documents sit in the buffer forever if the queue dries up.

Built with ❤️ and Golang by Anubhav Madhav.
