A production-ready Spring Boot 3.4.0 application implementing Retrieval-Augmented Generation (RAG) with vector embeddings, semantic search, and multi-LLM provider support.
AstraDesk RAG Mini is an enterprise-grade RAG system that:
- Performs semantic search across document collections using vector embeddings
- Ingests multiple document formats (PDF, HTML, Markdown, TXT)
- Stores vectors in PostgreSQL with the pgvector extension
- Streams real-time ingestion progress via Server-Sent Events
- Supports multiple LLM providers (OpenAI, Spring AI, Fake for testing)
- Integrates with S3/MinIO for document storage
- Chunks documents intelligently with configurable overlap
- Detects document language automatically
| Component | Technology | Version |
|---|---|---|
| Framework | Spring Boot | 3.4.0 |
| Java | Temurin | 21 LTS |
| Database | PostgreSQL + pgvector | 16/17 |
| Vector Store | pgvector | 0.1.6 |
| AI/ML | Spring AI + OpenAI | 0.8.1 |
| Storage | AWS S3 SDK v2 / MinIO | 2.38.2 |
| Build | Gradle | 8.14 |
| Container | Docker | Multi-stage |
| Observability | Micrometer + OpenTelemetry | Latest |
| Feature | Status | Notes |
|---|---|---|
| WebFlux (Reactive) | ✅ Stable | All filters use WebFilter |
| JDBC + HikariCP | ✅ Stable | Blocking calls; consider R2DBC for high traffic |
| OpenAI Embeddings | ✅ Stable | text-embedding-3-small (1536d) |
| OpenAI Chat | ✅ Stable | gpt-4o-mini |
| Spring AI | Known issues | API compatibility issues; use the OpenAI HTTP provider |
| pgvector | ✅ Stable | IVFFlat index, cosine distance |
| S3/MinIO | ✅ Stable | AWS SDK v2 |
| Rate Limiting | ✅ Stable | Token bucket, in-memory |
| OpenTelemetry | ✅ Stable | OTLP exporter |
| TestContainers | ✅ Stable | Integration tests |
| Prometheus | ✅ Stable | Metrics export |
| Docker Health | ✅ Stable | /api/v1/health endpoint |
| API Versioning | ✅ Stable | /api/v1 prefix |
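The rate limiter is described above only as an in-memory token bucket. The sketch below illustrates that general technique in plain Java; the class name, constructor parameters, and injectable clock are assumptions for illustration and are not taken from the project's code.

```java
import java.util.function.LongSupplier;

/** Minimal in-memory token bucket (illustrative; not the project's implementation). */
final class TokenBucket {
    private final long capacity;          // maximum burst size
    private final double refillPerNano;   // tokens added per elapsed nanosecond
    private final LongSupplier clock;     // injectable nanosecond clock, eases testing
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double refillPerSecond, LongSupplier clock) {
        this.capacity = capacity;
        this.refillPerNano = refillPerSecond / 1_000_000_000.0;
        this.clock = clock;
        this.tokens = capacity;           // start full so an initial burst is allowed
        this.lastRefill = clock.getAsLong();
    }

    /** Consumes one token and returns true if the request may proceed. */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        // Refill in proportion to elapsed time, never exceeding the capacity
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In a real service the clock would typically be System::nanoTime, with one bucket per client key. Because the state lives in memory, limits are per instance and reset on restart.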
com.astradesk.rag
├── controller/   # REST API endpoints
│   ├── DocumentController
│   └── ZipController
├── service/      # Business logic
│   ├── RagService (search & chat orchestration)
│   ├── ZipIngestService (document ingestion)
│   ├── Embeddings (interface)
│   ├── SpringAiEmbeddings (implementation)
│   ├── OpenAiHttpEmbeddings (implementation)
│   ├── ChatLLM (interface)
│   ├── SpringAiChat (implementation)
│   └── OpenAiHttpChat (implementation)
├── repo/         # Data access
│   ├── DocumentJdbcRepository
│   └── ChunkJdbcRepository
├── config/       # Spring configuration
│   ├── S3Config
│   ├── ProviderConfig (dependency injection for providers)
│   ├── S3StorageService
│   └── GlobalExceptionHandler
├── model/        # Data models
│   ├── ChunkRecord (search results)
│   ├── ProgressEvent (SSE ingestion events - 7 fields)
│   └── HealthResponse (health check - 3 fields)
└── util/         # Utilities
    └── Chunker
- Java 21+ (OpenJDK Temurin)
- Docker
- OpenAI API Key (optional, for production use)
# Build the project
docker run --rm -v "$PWD":/workspace -w /workspace \
eclipse-temurin:21-jdk bash -c "./gradlew clean build -x test"
# Start all services
./QUICK_START.sh
# Initialize database
./init-database-docker.sh
# Test API v1
curl "http://localhost:8081/api/v1/health"

# Start services
docker network create astradesk-rag
docker run -d --name rag-db --network astradesk-rag \
-e POSTGRES_DB=rag -e POSTGRES_USER=rag -e POSTGRES_PASSWORD=rag \
-p 5432:5432 pgvector/pgvector:pg16
# Initialize database
./init-database-docker.sh rag-db
# Start application
docker run -d --name rag-app --network astradesk-rag -p 8081:8080 \
-e SPRING_DATASOURCE_URL=jdbc:postgresql://rag-db:5432/rag \
-e RAG_PROVIDER_EMBEDDINGS=fake -e RAG_PROVIDER_CHAT=fake \
-v "$PWD":/workspace -w /workspace eclipse-temurin:21-jdk \
  bash -c "java -jar build/libs/astradesk-rag-mini-0.2.0.jar"

The application will be available at http://localhost:8081
Note: All endpoints use the /api/v1 prefix. See the API Migration Guide for details.
Request:
GET /api/v1/docs/search?q=Spring%20AI&k=5

Parameters:
- q (required): Search query
- k (optional): Number of results to return (default: 5)
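A client has to percent-encode the q parameter before sending it. The following is a minimal sketch of building the request URI with the JDK only; the class name and the base URL argument are assumptions for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SearchRequest {
    /** Builds the search URI; baseUrl is supplied by the caller, e.g. http://localhost:8081. */
    static URI build(String baseUrl, String query, int k) {
        // URLEncoder produces form encoding ('+' for spaces); switch to %20 as in the example request.
        // A literal '+' in the input is already encoded as %2B, so this replacement is safe.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(baseUrl + "/api/v1/docs/search?q=" + q + "&k=" + k);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8081", "Spring AI", 5));
    }
}
```

The resulting URI can be passed to java.net.http.HttpRequest.newBuilder(uri) or to any other HTTP client.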
Response:
[
{
"id": 1,
"docId": 1,
"chunkIndex": 0,
"pageFrom": 1,
"pageTo": 1,
"content": "Spring AI enables developers to...",
"score": 0.92
}
]

Request:
POST /api/v1/ingest/zip?collection=docs&maxLen=1200&overlap=200
Content-Type: multipart/form-data
file=@archive.zip

Parameters:
- file (required): ZIP archive
- collection (optional): Collection name (default: "default")
- maxLen (optional): Chunk max length in characters (default: 1200)
- overlap (optional): Chunk overlap in characters (default: 200)
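Before uploading, it can help to see which archive entries match the supported formats. The sketch below is a hypothetical client-side pre-flight check, not the server's validation logic; the extension set is inferred from the formats listed in this README (PDF, HTML, Markdown, TXT).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ZipPreflight {
    // Assumed mapping from the documented formats to file extensions
    private static final Set<String> SUPPORTED = Set.of("pdf", "html", "md", "txt");

    /** Returns the names of ZIP entries whose extension is in SUPPORTED, in archive order. */
    static List<String> ingestibleEntries(InputStream zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(zip)) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                if (e.isDirectory()) continue;
                String name = e.getName();
                int dot = name.lastIndexOf('.');
                String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
                if (SUPPORTED.contains(ext)) names.add(name);
            }
        }
        return names;
    }
}
```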
Response (Server-Sent Events):
event: progress
data: {"stage":"RECEIVED","file":"document.pdf","processed":1,"message":"processing","error":null}
event: progress
data: {"stage":"INDEXED","file":"document.pdf","page":1,"processed":1,"total":10,"message":"ok","error":null}
event: progress
data: {"stage":"DONE","file":"archive.zip","message":"finished","error":null}
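A client reads this stream as blocks of event: and data: lines separated by blank lines. The sketch below shows one way to parse that framing with the JDK only. The class and record names are illustrative, and it handles just the two fields shown above. It also trims all surrounding whitespace from values, a simplification of the SSE rule that removes a single leading space.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class SseParser {
    /** One parsed event: its name and its (possibly multi-line) data payload. */
    record Event(String name, String data) {}

    static List<Event> parse(BufferedReader reader) throws IOException {
        List<Event> events = new ArrayList<>();
        String name = "message";              // default event type in the SSE format
        StringBuilder data = new StringBuilder();
        boolean hasData = false;
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            if (line.isEmpty()) {             // a blank line dispatches the buffered event
                if (hasData) events.add(new Event(name, data.toString()));
                name = "message";
                data.setLength(0);
                hasData = false;
            } else if (line.startsWith("event:")) {
                name = line.substring(6).strip();
            } else if (line.startsWith("data:")) {
                if (hasData) data.append('\n');   // multiple data lines are joined with newlines
                data.append(line.substring(5).strip());
                hasData = true;
            }                                 // comments and other fields are ignored here
        }
        return events;
    }
}
```

Each Event.data() value is the JSON payload shown above and can be handed to any JSON library. A caller would typically stop reading once an event with stage DONE arrives.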
server:
  port: 8080
spring:
  application:
    name: astradesk-rag-mini
  datasource:
    url: jdbc:postgresql://localhost:5432/rag
    username: rag
    password: rag
  ai:
    openai:
      api-key: ${OPENAI_API_KEY:}
      chat:
        options:
          model: gpt-4o-mini
      embedding:
        options:
          model: text-embedding-3-small
rag:
  provider:
    embeddings: springai # springai | openai | fake
    chat: springai # springai | openai | fake
  embedding-dim: 1536
  topk: 5
  chunk:
    maxLen: 1200
    overlap: 200
s3:
  endpoint: ${S3_ENDPOINT:http://localhost:9000}
  region: ${S3_REGION:us-east-1}
  accessKey: ${S3_ACCESS_KEY:minioadmin}
  secretKey: ${S3_SECRET_KEY:minioadmin}
  bucket: ${S3_BUCKET:astradesk-rag}
  pathStyleAccess: true

# OpenAI
OPENAI_API_KEY=sk-...
# S3/MinIO
S3_ENDPOINT=http://localhost:9000
S3_REGION=us-east-1
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin
S3_BUCKET=astradesk-rag
# Database (optional, overrides yaml)
SPRING_DATASOURCE_URL=jdbc:postgresql://db:5432/rag
SPRING_DATASOURCE_USERNAME=rag
SPRING_DATASOURCE_PASSWORD=rag

Use conditional bean injection for flexibility:
// Configuration automatically selects based on rag.provider.* properties
// - springai: Production-ready Spring AI integration
// - openai: Direct HTTP to OpenAI API
// - fake: Testing/development without API costs

# Optimal chunk settings (tested)
maxLen: 1200 # Characters per chunk
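overlap: 200 # Character overlap for context continuity

- text-embedding-3-small: Fast, cost-effective (1536 dims)
- text-embedding-3-large: Better quality (3072 dims); configure via rag.embedding-dim. The chunks.embedding column is declared as VECTOR(1536) and must be changed to match, and pgvector's IVFFlat index supports at most 2,000 dimensions for the vector type.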
- Index Type: IVFFlat with cosine distance
- Lists Parameter: 100 (tunable based on dataset size)
- Query: Always use LIMIT k for performance
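IVFFlat is an approximate index, so it helps to know the exact ranking it approximates. The sketch below computes cosine similarity and a brute-force top-k in plain Java. The names are illustrative. It assumes the API's score field is a similarity (1 minus cosine distance), which this README does not state explicitly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CosineSearch {
    /** Position of a vector in the input list together with its similarity to the query. */
    record Scored(int index, double score) {}

    /** Cosine similarity in [-1, 1]; cosine distance is 1 - similarity. NaN for zero vectors. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            na += (double) a[i] * a[i];
            nb += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Exact top-k by scanning every vector, highest similarity first. */
    static List<Scored> topK(float[] query, List<float[]> vectors, int k) {
        List<Scored> scored = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            scored.add(new Scored(i, cosine(query, vectors.get(i))));
        }
        scored.sort(Comparator.comparingDouble(Scored::score).reversed());
        return scored.subList(0, Math.min(k, scored.size()));
    }
}
```

The full scan costs time linear in the number of chunks. The index avoids that by probing only some of its lists, which is why results can differ slightly from this exact ranking.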
// GlobalExceptionHandler provides:
- MaxUploadSizeExceededException → 413 Payload Too Large
- IllegalArgumentException → 400 Bad Request
- Generic Exception → 500 Internal Server Error

-- Documents table
CREATE TABLE docs (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
language TEXT,
created_at TIMESTAMPTZ DEFAULT now()
);
-- Chunks table with vector embeddings
CREATE TABLE chunks (
id BIGSERIAL PRIMARY KEY,
doc_id BIGINT REFERENCES docs(id) ON DELETE CASCADE,
chunk_index INT NOT NULL,
page_from INT, page_to INT,
source_key TEXT,
content TEXT NOT NULL,
embedding VECTOR(1536) NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
-- Indexes for performance
CREATE INDEX idx_chunks_docid ON chunks(doc_id);
CREATE INDEX idx_chunks_embedding
    ON chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

./gradlew test

Tests use TestContainers for isolated PostgreSQL:
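@SpringBootTest
@Testcontainers
public class RagServiceTest {
    @Container
    static PostgreSQLContainer<?> pg = new PostgreSQLContainer<>("pgvector/pgvector:pg16");

    // Point the application at the throwaway container instead of localhost
    @DynamicPropertySource
    static void datasource(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", pg::getJdbcUrl);
        registry.add("spring.datasource.username", pg::getUsername);
        registry.add("spring.datasource.password", pg::getPassword);
    }

    @Autowired
    RagService rag;

    @Test
    void searchWorks() {
        List<ChunkRecord> res = rag.search("query", 3);
        assertNotNull(res);
    }
}

Automated Test: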
./test-api-v1.sh 8081

Manual Tests:
# Health
curl "http://localhost:8081/api/v1/health"
# Search
curl "http://localhost:8081/api/v1/docs/search?q=AI&k=3"
# Ingest
curl -X POST -F "file=@docs.zip" \
  "http://localhost:8081/api/v1/ingest/zip" --no-buffer

Document → Split into chunks (max 1200 chars, overlap 200 chars)
  → Embed each chunk (1536-dim vectors)
  → Store in PostgreSQL with IVFFlat index
  → Query with semantic similarity
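The first step of this pipeline, splitting with overlap, can be sketched as a fixed-width sliding window. This is a simplified illustration and not the project's Chunker, which may also respect page or sentence boundaries. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class OverlapChunker {
    /** Splits text into windows of at most maxLen chars; consecutive windows share overlap chars. */
    static List<String> chunk(String text, int maxLen, int overlap) {
        if (maxLen <= 0 || overlap < 0 || overlap >= maxLen) {
            throw new IllegalArgumentException("require 0 <= overlap < maxLen");
        }
        List<String> chunks = new ArrayList<>();
        int step = maxLen - overlap;              // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxLen, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break;      // the last window reached the end of the text
        }
        return chunks;
    }
}
```

With the documented settings (maxLen 1200, overlap 200) the window advances 1000 characters per step, so a 2500-character document yields three chunks of 1200, 1200, and 500 characters.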
- Batch Processing: Process multiple chunks concurrently
- Connection Pooling: HikariCP (default, auto-configured)
- Vector Index: Tune the IVFFlat lists parameter:
  - Small datasets (<10k): lists=10
  - Medium (10k-100k): lists=100
  - Large (>100k): lists=300+
- Search Limit: Use reasonable k values (5-10 typically sufficient)
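The lists tiers above can be captured in a small helper, for example when generating index DDL for collections of different sizes. This is only a sketch of the README's rule of thumb. The class and method names are hypothetical, and 300 is treated as a floor for large tables because the guidance says "300+".

```java
final class IvfflatTuning {
    /** Maps a chunk row count to the lists value suggested by the tiers above. */
    static int suggestedLists(long rowCount) {
        if (rowCount < 0) throw new IllegalArgumentException("rowCount must be >= 0");
        if (rowCount < 10_000) return 10;        // small datasets
        if (rowCount <= 100_000) return 100;     // medium datasets
        return 300;                              // large datasets: starting point, tune upward
    }
}
```

Changing lists on an existing IVFFlat index requires rebuilding the index, so pick the value from the expected table size and not the current one.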
- API Keys: Use environment variables, never hardcode
- CORS: Configure appropriately for frontend access
- File Upload:
- Validate file types (already implemented)
- Set max upload size (via spring.servlet.multipart.max-file-size on the servlet stack, or spring.webflux.multipart.max-disk-usage-per-part on WebFlux)
- Database: Use connection pooling, prepared statements (JDBC templates handle this)
- S3 Credentials: Rotate regularly, use IAM roles in cloud
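import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.config.CorsRegistry;
import org.springframework.web.reactive.config.WebFluxConfigurer;

// The application runs on WebFlux, so CORS is configured through WebFluxConfigurer
// (WebMvcConfigurer only takes effect on the servlet stack).
@Configuration
public class CorsConfig implements WebFluxConfigurer {
    @Override
    public void addCorsMappings(CorsRegistry registry) {
        registry.addMapping("/api/**")
            .allowedOrigins("https://yourdomain.com")
            .allowedMethods("GET", "POST", "OPTIONS")
            .maxAge(3600);
    }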
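}

docker build -t astradesk-rag:1.0 .
# Tag the image for the target registry before pushing
docker tag astradesk-rag:1.0 your-registry/astradesk-rag:1.0
docker push your-registry/astradesk-rag:1.0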
# Deploy with environment variables
docker run -e OPENAI_API_KEY=sk-... \
-e S3_ENDPOINT=https://s3.amazonaws.com \
-e SPRING_DATASOURCE_URL=jdbc:postgresql://prod-db:5432/rag \
-p 8080:8080 \
  astradesk-rag:1.0

helm install astradesk-rag ./helm-chart \
--set openai.apiKey=$OPENAI_API_KEY \
--set postgres.host=prod-pg \
  --set s3.endpoint=https://s3.amazonaws.com

# Add to application.yml for production
logging:
  level:
    com.astradesk.rag: INFO
    org.springframework: WARN
  pattern:
    console: "%d{yyyy-MM-dd HH:mm:ss} - %msg%n"
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,info,prometheus
  # Spring Boot 3 key; replaces management.metrics.export.prometheus.enabled
  prometheus:
    metrics:
      export:
        enabled: true

# Ensure the pgvector extension is installed
docker exec astradesk-rag-mini-db psql -U rag -d rag -c "CREATE EXTENSION IF NOT EXISTS vector;"

# Increase JVM heap (in Dockerfile or JAVA_OPTS)
ENV JAVA_OPTS="-Xmx2g -Xms1g -XX:+UseZGC"

# Test MinIO connectivity
docker exec astradesk-rag-mini-app curl -v http://minio:9000/minio/health/live

If the Gradle build fails with an error that includes a Java version like 25.0.1 (for example, an exception during Gradle script evaluation that mentions JavaVersion.parse), your system JDK is newer than the Kotlin/Gradle tooling expects. The project requires Java 21 for the Gradle runtime. Options to resolve:
- Install and use Temurin/OpenJDK 21 and set JAVA_HOME before running Gradle:
# Example using SDKMAN (recommended for developers):
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"
sdk install java 21.0.0-tem
sdk use java 21.0.0-tem
./gradlew clean build

- Install Temurin 21 via OS package manager (Debian/Ubuntu example):
# Adoptium repo install (Debian/Ubuntu)
wget -O - https://packages.adoptium.net/artifactory/api/gpg/key/public | sudo apt-key add -
# Double quotes are required so that $(lsb_release -cs) expands to the release codename
echo "deb https://packages.adoptium.net/artifactory/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/adoptium.list
sudo apt-get update
sudo apt-get install -y temurin-21-jdk
export JAVA_HOME="/usr/lib/jvm/temurin-21-jdk"
./gradlew clean build

- Use a Docker fallback to run the Gradle wrapper inside a JDK 21 container:
docker run --rm -v "$PWD":/workspace -w /workspace eclipse-temurin:21-jdk bash -c "./gradlew clean build"

This avoids changing your system Java and is useful for CI or one-off builds.
- Check IVFFlat index configuration
- Verify query k parameter isn't too large
- Monitor table statistics:
ANALYZE chunks;
- Quick Start Guide - Get started in 5 minutes
- Developer Guide - Comprehensive development guide
- API Migration Guide - v1 API changes and migration
- Database Setup - Database initialization scripts
- Local Development Checklist - Minimal local setup and exact build versions
- Database Tuning Guide - PostgreSQL, pgvector, IVFFlat optimization
- Quick Wins Implementation - Recent improvements
- Implementation Checklist - Verification steps
- CI/CD Setup - GitHub Actions & GitLab CI/CD
- CI/CD Quick Reference - Quick commands
- Frontend Setup - React/Next.js setup
- Frontend Guide - Component usage
- Integration Summary - Backend-Frontend integration
- Project Status - Current state and roadmap
- Fixes Applied - Bug fixes and improvements
- Integration Fixes - Recent integration improvements
- Build Success - Build verification results
- Deployment Success - Deployment verification
- Verification Checklist - Complete verification steps
- Code Style: Follow Google Java Style Guide
- Testing: Maintain >80% code coverage
- Documentation: Update README for user-facing changes
- Commits: Use conventional commits (feat:, fix:, docs:, etc.)
# Run quality checks before commit
./gradlew check

This project is licensed under the MIT License; see the LICENSE file for details.
For issues or questions:
- Check existing GitHub issues
- Review troubleshooting section
- Run ./test-api-v1.sh to verify setup
- Contact: s.sobolewski@hotmail.com

Helper scripts:
- QUICK_START.sh - Start all services
- init-database.sh - Initialize production database
- init-database-docker.sh - Initialize Docker database
- test-api-v1.sh - Test API v1 endpoints
Last Updated: 2025-01-24
Author: Cartesian School - Siergej Sobolewski
Contact: s.sobolewski@hotmail.com