A comprehensive Retrieval-Augmented Generation (RAG) system for financial research, deployed on AWS with Terraform.
Valyrion RAG turns the original tool-based financial research agent into an enterprise-grade system that provides:
- Massive Knowledge Base: Index and query millions of financial documents
- Hybrid Retrieval: Vector search + BM25 + Knowledge Graph traversal
- Multi-Agent Orchestration: Planning, Action, Validation, and Answer agents
- Real-Time Updates: Continuous ingestion from SEC EDGAR, news, and APIs
- Enterprise Scale: Deployed on AWS with auto-scaling and high availability
Architecture layers:
- User Interface: CLI, Web API, WebSocket streaming
- Orchestration: Multi-agent system (Planning → Action → Validation → Answer)
- RAG Intelligence: Query understanding, hybrid retrieval, re-ranking
- Knowledge Storage: Qdrant, Neo4j, PostgreSQL, Elasticsearch, Redis, S3
- Data Ingestion: Multi-source collectors, document processing, parallel indexing
See the architecture documentation (docs/architecture.md) for details.
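Hybrid retrieval needs some way to merge the ranked lists returned by the vector, BM25, and graph retrievers. The source does not say which fusion method Valyrion uses; the sketch below shows one common option, Reciprocal Rank Fusion (RRF), with hypothetical document IDs and the conventional constant `k=60`.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the three retrievers.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]
graph_hits = ["doc_b", "doc_e"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits, graph_hits])
print(fused)
```

Documents that appear near the top of several lists (here `doc_b`) rise to the top of the fused ranking, and RRF needs no score normalization across retrievers because it only uses ranks.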
Prerequisites:
- AWS Account with configured credentials
- Terraform >= 1.5.0
- Python 3.10+
- Docker
- OpenAI API key
- Finnhub API key (free tier)
Installation:
- Clone repository
      cd valyrion
- Install dependencies
      # Using uv (recommended)
      uv sync
      # Or using pip
      pip install -r requirements.txt
      pip install -e .
- Configure environment
      cp env.example .env
      # Edit .env with your API keys and configuration
- Set up Terraform backend
      # Create S3 bucket for Terraform state
      aws s3 mb s3://valyrion-terraform-state --region us-east-1
      # Create DynamoDB table for state locking
      aws dynamodb create-table \
        --table-name valyrion-terraform-locks \
        --attribute-definitions AttributeName=LockID,AttributeType=S \
        --key-schema AttributeName=LockID,KeyType=HASH \
        --billing-mode PAY_PER_REQUEST \
        --region us-east-1
- Deploy infrastructure (DO NOT RUN YET - just example)
      cd terraform/environments/dev
      terraform init
      terraform plan
      # terraform apply # Only when ready to deploy
valyrion/
├── terraform/                      # Infrastructure as Code
│   ├── modules/
│   │   ├── networking/             # VPC, subnets, security groups
│   │   ├── databases/              # RDS, Redis, OpenSearch, S3
│   │   └── ecs-service/            # ECS task definitions
│   └── environments/
│       ├── dev/                    # Development environment
│       └── prod/                   # Production environment
│
├── src/valyrion/                   # Application code
│   ├── rag/                        # RAG system
│   │   ├── ingestion/              # Document fetching, parsing, chunking
│   │   ├── retrieval/              # Vector, BM25, graph search
│   │   ├── storage/                # Database clients
│   │   ├── embeddings/             # OpenAI embeddings
│   │   └── query/                  # Query understanding
│   ├── agents/                     # Multi-agent system
│   ├── api/                        # FastAPI application
│   └── workers/                    # Background workers
│
├── docker/                         # Dockerfiles
│   ├── Dockerfile.api              # API server
│   └── Dockerfile.worker           # Ingestion worker
│
├── scripts/                        # Utility scripts
│   ├── ingest_sec_filings.py       # SEC data ingestion
│   └── evaluate_retrieval.py       # RAG evaluation
│
├── tests/                          # Test suite
│   ├── unit/                       # Unit tests
│   ├── integration/                # Integration tests
│   └── e2e/                        # End-to-end tests
│
├── docs/                           # Documentation
│   ├── architecture.md             # System architecture
│   └── deployment.md               # Deployment guide
│
├── TASKS.md                        # Implementation tasks (100+ tasks)
├── TRANSITION_PROPOSITION.md       # Technical proposal
└── valyrion-architecture-clean.mmd # Mermaid diagram
Qdrant (vector store):
- 3072-dim embeddings (OpenAI text-embedding-3-large)
- HNSW index for fast similarity search
- Metadata filtering (company, date, document type)

Neo4j (knowledge graph):
- Entities: Companies, Products, Executives, Events
- Relationships: COMPETES_WITH, SUPPLIES_TO, REPORTED_IN
- Cypher queries for multi-hop reasoning

PostgreSQL (documents and metadata):
- Full documents with metadata
- Query logs and analytics
- pgvector extension for hybrid queries

Elasticsearch/OpenSearch (keyword search):
- BM25 ranking
- Keyword search and boosting
- Metadata indexing

Redis (cache):
- Embedding cache (TTL: 30 days)
- Query result cache (TTL: 1 hour)
- Session data

S3 (object storage):
- Raw documents (PDFs, HTMLs)
- Lifecycle policies (archive to Glacier after 90 days)
- Versioning enabled
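The two cache TTLs listed above (30 days for embeddings, 1 hour for query results) can be illustrated without a running Redis instance. The sketch below is an in-memory stand-in for set-with-expiry and get semantics, not the project's actual cache client; `cache_key`, `TTLCache`, and the key prefixes are hypothetical names.

```python
import hashlib
import time

EMBEDDING_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days, as listed above
QUERY_TTL_SECONDS = 60 * 60                # 1 hour, as listed above


def cache_key(prefix, text):
    """Derive a stable key from the text so identical inputs share an entry."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}"


class TTLCache:
    """In-memory stand-in for a key-value store with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (self._clock() + ttl_seconds, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily evict expired entries
            return None
        return value


cache = TTLCache()
key = cache_key("query", "What was Apple's revenue in Q4 2023?")
cache.set(key, {"answer": "..."}, QUERY_TTL_SECONDS)
print(cache.get(key))
```

Hashing the input text keeps keys short and uniform; the injectable `clock` argument exists only so expiry can be tested without waiting.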
uvicorn valyrion.api.main:app --reload
valyrion-agent
python scripts/ingest_sec_filings.py
pytest tests/ -v
pytest tests/unit/ -v --cov=valyrion

# API
docker build -t valyrion-api -f docker/Dockerfile.api .
# Worker
docker build -t valyrion-worker -f docker/Dockerfile.worker .

cd terraform/environments/dev
terraform init
terraform plan -out=tfplan
# terraform apply tfplan # Only when ready

Key variables in .env:
- OPENAI_API_KEY: OpenAI API key
- FINNHUB_API_KEY: Finnhub API key
- POSTGRES_HOST: PostgreSQL host
- REDIS_HOST: Redis host
- QDRANT_HOST: Qdrant host
- NEO4J_URI: Neo4j connection URI
- OPENSEARCH_HOST: OpenSearch/Elasticsearch host
- S3_BUCKET_NAME: S3 bucket for documents
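A startup check for the .env variables listed above might look like the following sketch. It assumes all eight variables are mandatory, which the source does not state; `missing_env_vars` and `check_env` are hypothetical helper names.

```python
import os

# Variable names taken from the list above; treating all of them as
# required is an assumption.
REQUIRED_ENV_VARS = [
    "OPENAI_API_KEY",
    "FINNHUB_API_KEY",
    "POSTGRES_HOST",
    "REDIS_HOST",
    "QDRANT_HOST",
    "NEO4J_URI",
    "OPENSEARCH_HOST",
    "S3_BUCKET_NAME",
]


def missing_env_vars(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def check_env(env=None):
    """Raise a single error naming every missing variable."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))


if __name__ == "__main__":
    print(missing_env_vars() or "all required variables are set")
```

Reporting every missing variable at once avoids the fix-one-rerun-fix-another loop when setting up a new environment.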
Key variables in terraform/environments/dev/terraform.tfvars:
- environment: Environment name (dev, prod)
- vpc_cidr: VPC CIDR block
- api_instance_count: Number of API servers
- postgres_instance_class: RDS instance type
- enable_multi_az: Enable Multi-AZ for databases
Example request:

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What was Apple'\''s revenue in Q4 2023?",
    "filters": {
      "company": "AAPL",
      "date_from": "2023-10-01",
      "date_to": "2023-12-31"
    }
  }'

Example response:

{
  "answer": "Apple's Q4 2023 revenue was $119.6B...",
  "sources": [
    {
      "document_id": "doc_123",
      "document_type": "10-K",
      "company": "AAPL",
      "date": "2023-11-02",
      "excerpt": "Revenue for Q4 2023...",
      "score": 0.95
    }
  ],
  "confidence": 0.92,
  "latency_ms": 1234
}

Performance targets:
- Query Latency: <2s (P95)
- Answer Correctness: >85%
- Retrieval Recall@10: >90%
- Cost per Query: <$0.05
- System Uptime: 99.5%
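Two of the metrics above, Retrieval Recall@10 and P95 latency, have simple definitions that are easy to get subtly wrong. The sketch below shows one way to compute them; the function names are hypothetical, and the nearest-rank percentile method is an assumption (monitoring tools may interpolate instead).

```python
import math


def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the relevant documents found in the top-k retrieved list."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("relevant set must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & relevant_set) / len(relevant_set)


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct% of n), 1-indexed."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical evaluation data.
retrieved_ids = ["d1", "d7", "d3", "d9"]
relevant_ids = {"d1", "d3", "d4"}
latencies_ms = [850, 1200, 640, 1900, 2300, 990, 1100, 720, 1500, 1300]

print(recall_at_k(retrieved_ids, relevant_ids, k=10))
print(percentile_nearest_rank(latencies_ms, 95))
```

In this toy data two of the three relevant documents are retrieved, and the P95 latency is the slowest of the ten samples, which would miss the <2s target.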
Monthly Production Costs (~1M documents, 10K queries/day):
- AWS Infrastructure: $600-800
  - ECS: ~$200
  - RDS PostgreSQL: ~$150
  - ElastiCache Redis: ~$100
  - OpenSearch: ~$150
  - EC2 (Qdrant, Neo4j): ~$150
  - S3 + misc: ~$50
- OpenAI API: $400-600
  - Embeddings: ~$100
  - LLM calls: ~$300-500
Total: ~$1,000-1,400/month
Cost per query: ~$0.004 at 10K queries/day (about 300K queries/month); ~$0.04 at 1K queries/day
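The per-query figure follows from dividing the monthly total by the monthly query volume. The sketch below reproduces that arithmetic from the line items above, assuming a 30-day month; `cost_per_query` is a hypothetical helper.

```python
def cost_per_query(monthly_cost_usd, queries_per_day, days_per_month=30):
    """Blended cost per query: monthly spend divided by monthly query volume."""
    monthly_queries = queries_per_day * days_per_month
    if monthly_queries <= 0:
        raise ValueError("query volume must be positive")
    return monthly_cost_usd / monthly_queries


# Figures from the estimate above (USD per month).
aws_items = [200, 150, 100, 150, 150, 50]  # ECS, RDS, Redis, OpenSearch, EC2, S3 + misc
openai_low = 100 + 300    # embeddings + lower LLM estimate
openai_high = 100 + 500   # embeddings + upper LLM estimate

total_low = 600 + openai_low                # lower bound of the stated AWS range
total_high = sum(aws_items) + openai_high   # itemized AWS costs sum to the upper bound

for label, total in (("low", total_low), ("high", total_high)):
    per_query = cost_per_query(total, queries_per_day=10_000)
    print(f"{label}: ${total}/month -> ${per_query:.4f} per query")
```

With these inputs the totals are $1,000 and $1,400 per month, or about $0.0033 and $0.0047 per query at 10K queries/day. Per-query cost scales inversely with volume, so a tenth of the traffic costs roughly ten times as much per query.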
See TASKS.md for the complete implementation roadmap:
- Phase 1: Core Infrastructure (Weeks 1-2)
- Phase 2: RAG Pipeline (Weeks 3-4)
- Phase 3: Agent Integration (Weeks 5-6)
- Phase 4: Deployment (Weeks 7-8)
- Phase 5: Testing (Week 8)
- Phase 6: Scale & Production (Weeks 9-10)
- Phase 7: Documentation (Week 11)
Total Duration: 10-12 weeks
Monitoring:
- CloudWatch Dashboards: API metrics, database performance, costs
- CloudWatch Alarms: Latency, error rate, resource utilization
- Prometheus Metrics: Application-level metrics
- AWS X-Ray: Distributed tracing (optional)
Security:
- Authentication: API key-based
- Encryption: TLS 1.2+ in transit, AES-256 at rest
- IAM: Least-privilege roles
- Secrets Management: AWS Secrets Manager
- WAF: Rate limiting, attack protection
- VPC: Private subnets for databases
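For the API key-based authentication listed above, a minimal sketch of key validation is shown below. It assumes keys are stored as SHA-256 digests and compared in constant time; the source does not describe how Valyrion stores or checks keys, and `hash_api_key` and `is_valid_api_key` are hypothetical names.

```python
import hashlib
import hmac
import secrets


def hash_api_key(api_key):
    """Store only a SHA-256 digest of each key, never the key itself."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def is_valid_api_key(presented_key, stored_hashes):
    """Check a presented key against the stored digests."""
    presented_hash = hash_api_key(presented_key)
    # compare_digest runs in constant time; checking every stored hash
    # (no early exit) avoids leaking which entry matched via timing.
    matches = [hmac.compare_digest(presented_hash, h) for h in stored_hashes]
    return any(matches)


# Issue a new random key and keep only its digest server-side.
new_key = secrets.token_urlsafe(32)
stored = {hash_api_key(new_key)}

print(is_valid_api_key(new_key, stored))
print(is_valid_api_key("not-a-real-key", stored))
```

In a deployment like the one described, the digests would more plausibly live in AWS Secrets Manager or the database than in process memory.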
Contributing:
- Create feature branch
- Make changes
- Run tests: pytest tests/ -v
- Run linters: black . && isort . && flake8
- Submit pull request
License: [Your License Here]
For issues and questions:
- GitHub Issues: [Your Repo URL]
- Documentation: docs/
- Email: [Your Email]
Acknowledgments:
- OpenAI for GPT-4 and embeddings
- Qdrant for vector database
- Neo4j for knowledge graph
- LangChain for agent framework
- Unstructured.io for document parsing
Built with 🐉 by MOUAD AYOUB