I architect and ship end-to-end AI systems - from RAG pipelines and multi-agent orchestration to production backends and infrastructure.
My focus is on building things that work reliably at scale: retrieval systems, LLM integrations, intelligent assistants, backend machinery, and production deployment.
- LLM Engineering - designing and building multi-agent systems, orchestrating tool calls, and optimizing context for production workloads.
- RAG & Retrieval - architecting end-to-end search pipelines with hybrid retrieval, reranking, and vector DB optimization.
- Backend Engineering - building async Python backends with clean architecture, auth, and integrations.
- Bots & Automation - shipping production-grade AI assistants for Telegram and WhatsApp with multilingual support and knowledge bases.
- DevOps - containerizing, deploying, and monitoring services with CI/CD and observability.
- Building enterprise-level AI assistants with RAG, memory, and multi-step reasoning.
- Architecting modular tool stores, agent routers, prompt stores, and model abstractions.
- Designing LLM-first backends with clean boundaries and predictable tool execution.
- Building adaptive production pipelines: ingestion, validation, observability, and evaluation.
- Primary: Python 3.12 (async, typing, Pydantic-first)
- Data & Queries: SQL, JSON, YAML, TOML
- Web & Scripting: JavaScript / TypeScript, HTML, CSS
- Systems & Automation: Bash, PowerShell, Dockerfile, Makefile
- Exploring: Go, Rust, C, C++ (reading-level)
- OpenAI API, Anthropic API, Google Gemini, Grok, DeepSeek
- Open-source LLMs
- Ollama - local model deployment
- Embedding models: text-embedding-3-large, SentenceTransformers, multilingual
- LangChain, LangGraph
- Qdrant, ChromaDB, FAISS
- HuggingFace, tiktoken
- Cross-encoder reranking (BAAI/bge-reranker-v2-m3, Jina AI, etc.)
- Langfuse - LLM observability
- Prompt engineering (structured output, function calling, tool calling)
- Multi-provider architecture (OpenAI, Anthropic, Gemini, DeepSeek, Groq)
- SSE Streaming - real-time response streaming
- Document parsing: PDF (pypdf), DOCX (python-docx), TXT, JSON, etc.
- FastAPI, Uvicorn (ASGI)
- Pydantic v2 - data validation, settings management
- Starlette - middleware (rate limiting, security)
- asyncio, SQLAlchemy 2.0
- Alembic (migrations)
- PostgreSQL, psycopg 3 + psycopg-pool
- MongoDB
- Redis
- httpx, aiohttp - async HTTP clients
- Multiprocessing & multithreading
- JWT auth (python-jose) - access + refresh tokens
- OAuth 2.0 (Google)
- Webhook systems (Telegram, WhatsApp Cloud API)
- OpenAPI / Swagger UI / ReDoc
- aiogram - Telegram Bot API framework
- aiogram FSM - finite state machines for multi-step dialogs
- RAG (Retrieval-Augmented Generation), Hierarchical RAG
- Hybrid Search - Dense + Sparse (BM25)
- PostgreSQL Full-Text Search (tsvector, GIN indexes)
- Reciprocal Rank Fusion (RRF)
- Cross-Encoder Reranking
- Apache Bench, JMeter - load testing
- pytest, pytest-cov, pytest-asyncio, pytest-mock
- FastAPI TestClient / httpx
- Docker / Compose (dev, prod configs)
- GitHub Actions
- Ubuntu Server, systemd
- Health Checks (/health, /ready, /metrics)
- Structured logging (JSON/text, rotation)
- Custom metrics (latency, cache hit/miss, error rates)
- psutil - CPU/RAM monitoring
- SMTP/webhook alerts
- Prompt Injection Protection - custom multilingual guardrails (regex filters, input sanitization, context boundary enforcement)
- LLM output validation, content filtering, safe tool execution
- Rate Limiting (per-user/IP via Redis)
- Retry with Exponential Backoff
- Ruff - linter & formatter
- Flowise - no-code LLM orchestration
- Chroma Cloud
- n8n
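
To make the hybrid search entries above concrete, here is a minimal sketch of Reciprocal Rank Fusion, the step that merges a dense ranking and a sparse (BM25) ranking into one list. The function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` (a commonly used default) are illustrative assumptions, not code from any specific project of mine.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    where rank starts at 1. `k` is an assumed smoothing constant.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense retriever and a BM25 retriever.
dense_hits = ["a", "b", "c"]
sparse_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that appear in both lists rise to the top, which is why the fused list is usually a good candidate set to pass on to a cross-encoder reranker.
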
Feel free to reach out:
- Email: skvozdymperegara@proton.me
"AI systems are built, not summoned. Good engineering beats magic every time."