Software Engineer | AI & Cloud Infrastructure | ML Systems | Distributed Systems
I build production-grade machine learning systems focused on real-time inference, scalability, reliability, and cost efficiency.
My work sits at the intersection of backend engineering, distributed systems, and applied AI.
I focus on systems where machine learning must operate under real production constraints: latency, scale, reliability, and cost.
My work spans inference systems, distributed architectures, retrieval pipelines, orchestration layers, and applied ML.
These selected systems reflect my production ML and distributed infrastructure work. More experiments and builds are available in my repositories.
- End-to-end OCR + LayoutLMv3 + retrieval + LLM system for document understanding.
- 10k+ docs/day, 5.8k QPS, 94.3% extraction accuracy.
- Distributed microservices architecture with async pipelines, caching, and observability.
- Multi-tenant GPU inference platform with batching and Redis scheduling.
- 2.3x throughput improvement with p99 latency under 100 ms during 5x traffic spikes.
- 32% infrastructure cost reduction via autoscaling + warm pool design.
- Eye-tracking-based ML pipeline operating on raw gaze streams.
- BiLSTM + Attention model achieving 56% accuracy vs 9% baseline.
- Behavioral analysis against computational saliency models.
Languages
Python · Java · C++ · C · JavaScript · TypeScript
Backend
FastAPI · Spring Boot · Node.js · REST APIs · Microservices · Async Systems
Infrastructure
Docker · Kubernetes · Redis · AWS · Linux · Git · CI/CD · Nginx
AI / ML
Transformers · LLMs · NLP · OCR · Computer Vision · Multimodal AI · PyTorch · OpenCV
- Real-time ML inference systems
- Distributed systems and scalability
- MLOps and production observability
- Cost-efficient AI infrastructure
- Retrieval and decision systems
Always building.