I am an AI Engineer based in Dubai. I build LLM-powered products and take them from the first architecture decision through to a running production system.
- I work as a Generative AI Engineer at Gama Security Systems, where I build RAG systems, agent workflows, and copilot features.
- I handle the full delivery of AI features. That covers retrieval architecture, fine-tuning, evaluation pipelines, and keeping cost and latency under control once things are live.
- I am comfortable starting work before the requirements are fully clear. I pick the tools, make a call, and ship something that works.
- I recently built DeepSeek V2 Lite, a 15.7B parameter Mixture of Experts model, from scratch in PyTorch.
- I follow new work in LLMs, AI agents, and inference optimization closely.
I built DeepSeek V2 Lite from scratch in PyTorch. It is a 15.7B parameter Mixture of Experts model that keeps 2.4B parameters active per token.
- I implemented Multi-head Latent Attention, which cuts the KV cache by 86%. The model runs a 64-expert MoE with top-6 routing and RoPE positional encoding with YaRN scaling. I checked every layer against the HuggingFace reference to confirm the numbers matched.
- I applied INT8 quantization to bring inference memory from 31GB down to around 16GB, so the model runs on consumer hardware. The critical layers stay in full precision to protect output quality.
- I built an OpenAI-compatible streaming inference server with FastAPI and SSE. Redis Streams sits behind it as a token bus, so clients can reconnect and replay from their last position, and the server scales without sticky sessions.
- I deployed it as three services on AWS. Separating the code containers from the model weights, then using SageMaker Async Inference with scale-to-zero, brought the monthly hosting cost from around 730 dollars down to around 12 dollars.
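The memory figures above line up with simple weights-only arithmetic. The sketch below is a back-of-the-envelope check, not the project's code; it assumes the 31GB baseline corresponds to 16-bit weights (an assumption, since the baseline precision is not stated) and ignores activations and the KV cache.

```python
# Weights-only memory estimate. The 16-bit baseline is an assumption.
PARAMS = 15.7e9  # total parameter count quoted above


def weights_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


fp16_gb = weights_gb(PARAMS, 2)  # two bytes per parameter
int8_gb = weights_gb(PARAMS, 1)  # one byte per parameter

print(f"16-bit: {fp16_gb:.1f} GB, INT8: {int8_gb:.1f} GB")
```

Keeping a few critical layers at higher precision pushes the real footprint slightly above the pure INT8 estimate, which is consistent with "around 16GB".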
PyTorch FastAPI LangGraph AWS SageMaker Docker Terraform Redis Streams
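For readers unfamiliar with top-6 routing over 64 experts, here is a minimal pure-Python sketch of the idea: softmax the gate logits, then keep the six highest-scoring experts. It is illustrative only and is not the PyTorch implementation; the function name `route_top_k` and the example logits are hypothetical, and whether the selected weights are renormalised afterwards is left open.

```python
import math


def route_top_k(gate_logits, k=6):
    """Return (expert_index, gate_weight) pairs for the k highest-scoring experts."""
    peak = max(gate_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    return [(i, probs[i]) for i in top]


# Hypothetical gate logits for one token over 64 experts.
logits = [((i * 37) % 64) / 8.0 for i in range(64)]
chosen = route_top_k(logits, k=6)
```

Only the experts in `chosen` would run for that token, which is how a model with many total parameters keeps a much smaller number active per token.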
A personal assistant that uses hybrid retrieval and the Model Context Protocol, and handles multi-step reasoning on its own.
- I built a hybrid RAG pipeline that combines dense embeddings, sparse BM25, and a cross-encoder for reranking. Retrieval accuracy went from 0% to 100% on a curated evaluation set.
- The assistant runs a ReAct agent on LangGraph. It picks from more than 16 tools through the Model Context Protocol and reasons across several steps.
- Every swappable part sits behind an abstract interface. I can A/B test different RAG setups by changing environment variables, with no code changes.
- It runs on AWS with EC2 and ECR, Docker for containers, Nginx for SSL, and a GitHub Actions pipeline that tests and deploys on every merge.
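The "abstract interface plus environment variable" pattern described above can be sketched roughly as follows. This is a toy illustration, not the assistant's code: the `Retriever` interface, the two implementations, the `REGISTRY` mapping, and the `RAG_RETRIEVER` variable name are all hypothetical.

```python
import os
from abc import ABC, abstractmethod


class Retriever(ABC):
    """Hypothetical interface that every retrieval strategy implements."""

    @abstractmethod
    def retrieve(self, query: str, docs: list[str], k: int) -> list[str]:
        ...


class KeywordRetriever(Retriever):
    def retrieve(self, query, docs, k):
        # Rank documents by how many query terms they share.
        terms = set(query.lower().split())
        ranked = sorted(docs, key=lambda d: len(terms & set(d.lower().split())), reverse=True)
        return ranked[:k]


class FirstKRetriever(Retriever):
    def retrieve(self, query, docs, k):
        # Trivial baseline: return the first k documents unchanged.
        return docs[:k]


REGISTRY = {"keyword": KeywordRetriever, "first_k": FirstKRetriever}


def build_retriever(env=os.environ) -> Retriever:
    """Pick an implementation from an environment mapping (variable name is assumed)."""
    return REGISTRY[env.get("RAG_RETRIEVER", "keyword")]()
```

Switching variants then means setting `RAG_RETRIEVER=first_k` in the environment; the calling code only ever sees the `Retriever` interface.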
LangGraph OpenAI Qdrant MCP Docker AWS Nginx
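One common way to merge a dense ranking and a BM25 ranking before cross-encoder reranking is reciprocal rank fusion. The description above does not say which fusion method the pipeline uses, so the sketch below is an assumption chosen for illustration; the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


dense_ranking = ["a", "b", "c"]   # hypothetical dense-embedding results
sparse_ranking = ["b", "d", "a"]  # hypothetical BM25 results
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
```

The fused candidates would then go to the cross-encoder, which scores each query and document pair directly and produces the final order.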
| Role | Company | Period |
|---|---|---|
| Generative AI Engineer | Gama Security Systems, Dubai 🇦🇪 | May 2023 to Present |
| Full Stack Developer | Allianz Technology, India 🇮🇳 | Apr 2021 to Apr 2023 |
| Junior Software Engineer | Infinite Open Source Solutions, India 🇮🇳 | Jun 2019 to Apr 2021 |
- M.Sc. in Computer Science, Chandigarh University, India, 2023 to 2025.
- Supervised Machine Learning: Regression and Classification, from Stanford University.
- DevOps Beginners to Advanced, from Udemy.
- Microsoft .NET Fundamentals, from Microsoft.
- Programming with Python, from Harvard University.
I am open to interesting AI and ML problems, and I am happy to talk about building with LLMs.
If you are hiring for AI engineering work, or you just want to compare notes, get in touch.