rsoorajs/README.md

🧠 About Me

I am an AI Engineer based in Dubai. I build LLM-powered products and take them from the first architecture decision through to a running production system.

  • 🤖 I work as a Generative AI Engineer at Gama Security Systems, where I build RAG systems, agent workflows, and copilot features.
  • 🏗️ I handle the full delivery of AI features: retrieval architecture, fine-tuning, evaluation pipelines, and keeping cost and latency under control once things are live.
  • ⚡ I am comfortable starting work before the requirements are fully clear. I pick the tools, make a call, and ship something that works.
  • 🧬 I recently built DeepSeek V2 Lite, a 15.7B-parameter Mixture of Experts model, from scratch in PyTorch.
  • 🌱 I closely follow new work in LLMs, AI agents, and inference optimization.

🛠️ Tech Stack

🤖 AI & Machine Learning

☁️ MLOps & Cloud

🗄️ Backend & Data

🎨 Frontend

🚀 Featured Projects

🧬 An LLM System Built for Production: Custom Transformer with MLA and MoE

I built DeepSeek V2 Lite from scratch in PyTorch. It is a 15.7B-parameter Mixture of Experts model that keeps 2.4B parameters active per token.

  • 🔬 I implemented Multi-head Latent Attention (MLA), which cuts the KV cache by 86%. The model runs a 64-expert MoE with top-6 routing and RoPE positional encoding with YaRN scaling. I checked every layer against the HuggingFace reference to confirm the numbers matched.
  • 🧮 I applied INT8 quantization to bring inference memory from 31 GB down to around 16 GB, so the model runs on consumer hardware. The critical layers stay in full precision to protect output quality.
  • 🚀 I built an OpenAI-compatible streaming inference server with FastAPI and SSE. Redis Streams sits behind it as a token bus, so clients can reconnect and replay from their last position, and the server scales without sticky sessions.
  • ☁️ I deployed it as three services on AWS. Separating the code containers from the model weights, then using SageMaker Async Inference with scale-to-zero, brought the monthly hosting cost from around $730 down to around $12.
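The 86% and 31 GB to 16 GB figures above can be sanity-checked with back-of-envelope arithmetic. This is a rough sketch, not the project's own measurement: the attention shapes (16 heads, 128-wide keys and values, a 512-wide latent plus a 64-wide RoPE key) are assumptions that are not stated in this README, and the memory estimate counts weights only.

```python
# Assumed attention shapes (hypothetical; not stated in the README).
N_HEADS = 16       # attention heads per layer
HEAD_DIM = 128     # width of each per-head key and each per-head value
LATENT_DIM = 512   # compressed KV latent that MLA caches per token
ROPE_DIM = 64      # decoupled RoPE key cached alongside the latent

def kv_cache_reduction(n_heads, head_dim, latent_dim, rope_dim):
    """Fraction of per-token, per-layer KV cache saved by MLA versus plain MHA."""
    mha_values = 2 * n_heads * head_dim   # full keys plus full values
    mla_values = latent_dim + rope_dim    # one shared latent plus the RoPE key
    return 1 - mla_values / mha_values

def weight_memory_gb(n_params, bytes_per_param):
    """Memory needed for the weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9

reduction = kv_cache_reduction(N_HEADS, HEAD_DIM, LATENT_DIM, ROPE_DIM)
fp16_gb = weight_memory_gb(15.7e9, 2)   # 16-bit weights
int8_gb = weight_memory_gb(15.7e9, 1)   # 8-bit weights

print(f"KV cache reduction: {reduction:.1%}")
print(f"16-bit weights: {fp16_gb:.1f} GB, INT8 weights: {int8_gb:.1f} GB")
```

Under these assumptions the reduction comes out just under 86%, and the weights drop from about 31.4 GB to about 15.7 GB. Keeping some layers in full precision, as described above, would push the INT8 total slightly higher, which is consistent with "around 16 GB".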

PyTorch FastAPI LangGraph AWS SageMaker Docker Terraform Redis Streams
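To illustrate the top-6 routing over 64 experts mentioned in this project, here is a minimal pure-Python sketch of softmax gating for a single token. It is not the project's PyTorch code: the function name `route_top_k` is hypothetical, the renormalization of the selected weights is an assumption (some implementations keep the raw softmax scores), and shared experts are not modeled.

```python
import math
import random

def route_top_k(logits, k=6):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    # Softmax over all experts; subtracting the max keeps exp() stable.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k experts with the largest probabilities.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Assumption: rescale the kept weights so they sum to 1.
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]

random.seed(0)
router_logits = [random.gauss(0.0, 1.0) for _ in range(64)]  # one token, 64 experts
selected = route_top_k(router_logits, k=6)

for expert_id, weight in selected:
    print(f"expert {expert_id:2d} weight {weight:.3f}")
```

Only the six selected experts run for this token, which is how a model with many total parameters keeps a much smaller number active per token.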

🤝 A RAG Personal Assistant with Agentic Reasoning

A personal assistant that uses hybrid retrieval and the Model Context Protocol, and handles multi-step reasoning on its own.

  • 🎯 I built a hybrid RAG pipeline that combines dense embeddings, sparse BM25, and a cross-encoder for reranking. Retrieval accuracy went from 0% to 100% on a curated evaluation set.
  • 🧠 The assistant runs a ReAct agent on LangGraph. It picks from more than 16 tools through the Model Context Protocol and reasons across several steps.
  • 🔌 Every swappable part sits behind an abstract interface. I can A/B test different RAG setups by changing environment variables, with no code changes.
  • 🛠️ It runs on AWS with EC2 and ECR, Docker for containers, Nginx for SSL, and a GitHub Actions pipeline that tests and deploys on every merge.
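The README does not say how the dense and BM25 results are merged before reranking. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the project's actual method. The document IDs and the constant `k=60` are illustrative, and the cross-encoder reranking step that would follow is not modeled.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list contribute the most.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical dense-embedding results
bm25_hits = ["doc_b", "doc_d", "doc_a"]    # hypothetical BM25 results
candidates = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(candidates)
```

Documents that rank well in both lists rise to the top, and the fused candidates would then be passed to the cross-encoder for final ordering.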

LangGraph OpenAI Qdrant MCP Docker AWS Nginx
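The "abstract interface plus environment variable" pattern described in this project can be sketched as follows. Everything here is hypothetical: the `Retriever` interface, the two stub implementations, and the `RAG_RETRIEVER` variable name are illustrative and not taken from the project's code.

```python
import os
from abc import ABC, abstractmethod

class Retriever(ABC):
    """Hypothetical interface that every retrieval strategy implements."""

    @abstractmethod
    def retrieve(self, query: str, top_k: int) -> list[str]:
        ...

class DenseRetriever(Retriever):
    def retrieve(self, query: str, top_k: int) -> list[str]:
        # Stub: a real version would query a vector store.
        return [f"dense:{query}:{i}" for i in range(top_k)]

class HybridRetriever(Retriever):
    def retrieve(self, query: str, top_k: int) -> list[str]:
        # Stub: a real version would fuse dense and sparse results.
        return [f"hybrid:{query}:{i}" for i in range(top_k)]

RETRIEVERS = {"dense": DenseRetriever, "hybrid": HybridRetriever}

def make_retriever(env=os.environ) -> Retriever:
    """Choose an implementation from the (assumed) RAG_RETRIEVER variable."""
    name = env.get("RAG_RETRIEVER", "hybrid")
    if name not in RETRIEVERS:
        raise ValueError(f"unknown retriever: {name!r}")
    return RETRIEVERS[name]()

retriever = make_retriever({"RAG_RETRIEVER": "dense"})
print(retriever.retrieve("what is MLA?", top_k=2))
```

Because callers depend only on `Retriever`, switching the variable between runs is enough to compare two setups without touching the calling code.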

💼 Experience

| Role | Company | Period |
|---|---|---|
| 🤖 Generative AI Engineer | Gama Security Systems, Dubai 🇦🇪 | May 2023 to Present |
| 💻 Full Stack Developer | Allianz Technology, India 🇮🇳 | Apr 2021 to Apr 2023 |
| 🛠️ Junior Software Engineer | Infinite Open Source Solutions, India 🇮🇳 | Jun 2019 to Apr 2021 |

🎓 Education and Certifications

  • 🎓 M.Sc. in Computer Science, Chandigarh University, India. 2023 to 2025
  • 📜 Supervised Machine Learning: Regression and Classification, from Stanford University.
  • 📜 DevOps Beginners to Advanced, from Udemy.
  • 📜 Microsoft .NET Fundamentals, from Microsoft.
  • 📜 Programming with Python, from Harvard University.

🤝 Let's Connect

I am open to interesting AI and ML problems, and I am happy to talk about building with LLMs.

If you are hiring for AI engineering work, or you just want to compare notes, get in touch.



Pinned

  1. Java-Calculator (Public)

    My First Simple Calculator Using Java

    Java

  2. Web-Designing (Public)

    My sample HTML, CSS, and JavaScript projects

    HTML 3 1

  3. css-loaders (Public)

    A collection of various CSS loader animations

    CSS

  4. url-shortener-package (Public)

    JavaScript

  5. node-urlshortener (Public)

    JavaScript 2

  6. library--otp (Public)

    JavaScript 1