imrohankataria/README.md

Hello World!

Instagram Twitter LinkedIn HuggingFace



👋 Hi, I'm Rohan Kataria

AI Infra Engineer · LLM/Agentic Systems · Cost Optimization · AKS/Terraform · NVIDIA + OSS AI

I build efficient AI infrastructure: optimized GPU clusters, fast LLM serving (vLLM, Triton, SGLang), agentic workflows (LangGraph/CrewAI), and cost-aware pipelines.

🔥 What I'm focusing on right now

A 14-day AI Infra portfolio showcasing:

  • GPU cost savings (Spot + on-demand node mix, autoscaling, DCGM dashboards)
  • LLM serving benchmarks: Triton vs vLLM vs TGI vs SGLang
  • Quantization + speculative decoding
  • Long-context efficiency (128k–1M tokens)
  • RAG cost optimization
  • Multi-agent orchestration cost tracing
  • CI/CD for AI systems (GitHub Actions → AKS)

🌐 Find Me Online

🌍 rohankataria.com
🔗 linkedin.com/in/imrohan
🤗 huggingface.co/thewise
📸 instagram.com/byrohankataria

Pinned

  1. agentic-multi-llm-orchestration-bench (Public archive)

  2. budget-finetune-pipeline-qlora-spectrum (Public archive)

  3. cost-gpu-aks-terraform (Public archive)

  4. gpu-utilization-observability-stack (Public archive)

  5. long-context-efficiency-kv-experiments (Public archive)

  6. mlops-zero-downtime-ai-delivery (Public archive)