infrastructure · devops · ai engineering consulting

Your infra is on fire. I put it out, then stop it from reigniting.

Kubernetes firefighting, gateways, cloud cost cutting, and AI engineering, from an SRE who has broken plenty of production and put it back together. Broken Kubernetes clusters, gateways that won’t route, runaway cloud bills, AI features stuck one step short of production. That’s the whole job.

  • Toptal Top 3% talent
  • Infra engineer @ TrueFoundry
  • 10+ years shipping & breaking production
  • Author of the Lazy SRE security series

the stack I live in

Kubernetes · AWS · GCP · Azure · Terraform · OpenTofu · Helm · Argo CD · Flux · Envoy · Kong · Nginx · Traefik · Istio · Cilium · Tailscale · Cloudflare · WireGuard · Prometheus · Grafana · OpenTelemetry · Karpenter · Vault · Harness · GitHub Actions · Docker
LangGraph · LangChain · Langfuse · LlamaIndex · vLLM · Ollama · Hugging Face · Triton · Ray · OpenAI · Anthropic Claude · MCP · RAG · pgvector · Pinecone · Weaviate · Qdrant · Bedrock · SageMaker · Temporal · n8n

services

What I help with

Infrastructure Debugging & Incident Response

Production is down, a pod won't start, or nobody knows why latency tripled. I debug it to root cause and get you back up.

API Gateways & Networking

Ingress that won't route, an API gateway nobody understands, or a VPN appliance you want gone. I make traffic flow the way it should.

DevOps & Platform Engineering

Kubernetes set up properly, infrastructure in code, and CI/CD that deploys without drama. The platform your team wishes it already had.

Cloud Cost Optimization (FinOps)

Your AWS or GCP bill keeps climbing and nobody knows exactly why. I find the waste and cut it, and your system stays just as fast and reliable as before.

Security & SRE Hardening (DevSecOps)

Shrink your attack surface and stop the 3am pages. Security and reliability hardening that respects your team's time.

AI Engineering & AI Agency

You want to ship an AI feature (a RAG assistant, an agent, a self-hosted model), but the gap between a demo and production is wide. I close it.

RAG Systems & AI Chatbots

You want a chatbot over your docs that gives correct answers with citations, but the prototype makes things up and nobody trusts it. I fix that.

AI Agents & Workflow Automation

Your agent works 70% of the time, which is a great demo and a terrible product. I get it to a reliability number you'd actually put in front of users.

LLM & AI Cost Optimization

Your OpenAI or Anthropic bill is climbing and a lot of it is waste. I find where the money goes and cut it without wrecking quality.

selected work

Problems I’ve already solved

how it works

How an engagement runs

01

Reach out

Tell me what’s broken, or what you’re trying to build. A short message or a 30-minute call works.

02

Diagnose

I find the root cause, or scope the build, and write up a plan in plain language.

03

Fix & hand off

I do the work, narrate as I go, and leave your team with docs, runbooks, and clear ownership.

Questions before you reach out

What do you actually do?

I’m an independent infrastructure, DevOps, and AI engineering consultant. I debug broken Kubernetes and cloud setups, design gateways and networking, cut cloud bills, harden security, and ship production RAG/LLM systems.

Do you do one-off fixes or longer engagements?

Both. I take emergency incident calls, short scoped projects like a cost-cutting pass or a gateway setup, and longer advisory or build work. We pick whatever fits the problem.

How do I start?

Book a free 30-minute call, or send a message with your stack, the symptoms, and your timeline. You’ll leave the first call knowing what’s wrong and how I’d fix it.

Who am I working with?

Harshit Luthra, a Toptal Top 3% infrastructure engineer currently at TrueFoundry, with 10+ years of breaking and fixing production. You work with me directly.

Got a problem worth a look?

Book a free 30-minute call. We diagnose it together, and you walk away with a plan you can act on. You’ll get a straight read either way.