Building production-grade AI systems that are scalable, observable, and trustworthy.
I am a Senior Applied AI Engineer with 7 years of experience spanning machine learning, software engineering, intelligent automation, and AI platform development.
My work focuses on building and evaluating production AI systems, with particular interest in:
- Agentic AI Systems
- LLM Evaluation & Benchmarking
- AI Reliability Engineering
- Production ML Platforms
- AI Governance & Safety
- Model Observability
- Enterprise AI Deployment
I enjoy operating at the intersection of:
AI Research × Engineering × Product × Deployment
- Multi-agent orchestration
- RAG architectures
- Tool-using AI agents
- LangGraph & CrewAI systems
- Agent evaluation frameworks
- Behavioral consistency testing
- Model validation pipelines
- Drift detection
- Inference monitoring
- Reproducibility testing
- Deployment quality gates
- AI observability
- Production ML workflows
- Evaluation infrastructure
- Enterprise AI systems
- MLOps and CI/CD
- Compliance-aware AI systems
- Governance and auditability
Python • LLMs • RAG • LangGraph • LangChain • CrewAI • Multi-Agent Systems • AI Evaluation • LLM Benchmarking • AI Safety • AI Governance
Scikit-Learn • NLP • Classification • Anomaly Detection • Feature Engineering • Explainable AI • Model Monitoring
MLflow • Docker • Kubernetes • GitHub Actions • Jenkins • AWS • CI/CD • Experiment Tracking
Prometheus • Grafana • Drift Detection • Inference Monitoring • Reliability Metrics
SQL • PostgreSQL • Snowflake • IBM DB2 • REST APIs
Built an evaluation and observability platform for agentic systems that:
- Tracks agent behavior across runs
- Detects failure patterns and regressions
- Measures consistency and reliability
- Supports large-scale benchmarking
Tech: Python, LangGraph, CrewAI, OpenAI APIs, AgentOps, MLflow
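As a rough illustration of what "consistency" and "regressions" can mean in this context, here is a minimal sketch. It is not the platform's actual code: the function names, the exact-match agreement rule, and the 0.05 tolerance are assumptions chosen for the example.

```python
from collections import Counter


def consistency_score(outputs):
    """Fraction of runs that agree with the most common output (1.0 = fully consistent).

    Hypothetical metric: uses exact string match across repeated runs of one prompt.
    """
    if not outputs:
        raise ValueError("need at least one run")
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / len(outputs)


def detect_regressions(baseline, candidate, tolerance=0.05):
    """Return task ids whose score dropped by more than `tolerance` (assumed threshold)."""
    return sorted(
        task
        for task, base_score in baseline.items()
        if task in candidate and base_score - candidate[task] > tolerance
    )


if __name__ == "__main__":
    runs = ["refund approved", "refund approved", "refund denied", "refund approved"]
    print(consistency_score(runs))
    print(detect_regressions({"t1": 0.9, "t2": 0.8}, {"t1": 0.7, "t2": 0.79}))
```

Exact-match agreement is the simplest possible choice; a real system would more likely compare normalized or semantically scored outputs.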
Designed a benchmarking framework for evaluating policy adherence in agentic AI systems.
Capabilities include:
- Tool-call validation
- Escalation verification
- Safety policy enforcement
- Deterministic evaluation workflows
- Compliance-focused testing
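To make tool-call validation and escalation verification concrete, the sketch below checks an agent trace against a small policy. Everything in it is hypothetical: the tool names, the required arguments, and the refund limit are invented for illustration and are not taken from the framework itself.

```python
from dataclasses import dataclass, field

# Hypothetical policy: allowed tools mapped to their required argument names.
ALLOWED_TOOLS = {
    "lookup_order": {"order_id"},
    "issue_refund": {"order_id", "amount"},
    "escalate_to_human": {"reason"},
}
REFUND_LIMIT = 100.0  # assumed limit above which a human must be involved


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


def validate_trace(trace):
    """Return a list of policy violations for one agent run (empty list = pass)."""
    violations = []
    for step, call in enumerate(trace):
        required = ALLOWED_TOOLS.get(call.name)
        if required is None:
            violations.append(f"step {step}: tool '{call.name}' is not allowed")
            continue
        missing = required - set(call.args)
        if missing:
            violations.append(f"step {step}: '{call.name}' missing args {sorted(missing)}")

    # Escalation check: a refund above the limit requires an escalation in the same run.
    escalated = any(c.name == "escalate_to_human" for c in trace)
    large_refund = any(
        c.name == "issue_refund" and c.args.get("amount", 0) > REFUND_LIMIT for c in trace
    )
    if large_refund and not escalated:
        violations.append("refund above limit without escalation")
    return violations
```

Because the check is a pure function of the recorded trace, the same trace always yields the same verdict, which is what makes this style of evaluation deterministic.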
Built validation pipelines and monitoring systems for production ML models.
Focus areas:
- Drift detection
- Latency monitoring
- Reproducibility checks
- Deployment validation
- Automated quality gates
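One way drift detection can feed an automated quality gate is sketched below, using the Population Stability Index (PSI) on a single numeric feature. This is an assumed technique for illustration, not a description of the pipelines above; the bin count, the 0.2 PSI threshold (a common rule of thumb), and the 500 ms latency budget are all placeholder values.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline sample and a live sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for a constant baseline

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor at a small epsilon so the log ratio is always defined.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def quality_gate(psi, p95_latency_ms, max_psi=0.2, max_latency_ms=500.0):
    """Return the list of failed checks; an empty list means the deployment may proceed."""
    failures = []
    if psi > max_psi:
        failures.append(f"drift: PSI {psi:.3f} exceeds {max_psi}")
    if p95_latency_ms > max_latency_ms:
        failures.append(f"latency: p95 {p95_latency_ms} ms exceeds {max_latency_ms} ms")
    return failures
```

In a CI/CD setting, a non-empty result from `quality_gate` would typically fail the pipeline step and block the rollout.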
- Applied AI Engineering
- Forward Deployed AI
- Agent Infrastructure
- AI Reliability Engineering
- AI Safety & Governance
- Evaluation Systems
- Production LLM Applications
- Human-AI Collaboration
The next generation of AI will not be won by larger models alone.
It will be won by teams that can build systems that are:
- Reliable
- Observable
- Auditable
- Safe
- Useful in production
I enjoy building the infrastructure and evaluation systems that make this possible.
💼 LinkedIn: linkedin.com/in/harshada-javeri-mle
💻 GitHub: github.com/harshada-javeri
Open to discussions around:
Applied AI • Forward Deployed Engineering • Agent Systems • AI Infrastructure • LLM Evaluation • AI Reliability