Harshada Javeri

Applied AI Engineer • AI Reliability • Agent Systems • ML Platform Engineering

Building production-grade AI systems that are scalable, observable, and trustworthy.

About Me

I am a Senior Applied AI Engineer with 7 years of experience spanning machine learning, software engineering, intelligent automation, and AI platform development.

My work focuses on building and evaluating production AI systems, with particular interest in:

Agentic AI Systems
LLM Evaluation & Benchmarking
AI Reliability Engineering
Production ML Platforms
AI Governance & Safety
Model Observability
Enterprise AI Deployment

I enjoy operating at the intersection of:

AI Research × Engineering × Product × Deployment

Current Focus

Agentic AI & LLM Systems

Multi-agent orchestration
RAG architectures
Tool-using AI agents
LangGraph & CrewAI systems
Agent evaluation frameworks
Behavioral consistency testing

AI Reliability Engineering

Model validation pipelines
Drift detection
Inference monitoring
Reproducibility testing
Deployment quality gates
AI observability

Applied AI Platforms

Production ML workflows
Evaluation infrastructure
Enterprise AI systems
MLOps and CI/CD
Compliance-aware AI systems
Governance and auditability

Technical Expertise

AI & Agent Systems

Python • LLMs • RAG • LangGraph • LangChain • CrewAI • Multi-Agent Systems • AI Evaluation • LLM Benchmarking • AI Safety • AI Governance

Machine Learning

Scikit-Learn • NLP • Classification • Anomaly Detection • Feature Engineering • Explainable AI • Model Monitoring

MLOps & Infrastructure

MLflow • Docker • Kubernetes • GitHub Actions • Jenkins • AWS • CI/CD • Experiment Tracking

Observability

Prometheus • Grafana • Drift Detection • Inference Monitoring • Reliability Metrics

Data & Backend

SQL • PostgreSQL • Snowflake • IBM DB2 • REST APIs

Selected Projects

Multi-Agent LLM Evaluation System

Built an evaluation and observability platform for agentic systems that:

Tracks agent behavior across runs
Detects failure patterns and regressions
Measures consistency and reliability
Supports large-scale benchmarking

Tech: Python, LangGraph, CrewAI, OpenAI APIs, AgentOps, MLflow

Pi-Bench: Policy Intelligence Benchmark

Designed a benchmarking framework for evaluating policy adherence in agentic AI systems.

Capabilities include:

Tool-call validation
Escalation verification
Safety policy enforcement
Deterministic evaluation workflows
Compliance-focused testing
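
As a rough illustration of what tool-call validation and escalation verification can look like, here is a minimal Python sketch. It is not Pi-Bench's actual code: the tool names, the `POLICY` table, the `REQUIRES_ESCALATION` set, and the `escalate_to_human` step are all hypothetical and exist only to show the idea of a deterministic check over a recorded agent trace.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


# Hypothetical policy: allowed tools and the arguments each one requires.
POLICY = {
    "lookup_order": {"order_id"},
    "issue_refund": {"order_id", "amount"},
}
# Hypothetical rule: these tools may only run after an escalation step.
REQUIRES_ESCALATION = {"issue_refund"}


def validate_trace(trace):
    """Return the policy violations found in one agent run (empty if clean)."""
    violations = []
    escalated = False
    for i, call in enumerate(trace):
        if call.name == "escalate_to_human":
            escalated = True
            continue
        if call.name not in POLICY:
            violations.append(f"step {i}: tool '{call.name}' is not allowed")
            continue
        missing = POLICY[call.name] - set(call.args)
        if missing:
            violations.append(f"step {i}: '{call.name}' missing args {sorted(missing)}")
        if call.name in REQUIRES_ESCALATION and not escalated:
            violations.append(f"step {i}: '{call.name}' called before escalation")
    return violations
```

Because the check is a pure function of the recorded trace, the same trace always yields the same list of violations, which is what makes this style of evaluation deterministic.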

ML Reliability & Observability Platform

Built validation pipelines and monitoring systems for production ML models.

Focus areas:

Drift detection
Latency monitoring
Reproducibility checks
Deployment validation
Automated quality gates
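
To make drift detection and quality gates concrete, here is a minimal Python sketch. It is illustrative only and not the platform's implementation: it assumes the Population Stability Index (PSI) as the drift metric, equal-width bins over the reference range, and a gate threshold of 0.2, which is a common rule of thumb rather than a value taken from this project.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 if the reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def fractions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the reference range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so empty bins do not cause log(0).
        return [max(c / len(values), eps) for c in counts]

    ref, cur = fractions(reference), fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def passes_drift_gate(reference, current, threshold=0.2):
    """Hypothetical quality gate: block deployment when PSI exceeds the threshold."""
    return psi(reference, current) <= threshold
```

A gate like `passes_drift_gate` could run as a CI/CD step, comparing a feature's training distribution against recent production inputs before a model is promoted.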

Professional Interests

Applied AI Engineering
Forward Deployed AI
Agent Infrastructure
AI Reliability Engineering
AI Safety & Governance
Evaluation Systems
Production LLM Applications
Human-AI Collaboration

Philosophy

The race to build the next generation of AI systems will not be won by larger models alone.

It will be won by teams that can build systems that are:

Reliable
Observable
Auditable
Safe
Useful in production

I enjoy building the infrastructure and evaluation systems that make this possible.

Connect

📧 harshada.javeri@gmail.com

💼 LinkedIn: linkedin.com/in/harshada-javeri-mle

💻 GitHub: github.com/harshada-javeri

Open to discussions around:

Applied AI • Forward Deployed Engineering • Agent Systems • AI Infrastructure • LLM Evaluation • AI Reliability
