MCP Server

Available since version 0.4.2.

The EvalHub MCP (Model Context Protocol) server lets AI agents submit evaluations, monitor jobs, and browse benchmarks through a standardized protocol. It connects any MCP-compatible client — Claude Code, VS Code with GitHub Copilot, Cursor, or custom agents — to an EvalHub instance.

Submit Evaluations

Submit model evaluation jobs with benchmarks or pre-defined collections, configure MLflow experiments, and track progress.

Browse Catalogs

Discover providers, benchmarks, and collections through MCP resources and semantic agent metadata.

Monitor Jobs

Poll job status with per-benchmark progress, cancel running jobs, and compare results across runs.

Guided Workflows

Use built-in prompts for Evaluation-Driven Development (EDD), step-by-step model evaluation, and run comparison.
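The capabilities above are exposed through standard MCP primitives (tools, resources, and prompts), so a client can discover them with the generic listing methods defined by the MCP specification. The sketch below only builds the JSON-RPC 2.0 request messages a client would send; it does not assume any EvalHub-specific tool, resource, or prompt names, and the helper `jsonrpc_request` is a hypothetical name introduced for illustration.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC pairs each response with its request by id.
_ids = count(1)


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request dict (hypothetical helper, not part of any SDK)."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Generic discovery methods from the MCP specification.
# The names a server returns for them are server-specific and not shown here.
discovery = [
    jsonrpc_request("tools/list"),      # actions such as submitting or cancelling jobs
    jsonrpc_request("resources/list"),  # browsable catalogs
    jsonrpc_request("prompts/list"),    # guided workflows
]

for request in discovery:
    print(json.dumps(request))
```

In a real session these requests come after the MCP `initialize` handshake, and an MCP client library normally handles both for you.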

The MCP server is available as:

  • Standalone binary: single file, no dependencies. Download from GitHub Releases or install via Homebrew (coming soon).
  • Container image: for Kubernetes/OpenShift deployments managed by the TrustyAI Operator.
  • Python SDK: pip install "eval-hub-sdk[mcp]" includes the MCP server as a CLI subcommand (evalhub mcp).
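As an illustration of the Python SDK option, the following sketch generates a client configuration entry that launches the server through the `evalhub mcp` subcommand. It rests on several assumptions: that the SDK install puts `evalhub` on your PATH, that the server communicates over stdio when launched this way, and that your client reads the common `mcpServers` layout (some clients use a different top-level key, so check your client's documentation). The function name `mcp_client_config` and the server label `evalhub` are hypothetical, and any settings needed to reach your EvalHub instance are deliberately left out because they are not described here.

```python
import json


def mcp_client_config(server_name="evalhub", command="evalhub", args=("mcp",)):
    """Return a client config dict in the widely used `mcpServers` layout.

    Assumption: the client starts the server as a subprocess using
    `command` plus `args` and talks to it over stdio.
    """
    return {
        "mcpServers": {
            server_name: {
                "command": command,
                "args": list(args),
            }
        }
    }


config = mcp_client_config()

# Print the JSON so it can be pasted into the client's MCP configuration file.
print(json.dumps(config, indent=2))
```

The standalone binary can be wired up the same way by passing its path as `command` and adjusting `args` to whatever that binary expects.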