MCP Server

since 0.4.2

Summary

The EvalHub MCP (Model Context Protocol) server lets AI agents submit evaluations, monitor jobs, and browse benchmarks through a standardized protocol. It connects any MCP-compatible client — Claude Code, VS Code with GitHub Copilot, Cursor, or custom agents — to an EvalHub instance.

What can the MCP server do?

Submit Evaluations

Submit model evaluation jobs with benchmarks or pre-defined collections, configure MLflow experiments, and track progress.

Browse Catalogs

Discover providers, benchmarks, and collections through MCP resources and semantic agent metadata.

Monitor Jobs

Poll job status with per-benchmark progress, cancel running jobs, and compare results across runs.

Guided Workflows

Use built-in prompts for Evaluation-Driven Development (EDD), step-by-step model evaluation, and run comparison.

Distribution methods

The MCP server is available as:

Standalone binary: a single file with no dependencies. Download from GitHub Releases or install via Homebrew (coming soon).
Container image: for Kubernetes/OpenShift deployments managed by the TrustyAI Operator.
Python SDK: pip install "eval-hub-sdk[mcp]" includes the MCP server as a CLI subcommand (evalhub mcp).
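As a rough illustration of how a client might launch the SDK-installed server, the sketch below generates a configuration in the `mcpServers` layout that several MCP clients read. It assumes `evalhub mcp` speaks MCP over stdio when started as a subprocess. The `EVALHUB_URL` variable and the example URL are hypothetical placeholders. Check your client's documentation for the exact file name and schema it expects.

```python
import json


def build_mcp_config(command="evalhub", args=("mcp",), env=None):
    """Return a client config dict that starts the MCP server as a subprocess."""
    server = {"command": command, "args": list(args)}
    if env:
        server["env"] = dict(env)
    return {"mcpServers": {"evalhub": server}}


# EVALHUB_URL is a hypothetical variable name used only for illustration.
config = build_mcp_config(env={"EVALHUB_URL": "https://evalhub.example.com"})
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the client's MCP configuration file once the placeholder values are replaced with real ones.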

Get started

Agent Discoverability: How agents choose and interpret evaluations

Installation: Install on macOS, Linux, or Windows

Claude Code Quick Start: Connect to Claude Code in under 5 steps

VS Code Quick Start: Connect to VS Code / GitHub Copilot

Reference

Tools: discover_providers, submit_evaluation, cancel_job, get_job_status

Agent Skills: eval-hub-skills plugin for Claude Code

Resources: evalhub:// URI scheme for providers, benchmarks, collections, jobs

Prompts: edd_workflow, evaluate_model, compare_runs

Troubleshooting: Common issues and how to fix them
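For readers building a custom agent, the sketch below shows how the tools, resources, and prompts listed above map onto MCP's JSON-RPC 2.0 requests (`tools/call`, `resources/read`, `prompts/get`). The method names come from the MCP specification. The `job_id` argument and the `evalhub://benchmarks` path are assumptions made for illustration; consult the Tools and Resources reference pages for the real schemas.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def jsonrpc_request(method, params):
    """Build a JSON-RPC 2.0 request envelope."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}


def call_tool(name, arguments):
    """Request body for invoking an MCP tool."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


def read_resource(uri):
    """Request body for reading an MCP resource by URI."""
    return jsonrpc_request("resources/read", {"uri": uri})


def get_prompt(name, arguments=None):
    """Request body for fetching an MCP prompt template."""
    return jsonrpc_request("prompts/get", {"name": name, "arguments": arguments or {}})


# "job_id" is a hypothetical argument name; the path after evalhub:// is assumed.
status_req = call_tool("get_job_status", {"job_id": "job-123"})
resource_req = read_resource("evalhub://benchmarks")
prompt_req = get_prompt("compare_runs")

for request in (status_req, resource_req, prompt_req):
    print(json.dumps(request))
```

In practice an MCP client library handles this framing for you; the raw messages are shown only to make the three primitives concrete.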