The EvalHub MCP (Model Context Protocol) server lets AI agents submit evaluations, monitor jobs, and browse benchmarks through a standardized protocol. It connects any MCP-compatible client — Claude Code, VS Code with GitHub Copilot, Cursor, or custom agents — to an EvalHub instance.
Submit Evaluations
Submit model evaluation jobs with benchmarks or pre-defined collections, configure MLflow experiments, and track progress.
Browse Catalogs
Discover providers, benchmarks, and collections through MCP resources and semantic agent metadata.
Monitor Jobs
Poll job status with per-benchmark progress, cancel running jobs, and compare results across runs.
Guided Workflows
Use built-in prompts for Evaluation-Driven Development (EDD), step-by-step model evaluation, and run comparison.
The MCP server is available as:
- Python SDK: pip install "eval-hub-sdk[mcp]" includes the MCP server as a CLI subcommand (evalhub mcp).
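As a rough illustration of wiring the CLI subcommand into an MCP client, the sketch below generates a client configuration entry that launches evalhub mcp. It assumes the client reads the common "mcpServers" JSON layout (used by several MCP clients, though some use a different key or file) and that evalhub mcp communicates over stdio when started without extra arguments. Neither assumption is confirmed here, and the server name "evalhub" and the helper function are illustrative only. Check your client's documentation for the exact file location and schema.

```python
import json


def build_mcp_config(server_name: str = "evalhub") -> dict:
    """Return a client config entry that launches the EvalHub MCP server.

    Assumptions (not confirmed by the EvalHub docs):
      - the client uses the "mcpServers" layout;
      - `evalhub mcp` speaks stdio by default, so no extra flags are passed.
    """
    return {
        "mcpServers": {
            server_name: {
                "command": "evalhub",  # CLI installed by eval-hub-sdk[mcp]
                "args": ["mcp"],       # the MCP server subcommand
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the client's MCP config file.
    print(json.dumps(build_mcp_config(), indent=2))
```

Connection settings for the EvalHub instance (URL, credentials) are deliberately left out, since how the server reads them is not described here.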