Internal benchmark for altinity-mcp for different LLMs

We need to create an internal benchmark that uses our own prompts and successful credentials. Please help me with it.
For reference, see:
https://github.com/confident-ai/deepeval/
https://github.com/confident-ai/deepeval/tree/master/examples/mcp_evaluation
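
As a starting point for discussion, here is a rough sketch of the shape such a benchmark could take. It does not use deepeval or a real altinity-mcp connection: each "model" is a stand-in callable that maps a prompt to the name of the tool it would call. The tool names (`execute_query`, `list_tables`), the prompts, and the helper names are all hypothetical placeholders, not taken from altinity-mcp or deepeval.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BenchmarkCase:
    prompt: str
    expected_tool: str  # hypothetical tool name the model should pick


# A "model" is any callable that maps a prompt to the chosen tool name.
# In a real run this would wrap an LLM connected to the MCP server.
ModelFn = Callable[[str], str]


def run_benchmark(models: Dict[str, ModelFn], cases: List[BenchmarkCase]) -> Dict[str, float]:
    """Return, per model, the fraction of cases where the expected tool was chosen."""
    scores: Dict[str, float] = {}
    for name, model in models.items():
        hits = sum(1 for case in cases if model(case.prompt) == case.expected_tool)
        scores[name] = hits / len(cases) if cases else 0.0
    return scores


# Placeholder prompts; the real benchmark would load our internal prompt set.
CASES = [
    BenchmarkCase("How many rows are in the events table?", "execute_query"),
    BenchmarkCase("Which tables exist in the default database?", "list_tables"),
]


def keyword_model(prompt: str) -> str:
    """Stub model: picks a tool from a keyword, standing in for a real LLM."""
    return "list_tables" if "tables exist" in prompt.lower() else "execute_query"


def always_query_model(prompt: str) -> str:
    """Stub model: always picks the query tool, regardless of the prompt."""
    return "execute_query"


scores = run_benchmark(
    {"keyword": keyword_model, "always_query": always_query_model},
    CASES,
)
for model_name, score in scores.items():
    print(f"{model_name}: {score:.0%} correct tool choices")
```

Swapping the stub callables for real LLM clients and the exact-match check for a deepeval metric would be the next step; tool choice accuracy is only one possible score.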