linkhut

11 Feb 26

Testing Agent Skills Systematically with Evals

https://developers.openai.com/blog/eval-skills

by shubxam 5 months ago

28 Jan 25

Your dataset is the heart of your LLM eval. To the extent possible, it should closely represent true inputs into your LLM app.

https://www.promptfoo.dev/docs/configuration/datasets/

by ciwchris Jan 2025

20 Jan 25

Tutorial: Evaluate an LLM's prompt completions

https://learn.microsoft.com/en-us/dotnet/ai/tutorials/llm-eval

by ciwchris Jan 2025

12 Dec 24

Evaluation and monitoring metrics for generative AI

https://learn.microsoft.com/en-us/azure/ai-studio/concepts/evaluation-metrics-built-in

by ciwchris Dec 2024

11 Dec 24

Task-Specific LLM Evals that Do & Don't Work

https://eugeneyan.com/writing/evals/

by ciwchris Dec 2024
