
Scott-Mabe/evals

Open source LLM evaluation tools

Promptfoo - CLI for prompt testing, red-teaming, and model comparison

DeepEval - Pytest-style framework with 50+ LLM metrics for RAG

Ragas - RAG evaluation with auto test dataset generation

OpenAI Evals - Benchmark registry with automated and human grading

Comet Opik - Tracing, evaluation, and monitoring for LLM agents

Open source sandboxing tools

claude-vm - Run Claude within a VM

nono - Secure, kernel-enforced sandbox CLI and SDKs for AI agents

Docker - Containerization tool

Open source CLI proxy tools

RTK - CLI proxy that reduces LLM token consumption
