momaek


😍 Very Good

wentx momaek



89 followers · 186 following


Stars

AI Benchmark Tools

7 repositories

openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Python · 18,500 stars · 2,960 forks · Updated Apr 14, 2026

onejune2018 / Awesome-LLM-Eval

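Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of LLMs. It is a curated list of tools, benchmarks/data, demos, leaderboards, and large models, focused on evaluating foundation models and aimed at exploring the technical frontiers of generative AI.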

637 stars · 66 forks · Updated Nov 24, 2025

onejune2018 / Awesome-Medical-Healthcare-Dataset-For-LLM

A curated list of popular datasets, models, and papers for LLMs in medicine and healthcare.

328 stars · 25 forks · Updated Jun 6, 2024

EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

Python · 12,638 stars · 3,280 forks · Updated May 11, 2026

open-compass / VLMEvalKit

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.

Python · 4,145 stars · 699 forks · Updated May 15, 2026

ShishirPatil / gorilla

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

Python · 12,865 stars · 1,363 forks · Updated Apr 13, 2026

MoonshotAI / K2-Vendor-Verifier

Verifies the precision of all Kimi K2 API vendors.

Python · 568 stars · 35 forks · Updated Feb 14, 2026