Stars
huggingface / yourbench
Forked from sumukshashidhar/yourbench
🤗 Benchmark Large Language Models Reliably On Your Data
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators"
nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)
The most modern LLM evaluation toolkit
A hackable, simple, and research-friendly GRPO Training Framework with high-speed weight synchronization in a multinode environment.
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
Official implementation for "MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models"
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Performs benchmarking on two Korean datasets with minimal time and effort.
Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.
DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024)
Evaluate your LLM's response with Prometheus and GPT4 💯
Codebase for Merging Language Models (ICML 2024)
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
Tools for merging pretrained large language models.
guijinSON / KoLLM-LogBook
Forked from teknium1/LLM-Logbook
Korean Port for teknium1/LLM-Logbook
self-instruct unseen data eval in Korean
Robust recipes to align language models with human and AI preferences
Korean Multi-task Instruction Tuning