View guijinSON's full-sized avatar

Python 1 Updated May 25, 2026

The FATE (Formal Algebra Theorem Evaluation) benchmarks.

54 3 Updated Feb 23, 2026

A benchmark for evaluating AI agents on frontier ultra long-horizon auto research tasks.

Python 142 15 Updated Jun 17, 2026

slime is an LLM post-training framework for RL Scaling.

Python 6,210 905 Updated Jun 18, 2026

CCXT for prediction markets. PMXT is a unified API for trading on Polymarket, Kalshi, and more.

TypeScript 1,899 225 Updated Jun 18, 2026

A lightweight tool for rule-based evaluation of LLMs.

Python 87 11 Updated Jun 19, 2025

🤗 Benchmark Large Language Models Reliably On Your Data

HTML 448 41 Updated Apr 2, 2026

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, …

Python 14,558 1,483 Updated Jun 18, 2026

Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators"

12 Updated Mar 25, 2025

nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)

Python 144 9 Updated May 8, 2025

The most modern LLM evaluation toolkit

Python 70 11 Updated Apr 30, 2026

A hackable, simple, and research-friendly GRPO Training Framework with high-speed weight synchronization in a multinode environment.

Python 38 4 Updated Aug 27, 2025

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

Python 661 63 Updated Jan 29, 2026

Official implementation for "MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models"

Jupyter Notebook 20 4 Updated Oct 26, 2024

ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ

Python 458 135 Updated May 2, 2025

Performs benchmarking on two Korean datasets with minimal time and effort.

Python 45 8 Updated Jan 22, 2026

🤏🏻 `investpy` but made tiny

Python 422 43 Updated Feb 28, 2026

An Open Source Toolkit For LLM Distillation

Python 968 128 Updated May 12, 2026

Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.

Jupyter Notebook 31 4 Updated Apr 14, 2026

DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024)

Jupyter Notebook 81 13 Updated Oct 3, 2024

Evaluate your LLM's response with Prometheus and GPT4 💯

Python 1,092 71 Updated Apr 25, 2025

Codebase for Merging Language Models (ICML 2024)

Python 868 52 Updated May 5, 2024

Python 8 2 Updated Aug 16, 2024

Jupyter Notebook 2 2 Updated Mar 25, 2024

Corpus of Annual Reports in Japan

Python 94 7 Updated Dec 19, 2020

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

Python 767 58 Updated Feb 1, 2024

Jupyter Notebook 3 Updated Jan 31, 2024

Tools for merging pretrained large language models.

Python 7,162 743 Updated Jun 17, 2026

Korean Port for teknium1/LLM-Logbook

HTML 6 Updated Oct 31, 2023