Stars
slime is an LLM post-training framework for RL scaling.
A user-friendly & efficient knowledge distillation framework for LLMs, supporting off-policy, on-policy (OPD), cross-tokenizer, multimodal, and on-policy self-distillation.
Search, understand, reproduce, and improve an idea with ease
A project implementing various agentic RL methods based on the slime post-training framework
My learning notes for ML SYS.
Laos_System provides a configurable end-to-end pipeline that converts clinical speech/text notes into structured JSON documents for admission, surgery, and discharge.
The KCORES Agent benchmarking project is designed to evaluate the tool-call capabilities of unimodal and multimodal models.
This is a scaled agent data synthesis system for tool-use learning, similar to the one used for Kimi K2.
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
This is the official repository of the paper Exploring Superior Function Calls via Reinforcement Learning.
This is the official repository of the paper "BalanceSFT: Improving LLM Function Calling with Balanced Training Signals and Data Hardness"
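"A Guide to Using Open-Source LLMs": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment
This project aims to share the technical principles behind large models along with hands-on experience (LLM engineering and real-world LLM application deployment).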