- Beijing University of Posts and Telecommunications
- Beijing
- @qiker
Stars
Psy-Insight: Mental Health Oriented Interpretable Multi-turn Bilingual Counseling Dataset for Large Language Model Finetuning
Official Repository of Paper: "SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding"
Open Overleaf/ShareLaTex projects in vscode, with full collaboration support.
ACE-Step: A Step Towards Music Generation Foundation Model
kq-chen / nougat
Forked from facebookresearch/nougat. Implementation of Nougat: Neural Optical Understanding for Academic Documents
kq-chen / AutoGPTQ
Forked from AutoGPTQ/AutoGPTQ. An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Helper functions for processing and integrating visual language information with the Qwen-VL series models
kq-chen / VLMEvalKit
Forked from open-compass/VLMEvalKit. Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 30+ benchmarks
VideoAuteur: Towards Long Narrative Video Generation
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
An Open-source RL System from ByteDance Seed and Tsinghua AIR
verl: Volcano Engine Reinforcement Learning for LLMs
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou
[CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Let's make video diffusion practical!
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
Witness the aha moment of VLM with less than $3.
Research code for ACL2024 paper: "Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline"
Video translation and dubbing tool powered by LLMs. The video translator offers translation into 100 languages and one-click full-process deployment. The video translation output is optimized for platfo…
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and performing real-time speech generation.
Frontier Multimodal Foundation Models for Image and Video Understanding