XuGW-Kevin

Tree Search for LLM Agent Reinforcement Learning

Python 256 23 Updated Sep 29, 2025

Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers

Python 25 2 Updated Mar 1, 2025

metaTextGrad: Automatically optimizing language model optimizers. Published in NeurIPS 2025.

Python 8 2 Updated Nov 5, 2025

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

Python 2,208 402 Updated Dec 15, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 12,503 1,530 Updated Apr 24, 2025

MENTOR is a highly efficient visual RL algorithm that excels in both simulation and real-world complex robotic learning tasks.

Python 26 1 Updated Jul 9, 2025

Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning"

Python 387 48 Updated Dec 20, 2025

Official PyTorch implementation for "Large Language Diffusion Models"

Python 3,413 230 Updated Nov 12, 2025

[Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.

Python 513 20 Updated Nov 5, 2025

Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep

Python 166 13 Updated Apr 23, 2025

Testing baseline LLM performance across various models

Python 330 52 Updated Dec 1, 2025

SWE-bench: Can Language Models Resolve Real-world GitHub Issues?

Python 3,988 716 Updated Dec 18, 2025

An LLM trained only on data from certain time periods to reduce modern bias

Python 812 29 Updated Dec 20, 2025

This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning".

Python 76 4 Updated Jul 10, 2025

Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.

452 15 Updated Apr 18, 2024

[NeurIPS 2023] Official code release for the paper: "Can Pre-Trained Text-to-Image Models Generate Visual Goals for Reinforcement Learning?"

Python 6 Updated Sep 29, 2024

UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

Python 821 25 Updated Nov 25, 2025

Python 37 Updated Jun 12, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,335 3,242 Updated Dec 20, 2025

The MATH Dataset (NeurIPS 2021)

Python 1,273 111 Updated Sep 6, 2025

An open source deep research clone. AI Agent that reasons large amounts of web data extracted with Firecrawl

TypeScript 6,120 742 Updated May 7, 2025

A collection of notebooks/recipes showcasing use cases of open-source models with Together AI.

Jupyter Notebook 1,079 197 Updated Dec 19, 2025

Production-ready platform for agentic workflow development.

TypeScript 122,318 19,020 Updated Dec 21, 2025

The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.

Python 97,581 11,054 Updated Dec 21, 2025

WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation

Python 172 5 Updated Nov 6, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 4,290 327 Updated Dec 15, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 17,661 2,860 Updated Dec 21, 2025

A fork to add multimodal model training to open-r1

Python 1,432 70 Updated Feb 8, 2025

Witness the aha moment of VLM with less than $3.

Python 4,009 289 Updated May 19, 2025