Starred repositories
Minimal reproduction of DeepSeek R1-Zero
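A hedged sketch of the group-relative advantage normalization (GRPO) that R1-Zero-style reproductions typically build on; this is generic illustration code, not this repository's implementation, and the reward values are placeholders.

```python
# Generic GRPO-style advantage sketch (not this repo's code): rewards for several
# sampled completions of the same prompt are normalized within their group.
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar reward per sampled completion."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# e.g. rule-based correctness rewards for 4 completions of each of 2 prompts
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(group_relative_advantages(rewards))
```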
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Train transformer language models with reinforcement learning.
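A minimal usage sketch, assuming this entry is Hugging Face TRL; the model id, dataset, and training arguments below are placeholders taken from the style of TRL's quickstart, not a prescribed configuration.

```python
# Minimal supervised fine-tuning sketch with TRL's SFTTrainer.
# Assumption: the starred repo is Hugging Face TRL; names below are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # any prompt/completion dataset

trainer = SFTTrainer(
    model="Qwen/Qwen2-0.5B",  # model id resolved via transformers
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-out", max_steps=100),
)
trainer.train()
```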
Democratizing Reinforcement Learning for LLMs
Repository for research works and resources related to model reprogramming <https://arxiv.org/abs/2202.10629>
AgentFlow: In-the-Flow Agentic System Optimization
Code and dataset for paper: DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping
[NeurIPS 2025 Spotlight] "Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection"
Official implementation of Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation
[ICLR2021 Oral] Free Lunch for Few-Shot Learning: Distribution Calibration
Official code for "Vision Transformers with Self-Distilled Registers" (NeurIPS 2025 Spotlight)
Source code for the NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection"
Learning Deep Representations of Data Distributions
Qwen Code is a coding agent that lives in the digital world.
From unknown novice to large language model (LLM) hero! Stay tuned for what comes next!
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
🦜🔗 The platform for reliable agents.
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
scikit-learn cross validators for iterative stratification of multilabel data
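A short sketch of the scikit-learn-style API these cross validators expose, assuming this is the `iterative-stratification` package (module `iterstrat`); the toy data is illustrative only.

```python
# Multilabel-stratified K-fold splits with iterative-stratification (iterstrat).
import numpy as np
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

X = np.random.rand(8, 4)                       # 8 samples, 4 features
y = np.array([[0, 1], [1, 0], [1, 1], [0, 0],  # 2 binary labels per sample
              [0, 1], [1, 0], [1, 1], [0, 0]])

mskf = MultilabelStratifiedKFold(n_splits=2, shuffle=True, random_state=0)
for train_idx, test_idx in mskf.split(X, y):
    print("TRAIN:", train_idx, "TEST:", test_idx)
```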
Pytorch implementation of Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels
Empirical tricks for training robust models (ICLR 2021)
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
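A minimal sketch of how this library (timm) is typically used: create a pretrained backbone by name and run a forward pass; the architecture name and class count are arbitrary examples.

```python
# Create a pretrained image backbone with timm and run a dummy forward pass.
import timm
import torch

model = timm.create_model("resnet50", pretrained=True, num_classes=10)
model.eval()

x = torch.randn(1, 3, 224, 224)          # dummy image batch
with torch.no_grad():
    logits = model(x)
print(logits.shape)                       # torch.Size([1, 10])

# Discover available architectures by wildcard:
print(timm.list_models("efficientnet*")[:5])
```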
This may be the simplest implementation of DDPM. You can run Main.py directly to train the UNet on the CIFAR-10 dataset and watch the denoising process.
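For context, a generic sketch (not this repository's code) of the DDPM forward noising process q(x_t | x_0) that such CIFAR-10 training relies on; the schedule values are the standard linear defaults and the tensors are stand-ins.

```python
# Generic DDPM forward (noising) process: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product of (1 - beta_t)

def q_sample(x0, t, noise):
    """Sample x_t ~ q(x_t | x_0) for a batch of timesteps t."""
    a = alphas_bar[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise

x0 = torch.randn(4, 3, 32, 32)                   # stand-in for CIFAR-10 images
t = torch.randint(0, T, (4,))
noise = torch.randn_like(x0)
xt = q_sample(x0, t, noise)                      # noisy images the UNet learns to denoise
print(xt.shape)
```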
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
ArcherCodeR is an open-source initiative enhancing code reasoning in large language models through scalable, rule-governed reinforcement learning.
[arXiv:2508.00410] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models"
tmlr-group / Co-rewarding
Forked from resistzzz/Co-rewarding. [arXiv:2508.00410] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models"