ccdgyro

Starred repositories

Minimal reproduction of DeepSeek R1-Zero

Python 12,352 1,523 Updated Apr 24, 2025

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Python 2,961 551 Updated Apr 15, 2024

Train transformer language models with reinforcement learning.

Python 16,157 2,272 Updated Nov 5, 2025

📚 A from-scratch tutorial on the principles and practice of agents

Python 2,691 278 Updated Nov 4, 2025

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 4,667 440 Updated Nov 4, 2025

Repository for research works and resources related to model reprogramming <https://arxiv.org/abs/2202.10629>

62 1 Updated Sep 17, 2025

AgentFlow: In-the-Flow Agentic System Optimization

Python 1,148 136 Updated Nov 5, 2025

Code and dataset for paper: DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping

19 Updated Nov 5, 2025

[NeurIPS 2025 Spotlight] "Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection"

Python 11 1 Updated Oct 6, 2025

Official implementation of Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation

Python 76 2 Updated Oct 29, 2025

[ICLR2021 Oral] Free Lunch for Few-Shot Learning: Distribution Calibration

Python 475 71 Updated Nov 19, 2021

Official code for "Vision Transformers with Self-Distilled Registers" (NeurIPS 2025 Spotlight)

Jupyter Notebook 10 Updated Oct 19, 2025

source code for NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection"

Python 60 5 Updated Apr 11, 2025

Learning Deep Representations of Data Distributions

TeX 579 45 Updated Oct 29, 2025

Qwen Code is a coding agent that lives in the digital world.

TypeScript 15,060 1,239 Updated Nov 5, 2025

From a nobody to a large language model (LLM) hero~ Stay tuned for more!!!

Jupyter Notebook 1,794 124 Updated Oct 19, 2025

This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

MATLAB 12,694 1,204 Updated Oct 28, 2025

🦜🔗 The platform for reliable agents.

Python 118,922 19,586 Updated Nov 5, 2025

Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.

Python 5,919 322 Updated Sep 30, 2025

scikit-learn cross validators for iterative stratification of multilabel data

Python 880 74 Updated Oct 12, 2024

PyTorch implementation of "Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels"

Python 24 2 Updated Jan 3, 2024

Empirical tricks for training robust models (ICLR 2021)

Python 257 27 Updated May 25, 2023

Home of StarCoder2!

Python 1,982 191 Updated Mar 21, 2024

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…

Python 35,664 5,058 Updated Nov 5, 2025

This may be the simplest implementation of DDPM. You can run Main.py directly to train the UNet on the CIFAR-10 dataset and watch the denoising process.

Python 2,054 216 Updated Apr 24, 2023

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 39,213 4,766 Updated Jun 2, 2025

ArcherCodeR is an open-source initiative enhancing code reasoning in large language models through scalable, rule-governed reinforcement learning.

Python 43 2 Updated Aug 6, 2025

[arXiv:2508.00410] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models"

Python 29 4 Updated Oct 6, 2025

[arXiv:2508.00410] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models"

Python 42 1 Updated Oct 6, 2025