ydup
  • Meituan
  • Beijing

Starred repositories

A series of technical reports on Slow Thinking with LLMs

Python 764 41 Updated Aug 13, 2025

https://hrl.boyuai.com/

Jupyter Notebook 4,647 810 Updated Nov 22, 2022

Seed-Coder is a family of lightweight open-source code LLMs comprising base, instruct and reasoning models, developed by ByteDance Seed.

747 56 Updated Jun 6, 2025

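PPO x Family DRL Tutorial Course (an introductory open course on decision intelligence: 8 lessons that clarify the algorithm theory, walk through the code logic, and put decision AI into practice)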

Python 2,554 212 Updated Mar 13, 2025

This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

MATLAB 15,179 1,422 Updated Mar 26, 2026

Must-read Papers on LLM Agents.

2,952 175 Updated Mar 12, 2026

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 89,952 13,752 Updated Apr 4, 2026

🔍 Search-o1: Agentic Search-Enhanced Large Reasoning Models [EMNLP 2025]

Python 1,193 104 Updated Nov 17, 2025

Secrets of RLHF in Large Language Models Part I: PPO

Python 1,420 105 Updated Mar 3, 2024

Latest Advances on System-2 Reasoning

Python 1,341 76 Updated Jun 8, 2025

Exploring Applications of GRPO

Python 252 34 Updated Aug 25, 2025

Fully open reproduction of DeepSeek-R1

Python 25,962 2,408 Updated Apr 2, 2026

Integrate the DeepSeek API into popular software

36,150 4,004 Updated Feb 23, 2026

A collection of industry classics and cutting-edge papers in the field of recommendation/advertising/search.

Python 2,100 266 Updated Mar 25, 2026

LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)

Python 453 24 Updated Oct 11, 2023

A collection of materials on mechanisms and strategy in computational advertising (research and application papers about strategy in Internet advertising).

185 22 Updated Feb 18, 2024

The official implementation of Self-Play Fine-Tuning (SPIN)

Python 1,235 105 Updated May 8, 2024

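A collection of industry practice articles on search, recommendation, advertising, user growth, and related topics (sources: Zhihu, Datafuntalk, technical WeChat official accounts)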

HTML 4,344 469 Updated Apr 4, 2026

All-in-One: Text Embedding, Retrieval, Reranking and RAG in Transformers

Python 73 13 Updated Aug 10, 2025

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Python 27,037 1,965 Updated Jan 9, 2026

A comprehensive directory of AI tools that helps you quickly find free, practical, and efficient web resources

17,039 1,438 Updated Mar 24, 2026

https://acl2023-retrieval-lm.github.io/

JavaScript 156 15 Updated Oct 18, 2023

Official Code for Stable Cascade

Jupyter Notebook 6,575 520 Updated Jul 25, 2024

Empower Large Language Models (LLM) using Knowledge Graph based Retrieval-Augmented Generation (KG-RAG) for knowledge intensive tasks

Jupyter Notebook 941 110 Updated Nov 9, 2024

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 6,605 489 Updated Aug 7, 2024

An open-source, commercially usable multimodal model that supports bilingual (Chinese and English) visual-text dialogue.

Python 378 32 Updated Sep 23, 2023

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,649 2,759 Updated Aug 12, 2024

✨✨Latest Advances on Multimodal Large Language Models

17,566 1,120 Updated Apr 3, 2026

Unified embedding model

Python 877 72 Updated Sep 1, 2023

Official repo for consistency models.

Python 6,475 432 Updated Mar 22, 2024