  • Meituan
  • Beijing

Starred repositories

A series of technical reports on Slow Thinking with LLM

Python 764 41 Updated Aug 13, 2025

https://hrl.boyuai.com/

Jupyter Notebook 4,669 812 Updated Nov 22, 2022

Seed-Coder is a family of lightweight open-source code LLMs comprising base, instruct and reasoning models, developed by ByteDance Seed.

749 57 Updated Jun 6, 2025

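PPO x Family DRL Tutorial Course (an introductory open course on decision intelligence: eight lessons to help you sort out the algorithm theory, untangle the code logic, and put decision AI into practice)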

Python 2,557 212 Updated Mar 13, 2025

This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

MATLAB 15,332 1,432 Updated Mar 26, 2026

Must-read Papers on LLM Agents.

2,970 177 Updated Apr 14, 2026

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 90,855 13,957 Updated Apr 11, 2026

🔍 Search-o1: Agentic Search-Enhanced Large Reasoning Models [EMNLP 2025]

Python 1,202 104 Updated Nov 17, 2025

Secrets of RLHF in Large Language Models Part I: PPO

Python 1,421 105 Updated Mar 3, 2024

Latest Advances on System-2 Reasoning

Python 1,347 80 Updated Jun 8, 2025

Exploring Applications of GRPO

Python 252 34 Updated Aug 25, 2025

Fully open reproduction of DeepSeek-R1

Python 25,989 2,415 Updated Apr 2, 2026

Integrate the DeepSeek API into popular software

36,272 4,016 Updated Feb 23, 2026

A collection of industry classics and cutting-edge papers in the field of recommendation/advertising/search.

Python 2,113 267 Updated Mar 25, 2026

LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)

Python 454 24 Updated Oct 11, 2023

A collection of materials on mechanisms and strategies in computational advertising (research and application papers about strategy in Internet advertising).

185 22 Updated Feb 18, 2024

The official implementation of Self-Play Fine-Tuning (SPIN)

Python 1,237 105 Updated May 8, 2024

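A collection of industry practice articles on search, recommendation, advertising, user growth, and related areas (sources: Zhihu, Datafuntalk, technical WeChat official accounts)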

HTML 4,366 470 Updated Apr 15, 2026

All-in-One: Text Embedding, Retrieval, Reranking and RAG in Transformers

Python 73 12 Updated Aug 10, 2025

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Python 27,109 1,976 Updated Jan 9, 2026

A comprehensive directory of AI tools that helps you quickly find free, practical, and efficient web resources

17,048 1,443 Updated Mar 24, 2026

https://acl2023-retrieval-lm.github.io/

JavaScript 156 15 Updated Oct 18, 2023

Official Code for Stable Cascade

Jupyter Notebook 6,573 520 Updated Jul 25, 2024

Empower Large Language Models (LLM) using Knowledge Graph based Retrieval-Augmented Generation (KG-RAG) for knowledge intensive tasks

Jupyter Notebook 943 111 Updated Nov 9, 2024

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 6,619 490 Updated Aug 7, 2024

An open-source, commercially usable multimodal model that supports bilingual (Chinese and English) visual-text dialogue.

Python 378 32 Updated Sep 23, 2023

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,685 2,760 Updated Aug 12, 2024

✨✨Latest Advances on Multimodal Large Language Models

17,650 1,126 Updated Apr 9, 2026

unified embedding model

Python 876 72 Updated Sep 1, 2023

Official repo for consistency models.

Python 6,476 431 Updated Mar 22, 2024