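LLM study notes: Transformer architecture, reinforcement learning (RLHF/DPO/PPO), distributed training, and inference optimization. Includes complete mathematical derivations and slides.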
Updated Feb 28, 2026 - TeX
Deep RL topics presented at FI MUNI