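LLM study notes: Transformer architecture, reinforcement learning (RLHF/DPO/PPO), distributed training, and inference optimization. Includes complete mathematical derivations and slides.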
Updated Feb 28, 2026 - TeX
Reinforcement Learning agent that plays Briscola, a famous Italian card game
My bachelor thesis in Computer Science, "Hypernetwork-PPO for Continual Reinforcement Learning".
A framework for training and evaluating multi-agent reinforcement learning models for adaptive traffic light control in SUMO.
Reinforcement Learning for Yahtzee: A2C, PPO, REINFORCE
RL agents for the highway environment
Deep RL topics presented at FI MUNI