ppo

Fine-tunes FLAN-T5 using Reinforcement Learning (PPO) and PEFT to generate less toxic summaries, leveraging Meta AI's hate speech reward model for detoxification.

ppo generative-ai flan-t5

Updated May 25, 2025
HTML

Fashad-Ahmed / Tablut-challenge

Hybrid reinforcement learning and minimax agent for the game Tablut. Combines PPO-trained value networks with alpha-beta search for competitive play.

reinforcement-learning minimax alpha-beta-pruning game-ai game-playing-agent proximal-policy-optimization ppo value-network tablut self-play stable-baselines3

Updated Dec 17, 2025
HTML

tganamur / RL-vs-MPC-Racing

Compares the performance of MPC-based and RL-based racing.

python reinforcement-learning autonomous-driving model-predictive-control ppo f1tenth stable-baselines3

Updated May 27, 2024
HTML

koshachya-myata / Data_Center_Simulation

Data Center Environment and Reinforcement Learning (RL) Control

reinforcement-learning rl dc energyplus datacenter ppo stable-baselines3

Updated Oct 29, 2023
HTML

Here are 7 public repositories matching this topic...

MatheusAlves11 / TCC-AcademiaGo

nedamhs / DiabetesRL

GuickerZ / sistema-poo-eda

Ajairajv / Detoxified-Summaries-with-FLAN-T5-PPO

Fashad-Ahmed / Tablut-challenge

tganamur / RL-vs-MPC-Racing

koshachya-myata / Data_Center_Simulation
