🚀 Optimize preferences effectively with ORPO, a framework for monolithic preference optimization without a reference model.
Updated Dec 17, 2025 - Python
🛠️ Build and explore a minimal implementation of recursive language models with a REPL environment for OpenAI clients. Start hacking today!
🌍 Explore global health trends through data science with clustering models and statistical validation from the 2021 Global Burden of Disease Study.
🤖 Explore reinforcement learning techniques with projects including a taxi agent using Q-Learning and a DQN-based Space Invaders agent.
🔍 Enhance iterative theorem proving with DSPy by comparing full oracle vs. clipped hints using a mock Lean verifier in this streamlined setup.
Training of Drone Swarms using StableBaselines3, PettingZoo, AirSim and UE4
Safety challenges for AI agents' ability to learn and act in desired ways in relation to biologically and economically relevant aspects. The benchmarks are implemented in a gridworld-based environment. The environments are relatively simple; only as much complexity is added as is necessary to illustrate the relevant safety and performance aspects.
Partially Observable Multi-Agent RL with Transformers
This project uses Webots and reinforcement learning to train a Toyota Prius vehicle to follow the road line and avoid obstacles. Second semester of the third year of the Bachelor's Degree in Artificial Intelligence and Data Science.
XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library
An attempt at implementing a Deep Reinforcement Learning package
Vision-based autonomous racing system comparing PPO, DQN, and GAIL with custom reward shaping across CarRacing-v3 and TORCS simulators
Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
Deep Reinforcement Learning for mobile robot navigation in IR-SIM simulation. Using DRL (SAC, TD3, PPO, DDPG) neural networks, a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.
RL for Type-1 Diabetes Control
Safe reinforcement learning for quadruped locomotion using Control Barrier Functions (CBF) - Zero falls, 99% safety rate, 90% speed retention with provable safety guarantees
Code for "Does Optimism Help PPO? Optimistic Gradient Updates for Multi-Agent Games and Exploration Benchmarks" (OPPO: Optimistic PPO)