🚀 Optimize preferences effectively with ORPO, a framework for monolithic preference optimization without a reference model.
🛠️ Build and explore a minimal implementation of recursive language models with a REPL environment for OpenAI clients. Start hacking today!
🔍 Enhance iterative theorem proving with DSPy by comparing full oracle vs. clipped hints using a mock Lean verifier in this streamlined setup.
Training of drone swarms using Stable-Baselines3, PettingZoo, AirSim, and UE4
PyTorch implementations of classic deep reinforcement learning algorithms, including DQN, DDQN, Dueling Network, DDPG, SAC, A2C, PPO, and TRPO (more algorithms are in progress).
Ray & RLlib tools for unifying code across different repositories, plus experiments with dynamic hyperparameters
Swimming eel robot with CPG + RL control
Safety challenges that probe AI agents' ability to learn and act in desired ways with respect to biologically and economically relevant aspects. The benchmarks are implemented in a gridworld-based environment; the environments are deliberately simple, adding only as much complexity as is needed to illustrate the relevant safety and performance aspects.
Partially Observable Multi-Agent RL with Transformers
XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library
Vision-based autonomous racing system comparing PPO, DQN, and GAIL with custom reward shaping across CarRacing-v3 and TORCS simulators
Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
Deep reinforcement learning for mobile robot navigation in the IR-SIM simulation. Using DRL algorithms (SAC, TD3, PPO, DDPG), a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.
Safe reinforcement learning for quadruped locomotion using Control Barrier Functions (CBFs): zero falls, a 99% safety rate, and 90% speed retention with provable safety guarantees
Code for "Does Optimism Help PPO? Optimistic Gradient Updates for Multi-Agent Games and Exploration Benchmarks" (OPPO: Optimistic PPO)
Interactive RL-Based Derivative Hedging platform for options and portfolio simulation, featuring PPO, LSTM, and Transformer RL models, real-time market data, Greeks calculation, portfolio risk analysis, and PDF reporting. Built with Python, Streamlit, and Stable-Baselines3.
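Several of the entries above train PPO through Stable-Baselines3. As a point of reference, here is a minimal sketch of that workflow; the environment name and timestep budget are illustrative choices, not taken from any of the repositories listed:

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Illustrative environment choice; any Gymnasium env with a compatible
# observation/action space works the same way.
env = gym.make("CartPole-v1")

# "MlpPolicy" selects SB3's default feed-forward actor-critic network.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50_000)  # timestep budget is an arbitrary example

# Greedy rollout with the trained policy.
obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)
```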
Quadruped robot locomotion using Proximal Policy Optimization (PPO) in PyBullet simulation: 30% fewer falls and 25% higher velocity
Clean RL algorithm implementations in under 100 lines each.
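For readers comparing such minimal implementations, the core of PPO itself is compact. The sketch below shows one plausible rendition of the clipped surrogate loss in PyTorch; the function name and argument layout are illustrative, not drawn from any listed repository:

```python
import torch

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate objective from Schulman et al. (2017).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and behavior policies; advantages: advantage estimates
    (e.g. from GAE). Names here are illustrative, not from any listed repo.
    """
    ratio = torch.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the elementwise minimum of the two surrogates;
    # negate so the result can be minimized with a standard optimizer.
    return -torch.min(unclipped, clipped).mean()
```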
This repository implements a Proximal Policy Optimization (PPO) agent that learns to play Super Mario Bros using TensorFlow/Keras and OpenAI Gym. Features CNNs for vision, Actor-Critic architecture, and parallel environments. Train your own Mario master or run a pre-trained one!
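As a rough illustration of the CNN actor-critic pattern that entry describes, here is a small Keras sketch; the Atari-style layer sizes are an assumption for illustration, not the repository's actual architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_actor_critic(num_actions, input_shape=(84, 84, 4)):
    # Shared convolutional torso; sizes below follow the common
    # Atari-style design and are assumed, not taken from the repo.
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)
    x = layers.Conv2D(64, 4, strides=2, activation="relu")(x)
    x = layers.Conv2D(64, 3, strides=1, activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(512, activation="relu")(x)
    # Two heads: action logits for the actor, a scalar value for the critic.
    logits = layers.Dense(num_actions)(x)
    value = layers.Dense(1)(x)
    return tf.keras.Model(inputs=inputs, outputs=[logits, value])
```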