zengatso / orpo

🚀 Optimize preferences effectively with ORPO, a framework for monolithic preference optimization without a reference model.

data reinforcement-learning medical human-pose-estimation gpt lora privacy-preserving ppo dpo huggingface kto low-resolution-images model-averaging llm generative-ai rlhf qwen medicalgpt

Updated Dec 17, 2025
Python

boomzdig22-coder / rlm

🛠️ Build and explore a minimal implementation of recursive language models with a REPL environment for OpenAI clients. Start hacking today!

python reinforcement-learning wifi radius aaa collaborative-filtering isp hotspot freeradius wifi-hotspot sac mujoco hotspot-wifi ppo a2c graph-neural-networks molecule-generation large-language-models

Updated Dec 17, 2025
Python

sasagucloth / Master-s-Thesis-in-Data-Science-

🌍 Explore global health trends through data science with clustering models and statistical validation from the 2021 Global Burden of Disease Study.

data-science data reinforcement-learning deep-learning random-forest clustering vegetation web-scraping geography predictive-modeling clip hyperspectral kmeans-clustering ppo spatial-autocorrelation global-burden-of-disease openclip surfing-dogs

Updated Dec 17, 2025

Karakarawowow / machin

python data-science algorithm scikit-learn regression logistic pytorch dqn smo knn datamining sac azure-machine-learning ppo reinforcementlearning td3 adaboost-algorithm a3c-pytorch

Updated Dec 17, 2025
Jupyter Notebook

Skw3mdy / Reinforcement-Learning-Projects

🤖 Explore reinforcement learning techniques with projects including a taxi agent using Q-Learning and a DQN-based Space Invaders agent.

machine-learning robotics unity simulation deep-reinforcement-learning dcgan gym neural-networks cartpole sac augmentation ppo erfnet td3 semantic-segmentation-models pytorch-template intrinsic-reward huggingface

Updated Dec 17, 2025
Jupyter Notebook

Vvalejandro / dspy-lean-prover-hint-clipping

🔍 Enhance iterative theorem proving with DSPy by comparing full oracle vs. clipped hints using a mock Lean verifier in this streamlined setup.

experiment evaluation program-synthesis dataset rl lean clipping variance-reduction ppo tool-use policy-improvement offline-rl dspy leandojo

Updated Dec 17, 2025
Python

bensugursoy / Drone-Swarm-RL-airsim-sb3

Training of Drone Swarms using StableBaselines3, PettingZoo, AirSim and UE4

reinforcement-learning drone unreal-engine drones swarm-intelligence airsim multiagent-reinforcement-learning supersuit ppo swarm-robotics marl droneswarm pettingzoo stablebaselines3

Updated Dec 17, 2025
Python

biological-alignment-benchmarks / biological-alignment-gridworlds-benchmarks

Safety challenges that test AI agents' ability to learn and act in desired ways with respect to biologically and economically relevant aspects. The benchmarks are implemented in a gridworld-based environment. The environments are relatively simple: only as much complexity is added as is necessary to illustrate the relevant safety and performance aspects.

Updated Dec 17, 2025
Python

gabe00122 / jaxrl

Partially Observable Multi-Agent RL with Transformers

reinforcement-learning deep-learning transformers flax ppo jax

Updated Dec 16, 2025
Python

Maguids / Autonomous-Vehicle-Lane-Following-with-Obstacle-Avoidance-RL

The project uses Webots and reinforcement learning to train a Toyota Prius vehicle to follow the road line and avoid obstacles. Coursework for the second semester of the third year of the Bachelor's Degree in Artificial Intelligence and Data Science.

reinforcement-learning avoid-obstacles webots ppo td3 follow-line toyota-prius

Updated Dec 16, 2025

agi-brain / xuance

XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library

reinforcement-learning pytorch dqn atari ddpg mpe mujoco ppo magent starcraft2 a2c multi-agent-reinforcement-learning maddpg tensorflow2 google-research-football mindspore qmix mappo reinforcement-learning-library

Updated Dec 16, 2025
Python

yoshitan240801 / reinforcement_learning

A collection of reinforcement learning use cases built with PyTorch (A2C, PPO, SAC, MPC)

pytorch mpc sac ppo a2c

Updated Dec 16, 2025
Jupyter Notebook

KristianHolme / DRiL.jl

An attempt at implementing a Deep Reinforcement Learning package

machine-learning deep-reinforcement-learning ppo

Updated Dec 15, 2025
Julia

Columbia-F1-Robotics / f1_robotics_racing_sim

Vision-based autonomous racing system comparing PPO, DQN, and GAIL with custom reward shaping across CarRacing-v3 and TORCS simulators

reinforcement-learning computer-vision robotics dqn torcs deep-q-network f1 imitation-learning proximal-policy-optimization ppo torcs-env gail columbia-university reward-shaping autonomous-racing generative-adversarial-imitation-learning gymanasium

Updated Dec 15, 2025
Python

Jason-Hoford / inversus-reinforcement-learning

INVERSUS-inspired game environment + PPO training pipeline for learning competitive tile-shooter strategies (dummy → self-play).

python reinforcement-learning ai deep-learning cnn pygame pytorch gym rl ppo self-play inversus

Updated Dec 15, 2025
Python

kengz / SLM-Lab

Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".

benchmark reinforcement-learning deep-reinforcement-learning pytorch dqn policy-gradient a3c sac ppo a2c

Updated Dec 14, 2025
Python

reiniscimurs / DRL-robot-navigation-IR-SIM

Deep Reinforcement Learning for mobile robot navigation in IR-SIM simulation. Using DRL (SAC, TD3, PPO, DDPG) neural networks, a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.

ddpg obstacle-avoidance sac drl ppo robot-navigation obstacle-avoidance-robot td3 ddpg-pytorch ppo-pytorch sac-pytorch drl-pytorch td3-pytorch ir-sim

Updated Dec 14, 2025
Python

nedamhs / DiabetesRL

RL for Type-1 Diabetes Control

diabetes ppo diabetes-management simglucose

Updated Dec 14, 2025
HTML

ansh1113 / rl-locomotion-cbf

Safe reinforcement learning for quadruped locomotion using Control Barrier Functions (CBF) - Zero falls, 99% safety rate, 90% speed retention with provable safety guarantees

python machine-learning reinforcement-learning robotics optimization safety locomotion quadruped cbf ppo control-barrier-functions safe-learning

Updated Dec 13, 2025
Python

ryanhlewis / oppo

Code for "Does Optimism Help PPO? Optimistic Gradient Updates for Multi-Agent Games and Exploration Benchmarks" (OPPO: Optimistic PPO)

rl ppo optimistic

Updated Dec 13, 2025
Python

ppo

951 public repositories match this topic.
