boomzdig22-coder / rlm

🛠️ Build and explore a minimal implementation of recursive language models with a REPL environment for OpenAI clients. Start hacking today!

python reinforcement-learning wifi radius aaa collaborative-filtering isp hotspot freeradius wifi-hotspot sac mujoco hotspot-wifi ppo a2c graph-neural-networks molecule-generation large-language-models

Updated Nov 10, 2025
Python

sasagucloth / Master-s-Thesis-in-Data-Science-

🌍 Explore global health trends through data science with clustering models and statistical validation from the 2021 Global Burden of Disease Study.

data-science data reinforcement-learning deep-learning random-forest clustering vegetation web-scraping geography predictive-modeling clip hyperspectral kmeans-clustering ppo spatial-autocorrelation global-burden-of-disease openclip surfing-dogs

Updated Nov 10, 2025

Karakarawowow / machin

python data-science algorithm scikit-learn regression logistic pytorch dqn smo knn datamining sac azure-machine-learning ppo reinforcementlearning td3 adaboost-algorithm a3c-pytorch

Updated Nov 10, 2025
Jupyter Notebook

Skw3mdy / Reinforcement-Learning-Projects

🤖 Explore reinforcement learning techniques with projects including a taxi agent using Q-Learning and a DQN-based Space Invaders agent.

machine-learning robotics unity simulation deep-reinforcement-learning dcgan gym neural-networks cartpole sac augmentation ppo erfnet td3 semantic-segmentation-models pytorch-template intrinsic-reward huggingface

Updated Nov 10, 2025
Jupyter Notebook
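The taxi agent in the entry above uses Q-Learning, which applies the tabular update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). The sketch below is not code from that repository. It replaces Taxi with a hypothetical 5-state chain environment, and the values of alpha, gamma, epsilon and the episode counts are illustrative assumptions.

```python
import random

def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    best = max(q[state])
    return rng.choice([a for a, v in enumerate(q[state]) if v == best])

def q_learning_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    """Apply Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) in place."""
    # Terminal transitions do not bootstrap from the next state.
    best_next = 0.0 if done else max(q[next_state])
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]

def train_chain(n_states=5, episodes=500, epsilon=0.2, max_steps=100, seed=0):
    """Hypothetical chain: action 0 moves left, 1 moves right; reaching the last state pays 1."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = epsilon_greedy(q, state, epsilon, rng)
            if action == 1:
                next_state = min(state + 1, n_states - 1)
            else:
                next_state = max(state - 1, 0)
            done = next_state == n_states - 1
            reward = 1.0 if done else 0.0
            q_learning_update(q, state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return q

q = train_chain()
# Greedy action in each non-terminal state (state 4 is terminal and never updated).
greedy_policy = [max(range(2), key=lambda a: q[s][a]) for s in range(4)]
print(greedy_policy)
```

After training, the greedy policy should move right in every non-terminal state. A real Taxi agent would use the same update, applied to a larger state/action table.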

Vvalejandro / dspy-lean-prover-hint-clipping

🔍 Enhance iterative theorem proving with DSPy by comparing full oracle vs. clipped hints using a mock Lean verifier in this streamlined setup.

experiment evaluation program-synthesis dataset rl lean clipping variance-reduction ppo tool-use policy-improvement offline-rl dspy leandojo

Updated Nov 10, 2025
Python

bensugursoy / Drone-Swarm-RL-airsim-sb3

Training of Drone Swarms using StableBaselines3, PettingZoo, AirSim and UE4

reinforcement-learning drone unreal-engine drones swarm-intelligence airsim multiagent-reinforcement-learning supersuit ppo swarm-robotics marl droneswarm pettingzoo stablebaselines3

Updated Nov 10, 2025
Python

agi-brain / xuance

XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library

reinforcement-learning pytorch dqn atari ddpg mpe mujoco ppo magent starcraft2 a2c multi-agent-reinforcement-learning maddpg tensorflow2 google-research-football mindspore qmix mappo reinforcement-learning-library

Updated Nov 10, 2025
Python

NJUxlj / Travel-Agent-based-on-Qwen2-RLHF

A travel agent based on Qwen2.5, fine-tuned with SFT + DPO/PPO/GRPO on a travel question-answer dataset; the response can also be rendered as a mind map. A RAG system is built on top of the tuned Qwen2, using Prompt-Template + Tool-Use + Chroma embedding database + LangChain.

agent lora ppo rag dpo tool-use langchain rlhf qwen2 grpo

Updated Nov 10, 2025
Python

wisnunugroho21 / reinforcement_learning_ppo_rnd

Deep reinforcement learning using Proximal Policy Optimization and Random Network Distillation in TensorFlow 2 and PyTorch, with some explanation

reinforcement-learning deep-reinforcement-learning pytorch gym frozenlake-v0 proximal-policy-optimization ppo cartpole-v0 lunar-lander random-network-distillation bipedalwalker ppo-rnd frozenlake-not-slippery

Updated Nov 10, 2025
Python
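Random Network Distillation, as used in the entry above, rewards novelty. A fixed, randomly initialised target network maps each observation to features, and a predictor network is trained to match them. The predictor's error then serves as an intrinsic reward that shrinks for frequently seen observations. The sketch below is an illustrative assumption, not the repository's code: both networks are single linear maps in NumPy, and the class name `RND` and the parameters `feat_dim` and `lr` are hypothetical choices.

```python
import numpy as np

class RND:
    """Minimal Random Network Distillation with linear networks (illustrative only)."""

    def __init__(self, obs_dim, feat_dim=16, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed, randomly initialised target network: never trained.
        self.w_target = rng.normal(size=(obs_dim, feat_dim))
        # Predictor network: trained to imitate the target's outputs.
        self.w_pred = np.zeros((obs_dim, feat_dim))
        self.lr = lr

    def _error(self, obs):
        return obs @ self.w_pred - obs @ self.w_target

    def intrinsic_reward(self, obs):
        # Novelty bonus = predictor's mean squared error against the frozen target.
        return float(np.mean(self._error(obs) ** 2))

    def update(self, obs):
        # One gradient-descent step on the mean squared error w.r.t. the predictor weights.
        err = self._error(obs)
        grad = 2.0 * np.outer(obs, err) / err.size
        self.w_pred -= self.lr * grad

rnd = RND(obs_dim=4)
familiar = np.array([1.0, 1.0, 0.0, 0.0])
novel = np.array([0.0, 0.0, 1.0, 1.0])
for _ in range(500):
    rnd.update(familiar)  # the agent keeps seeing the same observation
# The familiar observation's bonus should now be far smaller than the novel one's.
print(rnd.intrinsic_reward(familiar), rnd.intrinsic_reward(novel))
```

In a full PPO+RND agent, this bonus would be added to (or learned alongside) the environment reward. The predictor would be a deep network trained on batches of observations.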

Labeeb1234 / Manipulator-Experiments

Manipulator experiments in simulation and on hardware

reinforcement-learning simulation ros2 ppo franka-emika mediapipe dobot-magician nvidia-isaaclab pydobot

Updated Nov 9, 2025
Python

wisnunugroho21 / reinforcement_learning_truly_ppo

Deep reinforcement learning using Truly Proximal Policy Optimization in TensorFlow 2 and PyTorch

reinforcement-learning deep-learning deep-reinforcement-learning pytorch ppo on-policy

Updated Nov 9, 2025
Python

salman-shah-ai / Digital-Twin-Driven-Real-Time-Collaborative-Scheduling-for-U-Shaped-Automated-Container-Terminals-V2

Digital Twin-Driven Real-Time Collaborative Scheduling for U-Shaped Automated Container Terminals - Version 2.0

deep-reinforcement-learning ppo digital-twin ai-research smart-ports collaborative-scheduling logistics-simulation automated-container-terminal

Updated Nov 7, 2025
Python

VocabVictor / verl-plus

Adds Ascend adaptation to verl and makes some small improvements

ppo dpo grpo dapo

Updated Nov 8, 2025
Python

wendell0218 / Awesome-RL-for-Video-Generation

A curated list of papers on reinforcement learning for video generation

reinforcement-learning ppo video-generation dpo reward-model grpo

Updated Nov 7, 2025

giansimone / ppo-gymnasium-lunarlander

A Proximal Policy Optimization (PPO) implementation for the Lunar Lander environment using Gymnasium and PyTorch.

python reinforcement-learning deep-reinforcement-learning torch pytorch gymnasium proximal-policy-optimization ppo lunar-lander ppo-pytorch

Updated Nov 7, 2025
Python
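A PPO implementation like the Lunar Lander one above typically centres on two pieces. The first is generalized advantage estimation (GAE). The second is the clipped surrogate objective L = E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)], where r is the ratio of new to old action probabilities. The following is a framework-free sketch under assumed defaults (`clip_eps=0.2`, `gamma=0.99`, `lam=0.95`). It is not taken from that repository. A PyTorch version would compute the same quantities on tensors so the loss can be differentiated.

```python
import math

def gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout.

    dones[t] is True when the episode ended after step t; last_value is the
    critic's estimate for the state following the final step.
    """
    advantages = [0.0] * len(rewards)
    next_adv = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrapping across an episode boundary.
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        next_adv = delta + gamma * lam * nonterminal * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    return advantages

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over the batch (to be minimised)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # The minimum makes the objective pessimistic: moving the ratio outside
        # [1 - eps, 1 + eps] can no longer increase it.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)

adv = gae([1.0, 0.0, 1.0], [0.5, 0.4, 0.6], last_value=0.0, dones=[False, False, True])
print(adv, ppo_clip_loss([-0.9, -1.1, -0.7], [-1.0, -1.0, -1.0], adv))
```

Common practice, not something the entry above specifies, is to normalise advantages per batch and to train the critic toward `values + advantages`.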

biological-alignment-benchmarks / biological-alignment-gridworlds-benchmarks

Safety challenges that test AI agents' ability to learn and act in desired ways with respect to biologically and economically relevant aspects. The benchmarks are implemented in a gridworld-based environment. The environments are deliberately simple: only as much complexity is added as is needed to illustrate the relevant safety and performance aspects.

Updated Nov 7, 2025
Python

kengz / SLM-Lab

Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".

benchmark reinforcement-learning deep-reinforcement-learning pytorch dqn policy-gradient a3c sac ppo a2c

Updated Nov 9, 2025
Python

tooichitake / gymnasium-mario

This repository contains implementations for training Super Mario Bros agents using reinforcement learning, featuring standardized preprocessing pipelines and serving as a reproducible RL benchmark environment.

benchmark reinforcement-learning impala cnn game-ai gymnasium actor-critic super-mario-bros wrappers ppo pytorch-implementation stable-baselines3

Updated Nov 5, 2025
Python

RS2002 / GDPR-Food-Delivery

[Transportation Research Part C / AAAI-DC 2026] Official Repository for The Paper, The Impacts of Data Privacy Regulations on Food-Delivery Platforms

reinforcement-learning thompson-sampling heterogeneous-network ddqn long-tail modular-architecture ppo ethical-artificial-intelligence food-delivery multi-agent-reinforcement-learning multi-bandit-army multi-lora multi-action-reinforcement-learning

Updated Nov 4, 2025
Python

Genius-Society / SnakeAI

Using deep reinforcement learning to play the Snake game. The algorithm is PPO for discrete action spaces, where it performs as well as it does in continuous action spaces. Training takes only about half an hour, after which the snake can play as well as you can.

drl hamiltonian-cycle ppo a-star-path-finding

Updated Nov 3, 2025
Python
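Snake's moves form a discrete action space. For such spaces, a PPO policy commonly outputs one logit per action and samples from the resulting categorical distribution. It stores each action's log-probability for the later probability ratio. Below is a minimal sketch, assuming the policy's output is a plain Python list of logits. The function names are hypothetical and not from the SnakeAI repository.

```python
import math
import random

def log_softmax(logits):
    """Log-probabilities of a categorical distribution given raw logits."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def sample_action(logits, rng):
    """Sample a discrete action; return (action, log_prob) for the PPO buffer."""
    logp = log_softmax(logits)
    probs = [math.exp(lp) for lp in logp]
    action = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    return action, logp[action]

rng = random.Random(0)
# e.g. logits for three moves: turn left, go straight, turn right (assumed layout)
action, logp = sample_action([0.5, -1.0, 2.0], rng)
print(action, round(math.exp(logp), 3))
```

In a PyTorch implementation, `torch.distributions.Categorical(logits=...)` provides the same sampling and `log_prob` computation on tensors.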

ppo

Here are 902 public repositories matching this topic...
