🛠️ Build and explore a minimal implementation of recursive language models with a REPL environment for OpenAI clients. Start hacking today!
Updated Nov 10, 2025 - Python
🌍 Explore global health trends through data science with clustering models and statistical validation from the 2021 Global Burden of Disease Study.
🤖 Explore reinforcement learning techniques with projects including a taxi agent using Q-Learning and a DQN-based Space Invaders agent.
🔍 Enhance iterative theorem proving with DSPy by comparing full oracle vs. clipped hints using a mock Lean verifier in this streamlined setup.
Training of Drone Swarms using StableBaselines3, PettingZoo, AirSim and UE4
XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library
A travel agent based on Qwen2.5, fine-tuned with SFT + DPO/PPO/GRPO on a travel question-answer dataset; a mind map can be generated from the response. A RAG system is built on the tuned Qwen2 model using Prompt-Template + Tool-Use + Chroma embedding database + LangChain
Deep Reinforcement Learning using Proximal Policy Optimization and Random Network Distillation in TensorFlow 2 and PyTorch, with some explanation
Manipulator experiments, simulation and hardware
Deep Reinforcement Learning using Truly Proximal Policy Optimization in TensorFlow 2 and PyTorch
Digital Twin-Driven Real-Time Collaborative Scheduling for U-Shaped Automated Container Terminals - Version 2.0
A curated list of papers on reinforcement learning for video generation
A Proximal Policy Optimization (PPO) implementation for the Lunar Lander environment using Gymnasium and PyTorch.
Safety challenges for AI agents' ability to learn and act in desired ways in relation to biologically and economically relevant aspects. The benchmarks are implemented in a gridworld-based environment. The environments are relatively simple: only as much complexity is added as is necessary to illustrate the relevant safety and performance aspects.
Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
This repository contains implementations for training Super Mario Bros agents using reinforcement learning, featuring standardized preprocessing pipelines and serving as a reproducible RL benchmark environment.
[Transportation Research Part C / AAAI-DC 2026] Official repository for the paper "The Impacts of Data Privacy Regulations on Food-Delivery Platforms"
Using deep reinforcement learning to play the Snake game. The algorithm used is PPO for discrete action spaces, where it performs as well as it does in continuous action spaces. About half an hour of training is enough for the snake to become as smart as you.