- Institute of Automation, Chinese Academy of Sciences
- Beijing
Stars
Open-source implementation of AlphaEvolve
AutoThink is a reinforcement learning framework designed to equip R1-style language models with adaptive reasoning capabilities. Instead of always thinking or never thinking, the model learns when …
Official Repository of "Learning to Reason under Off-Policy Guidance"
Verification of Google DeepMind's AlphaEvolve 48-multiplication matrix algorithm, a breakthrough in matrix multiplication after 56 years.
An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks
[ICML 2025] Official Code of SMPE: "Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration"
Drone Automation using Multi-Agent Reinforcement Learning
Training of Drone Swarms using StableBaselines3, PettingZoo, AirSim and UE4
Transfer learning / domain adaptation / domain generalization / multi-task learning etc. Papers, codes, datasets, applications, tutorials.
PyTorch implementation of the state-of-the-art distributional reinforcement learning algorithm Fully Parameterized Quantile Function (FQF) and Extensions: N-step Bootstrapping, PER, Noisy Layer, Du…
Implementation of 'A Distributional Perspective on Reinforcement Learning' and 'Distributional Reinforcement Learning with Quantile Regression' based on OpenAI DQN baselines.
Page-by-page companion material for PRML, with reviews of the whole book and of each chapter
A list of papers regarding generalization in (deep) reinforcement learning
Lightweight multi-agent gridworld Gym environment
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.
Minimal code for A Generalist Agent
A cute running cat animation on your Windows taskbar.
Easy-to-read implementations of deep RL algorithms in PyTorch and TensorFlow 2 (DQN, REINFORCE, VPG, A2C, TRPO, PPO, DDPG, TD3, SAC)
mcmachado / protovalue
Forked from roshanshariff/protovalue
A visualization of proto-value functions
Code for the paper "Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction"