Tags · araffin/sbx

v0.26.0

Update OnPolicyAlgorithmJax & PPO to support custom rollout_buffer_class (#90)

* Update OnPolicyAlgorithmJax & PPO to support custom rollout_buffer_class

* Added assertion to prevent PPO from using DictRolloutBuffer implicitly

* Update links to https

* Update version

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

May 11, 2026
cd77f42

v0.25.0

Bump version

Dec 21, 2025
5e0748f

v0.24.0

Add Python 3.13 support, drop Python 3.9 (#83)

* Drop Python 3.9 support

* Apply autofixes for Python 3.10

* Reformat

Dec 5, 2025
1d2da71

v0.23.0

Release v0.23.0

Sep 29, 2025
3e9e66d

v0.22.0

Add n-step return support with `n_steps` parameter (#74)

* Add support for n-step returns

* Add type hint

* Cleanup ppo code

* Log policy and entropy loss separately

* Cleanup vf init

* Update version

* Add test for n steps

* Cap Jax version

* Reformat

Jul 25, 2025
1e5e433
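
To illustrate what an n-step return computes, here is a minimal sketch in plain Python. The function name `n_step_return` and its arguments are hypothetical and chosen for illustration; this is not the sbx implementation or its API, only the textbook quantity that an `n_steps` setting typically controls.

```python
def n_step_return(rewards, dones, bootstrap_value, gamma=0.99):
    """Discounted sum of up to n rewards plus a bootstrapped value.

    rewards: rewards r_t, ..., r_{t+n-1} (n is the length of the list)
    dones: True where the episode terminated after that step
    bootstrap_value: value estimate for the state reached after n steps
    """
    total = 0.0
    discount = 1.0
    for reward, done in zip(rewards, dones):
        total += discount * reward
        discount *= gamma
        if done:
            # Episode terminated: do not bootstrap past this step
            return total
    # No termination within n steps: add the discounted value estimate
    return total + discount * bootstrap_value
```

With `n = 1` this reduces to the usual one-step target `r + gamma * V(s')`; larger `n` relies more on observed rewards and less on the value estimate.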

v0.21.0

KL Adaptive LR for PPO and LR schedule for SAC/TQC (#72)

* Only check for terminated episodes

* Start adding ortho init

* Add SimbaPolicy for PPO

* Try adding ortho init to SAC

* Enable lr schedule for PPO

* Allow to pass lr, prepare for adaptive lr

* Implement adaptive lr

* Add small test

* Refactor adaptive lr

* Add adaptive lr for SAC

* Fix qf_learning_rate

* Revert "Fix qf_learning_rate"

This reverts commit ab33983.

* Revert "Add adaptive lr for SAC"

This reverts commit 5832702.

* Revert kl div for SAC changes

* Revert dist.mode() in two lines

* Cleanup code

* Add support for Gaussian actor for SAC

* Enable Gaussian actor for TQC

* Log std too

* Avoid NaN in kl div approx

* Allow to use layer_norm in actor

* Reformat

* Allow max grad norm for TQC and fix optimizer class

* Comment out max grad norm

* Update to schedule classes

* Add lr schedule support for TQC

* Revert experimental changes and add support for lr schedule for SAC

* Add test for adaptive kl div, remove squash output param

May 19, 2025
849e908
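
As a rough illustration of a KL-adaptive learning rate, the sketch below shows one common heuristic: shrink the learning rate when the measured KL divergence overshoots a target, and grow it when the KL divergence is well below the target. The function name, the factor of 1.5, the thresholds, and the clipping bounds are assumptions made for this example and are not taken from the sbx code.

```python
def kl_adaptive_lr(current_lr, kl_div, target_kl,
                   factor=1.5, min_lr=1e-5, max_lr=1e-2):
    """Return an adjusted learning rate based on the observed KL divergence.

    All default values are illustrative assumptions, not sbx defaults.
    """
    if kl_div > 2.0 * target_kl:
        # Policy changed too much: take smaller steps
        return max(current_lr / factor, min_lr)
    if kl_div < 0.5 * target_kl:
        # Policy barely changed: take larger steps
        return min(current_lr * factor, max_lr)
    # KL divergence is close enough to the target: keep the learning rate
    return current_lr
```

The rule would be applied after each policy update, using the approximate KL divergence between the old and new policies.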

v0.20.0

Update PPO to support `net_arch`, and additional fixes (#65)

* Add support for flexible arch in PPO

* Fix ent_coeff logging for TQC

* Fix name order

* Fix ent_coeff logging for SAC

* Hotfix for PPO, do not squash output at test time

* Fix typo

* Fix typo in common policy

* Try Gaussian dist for TQC

* Revert "Try Gaussian dist for TQC"

This reverts commit 6eeaf23.

* Fix CrossQ ent_coef logging

* Log PPO std when possible

* Fix for CrossQ

Feb 14, 2025
8238fcc

v0.19.0

Add SimBa Policy: Simplicity Bias for Scaling Up Parameters in DRL (#59)

* Start testing simba

* Quick try with CrossQ

* Add actor for CrossQ

* Add simba net for TQC

* Remove unused param

* Add parameter resets for TQC

* Fix reset

* Add missing param

* Update documentation

* Add parameter resets

* Reformat pyproject.toml

* Refactor: share actor between SAC and TQC

* Add run tests for simba

* Upgrade to python 3.9 (#64)

* Fix mypy error, update version

Jan 14, 2025
9cad1d0

v0.18.0

Optimize the log of the entropy coeff instead of the entropy coeff (#56)

* Optimize the log of the entropy coeff instead of the entropy coeff

* Update log ent coef for SAC and derivatives

* Reformat yaml

* Use uv for faster downloads

* Remove TODO

* Remove redundant call

---------

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

Nov 1, 2024
1c79684
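
To make the idea of optimizing the log of the entropy coefficient concrete, here is a minimal sketch of one gradient step in plain Python. It assumes the common SAC-style loss `L = -log_ent_coef * (mean_log_prob + target_entropy)`; the function name and arguments are hypothetical, and the exact loss used in sbx may differ.

```python
import math


def update_log_ent_coef(log_ent_coef, mean_log_prob, target_entropy, lr=0.01):
    """One gradient-descent step on the log of the entropy coefficient.

    Assumed loss: L = -log_ent_coef * (mean_log_prob + target_entropy)
    so dL/dlog_ent_coef = -(mean_log_prob + target_entropy).
    """
    grad = -(mean_log_prob + target_entropy)
    return log_ent_coef - lr * grad


# Policy entropy (-mean_log_prob = -2.0) is below the target (-1.0),
# so the coefficient should increase to encourage exploration.
new_log_ent_coef = update_log_ent_coef(0.0, mean_log_prob=2.0, target_entropy=-1.0, lr=0.1)
ent_coef = math.exp(new_log_ent_coef)  # always positive by construction
```

Under this formulation the gradient does not scale with the current coefficient, and the coefficient recovered through `exp` stays positive.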

v0.17.0

Add CNN support for DQN (#49)

* Add CNN support for DQN

* Update version and deps

* Fix CNN, channel last, padding and reshape

Jul 11, 2024
19c85a1
