Tags: araffin/sbx
Update OnPolicyAlgorithmJax & PPO to support custom rollout_buffer_class (#90)
* Update OnPolicyAlgorithmJax & PPO to support custom rollout_buffer_class
* Added assertion to prevent PPO from using DictRolloutBuffer implicitly
* Update links to https
* Update version
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

KL Adaptive LR for PPO and LR schedule for SAC/TQC (#72)
* Only check for terminated episodes
* Start adding ortho init
* Add SimbaPolicy for PPO
* Try adding ortho init to SAC
* Enable lr schedule for PPO
* Allow to pass lr, prepare for adaptive lr
* Implement adaptive lr
* Add small test
* Refactor adaptive lr
* Add adaptive lr for SAC
* Fix qf_learning_rate
* Revert "Fix qf_learning_rate"
  This reverts commit ab33983.
* Revert "Add adaptive lr for SAC"
  This reverts commit 5832702.
* Revert kl div for SAC changes
* Revert dist.mode() in two lines
* Cleanup code
* Add support for Gaussian actor for SAC
* Enable Gaussian actor for TQC
* Log std too
* Avoid NaN in kl div approx
* Allow to use layer_norm in actor
* Reformat
* Allow max grad norm for TQC and fix optimizer class
* Comment out max grad norm
* Update to schedule classes
* Add lr schedule support for TQC
* Revert experimental changes and add support for lr schedule for SAC
* Add test for adaptive kl div, remove squash output param

Update PPO to support `net_arch`, and additional fixes (#65)
* Add support for flexible arch in PPO
* Fix ent_coeff logging for TQC
* Fix name order
* Fix ent_coeff logging for SAC
* Hotfix for PPO, do not squash output at test time
* Fix typo
* Fix typo in common policy
* Try Gaussian dist for TQC
* Revert "Try Gaussian dist for TQC"
  This reverts commit 6eeaf23.
* Fix CrossQ ent_coef logging
* Log PPO std when possible
* Fix for CrossQ

Add SimBa Policy: Simplicity Bias for Scaling Up Parameters in DRL (#59)
* Start testing simba
* Quick try with CrossQ
* Add actor for CrossQ
* Add simba net for TQC
* Remove unused param
* Add parameter resets for TQC
* Fix reset
* Add missing param
* Update documentation
* Add parameter resets
* Reformat pyproject.toml
* Refactor: share actor between SAC and TQC
* Add run tests for simba
* Upgrade to python 3.9 (#64)
* Fix mypy error, update version

Optimize the log of the entropy coeff instead of the entropy coeff (#56)
* optimize the log of the entropy coeff instead of the entropy coeff
* Update log ent coef for SAC and derivates
* Reformat yaml
* Use uv for faster downloads
* Remove TODO
* Remove redundant call
---------
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>