Pull requests: OpenRLHF/OpenRLHF
Pull requests list
Added REBEL-inspired offline reward-gap regression loss to DPO trainer
#1247
opened Jun 5, 2026 by
LeoPhilly
Security: Arbitrary Code Execution via trust_remote_code=True in Dataset Loading
#1241
opened May 20, 2026 by
tuanaiseo
3 tasks
fix: ignore masked invalid values in policy loss reductions
#1240
opened May 17, 2026 by
haoyang9804
fix: ignore zero-action samples in group reward baselines
#1239
opened May 17, 2026 by
haoyang9804
Fix vLLM GPU allocation for colocated tensor-parallel engines
#1231
opened Apr 30, 2026 by
fuyuan-li
Clarify reward normalization behavior for custom reward sources
#1229
opened Apr 26, 2026 by
taivu1998
Fix multi-turn SFT masks for Qwen3 thinking templates
#1227
opened Apr 26, 2026 by
taivu1998
feat: full async PPO training with partial rollout agent support
#1218
opened Apr 11, 2026 by
LYMDLUT
Collaborator
feat: add --from_scratch option to initialize model with random weights
#1209
opened Apr 1, 2026 by
konghw-git
Contributor
2 tasks done
feat: add AgentTokenHandler with defensive token concatenation for agentic training (#1128)
#1181
opened Jan 30, 2026 by
ichbinlucaskim
feat: Switch vLLM rollout sampling to oversampling.
#1179
opened Jan 20, 2026 by
Freder-chen
Contributor
feat: add the support of fsdp2 and remove deepspeed (new version of PR 1115)
#1176
opened Jan 16, 2026 by
LYMDLUT
Collaborator
Enhance PPO logging with entropy, reward stats, and grad norm insights
#1148
opened Nov 8, 2025 by
MagellaX
Star Attention topology support with model integration and --attn_topology flag
#1122
opened Aug 26, 2025 by
MagellaX
Add FSDP backend and --dist_backend flag across CLIs; introduce FSDPStrategy
#1115
opened Aug 23, 2025 by
MagellaX