OpenRLHF / OpenRLHF Public

Pull requests: OpenRLHF/OpenRLHF

40 Open 448 Closed

Pull requests list

Added REBEL-inspired offline reward-gap regression loss to DPO trainer

#1247 opened Jun 5, 2026 by LeoPhilly

Fix overlong penalty action token length

#1246 opened May 30, 2026 by Jiang020609

Security: Arbitrary Code Execution via trust_remote_code=True in Dataset Loading

#1241 opened May 20, 2026 by tuanaiseo

3 tasks

fix: ignore masked invalid values in policy loss reductions

#1240 opened May 17, 2026 by haoyang9804

fix: ignore zero-action samples in group reward baselines

#1239 opened May 17, 2026 by haoyang9804

Add TokenSpeed-backed PPO rollout engine

#1237 opened May 7, 2026 by 4teven

Fix vLLM GPU allocation for colocated tensor-parallel engines

#1231 opened Apr 30, 2026 by fuyuan-li

Add Trackio logger backend

#1230 opened Apr 27, 2026 by abidlabs

Clarify reward normalization behavior for custom reward sources

#1229 opened Apr 26, 2026 by taivu1998

Add SFT tools field support for chat templates

#1228 opened Apr 26, 2026 by taivu1998

Fix multi-turn SFT masks for Qwen3 thinking templates

#1227 opened Apr 26, 2026 by taivu1998

Replace Deepspeed backend with Automodel

#1226 opened Apr 26, 2026 by hijkzzz Collaborator

feat: full async PPO training with partial rollout agent support

#1218 opened Apr 11, 2026 by LYMDLUT Collaborator

fix: true loss aggregation across dp ranks

#1216 opened Apr 10, 2026 by alek6kun

Fast Evolutionary Algorithm Support

#1214 opened Apr 5, 2026 by DavidKoplow

feat: add --from_scratch option to initialize model with random weights

#1209 opened Apr 1, 2026 by konghw-git Contributor

2 tasks done

adding CFPO to OpenRLHF

#1184 opened Feb 9, 2026 by asparius

feat: add AgentTokenHandler with defensive token concatenation for agentic training (#1128)

#1181 opened Jan 30, 2026 by ichbinlucaskim

feat: Switch vLLM rollout sampling to oversampling.

#1179 opened Jan 20, 2026 by Freder-chen Contributor

feat: add the support of fsdp2 and remove deepspeed (new version of PR 1115)

#1176 opened Jan 16, 2026 by LYMDLUT Collaborator

Default overlap_comm on for ZeRO-2+ RLHF runs

#1154 opened Nov 26, 2025 by MagellaX

Enhance PPO logging with entropy, reward stats, and grad norm insights

#1148 opened Nov 8, 2025 by MagellaX

Star Attention topology support with model integration and --attn_topology flag

#1122 opened Aug 26, 2025 by MagellaX

Add FSDP backend and --dist_backend flag across CLIs; introduce FSDPStrategy

#1115 opened Aug 23, 2025 by MagellaX

CLI support for top_k

#1104 opened Aug 13, 2025 by JoNeedsSleep
