Pull requests: huggingface/trl

Pull requests list

fix: load image-text policy for async grpo
#6032 opened Jun 12, 2026 by he-yufeng
5 of 8 tasks
fix: pass AsyncGRPO environment rewards
#6031 opened Jun 12, 2026 by he-yufeng
5 of 8 tasks
Remove silently-ignored W&B/Hub fields from GOLD and Distillation configs
#6023 opened Jun 11, 2026 by DaoyuanLi2816 Contributor
3 of 4 tasks
Align AsyncGRPO clip-ratio metrics with GRPOTrainer
#6021 opened Jun 11, 2026 by qgallouedec Member
Align epsilon help/docstring wording
#6014 opened Jun 11, 2026 by qgallouedec Member
Align async GRPO loss variable names with GRPOTrainer
#6013 opened Jun 11, 2026 by qgallouedec Member
GRPO adapter-only vLLM LoRA sync
#6007 opened Jun 11, 2026 by rycerzes Contributor
4 of 8 tasks
Add DiffusionGemma block-diffusion SFT example
#6003 opened Jun 11, 2026 by kashif Collaborator Draft
OPSD: on-policy self-distillation trainer
#5990 opened Jun 9, 2026 by kashif Collaborator
Fix OnlineDPOTrainer evaluation
#5985 opened Jun 9, 2026 by he-yufeng
5 of 8 tasks
Fix GRPO KL estimator overflow
#5984 opened Jun 9, 2026 by he-yufeng
5 of 8 tasks
chore(docs): Add GRPO clipping viz and explanation
#5981 opened Jun 9, 2026 by zafstojano Contributor
2 of 8 tasks