Pull requests: huggingface/trl
refactor(sft): build labels during dataset preparation instead of collation
#6037
opened Jun 12, 2026 by
0xadvait
5 of 8 tasks
test: don't hard-code bf16=True on devices that lack bf16 support
#6036
opened Jun 12, 2026 by
behroozazarkhalili
Collaborator
fix: load image-text policy for async grpo
#6032
opened Jun 12, 2026 by
he-yufeng
5 of 8 tasks
fix: pass AsyncGRPO environment rewards
#6031
opened Jun 12, 2026 by
he-yufeng
5 of 8 tasks
fix(grpo): correct DAPO/CISPO/VESPO loss normalization when steps_per_generation != gradient_accumulation_steps
#6024
opened Jun 12, 2026 by
0xadvait
5 of 8 tasks
Remove silently-ignored W&B/Hub fields from GOLD and Distillation configs
#6023
opened Jun 11, 2026 by
DaoyuanLi2816
Contributor
3 of 4 tasks
Align AsyncGRPO clip-ratio metrics with GRPOTrainer
#6021
opened Jun 11, 2026 by
qgallouedec
Member
Align AsyncGRPO num_completions_to_print with GRPO (int | None)
#6020
opened Jun 11, 2026 by
qgallouedec
Member
Align AsyncGRPO epsilon_high with GRPO (None fallback to epsilon)
#6019
opened Jun 11, 2026 by
qgallouedec
Member
Add experimental Harbor integration for GRPO environment training
#6018
opened Jun 11, 2026 by
adithya-s-k
Collaborator
Fix logging_steps default mentioned in AsyncGRPOConfig docstring
#6016
opened Jun 11, 2026 by
qgallouedec
Member
Align async GRPO loss variable names with GRPOTrainer
#6013
opened Jun 11, 2026 by
qgallouedec
Member
Use relative imports in async_grpo to match the rest of trl/experimental
#6012
opened Jun 11, 2026 by
qgallouedec
Member
GRPO adapter-only vLLM LoRA sync
#6007
opened Jun 11, 2026 by
rycerzes
Contributor
4 of 8 tasks
Normalize JSD distillation loss by num_items_in_batch for gradient accumulation
#6006
opened Jun 11, 2026 by
behroozazarkhalili
Collaborator
Warn when a string model loads in float32 under mixed-precision training
#6005
opened Jun 11, 2026 by
behroozazarkhalili
Collaborator
Support multiple environments [2/2]: Per-example environment selection
#6002
opened Jun 11, 2026 by
qgallouedec
Member
Support multiple environments [1/2]: Pool and build environment tool dicts at batch time
#6001
opened Jun 11, 2026 by
qgallouedec
Member
chore(docs): Add GRPO clipping viz and explanation
#5981
opened Jun 9, 2026 by
zafstojano
Contributor
2 of 8 tasks