Tags: PrimeIntellect-ai/prime-rl
feat: opt-in renderer-based tokenization for SFT (#2496) Adds `use_renderer` flag to SFTConfig, mirroring the RL path. When enabled, SFTDataset tokenizes via `renderers.base.build_training_sample` instead of the incremental Jinja `apply_chat_template` path, fixing multiturn sample drops and position-dependent template crashes. Co-authored-by: Eric Botti <ericjbotti@gmail.com>
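A minimal self-contained sketch of the opt-in switch this commit describes. Only the `use_renderer` flag name comes from the commit; the dataclass, helper names, and string outputs below are illustrative stand-ins, not prime-rl's actual API.

```python
from dataclasses import dataclass

@dataclass
class SFTConfig:
    # Off by default: renderer-based tokenization is opt-in, mirroring the RL path.
    use_renderer: bool = False

def tokenize_sample(cfg: SFTConfig, messages: list[dict]) -> str:
    joined = "|".join(m["content"] for m in messages)
    if cfg.use_renderer:
        # Renderer path: build the training sample from the whole conversation
        # at once, so multi-turn samples are not dropped turn-by-turn and
        # position-dependent template behavior never triggers.
        return "renderer:" + joined
    # Legacy path: incremental Jinja apply_chat_template per turn (stubbed here).
    return "jinja:" + joined

msgs = [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "yo"}]
tokenize_sample(SFTConfig(use_renderer=True), msgs)  # → 'renderer:hi|yo'
```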
feat: opt-in renderer-based tokenization for SFT (#2493)
chore: tell agents not to edit optimization_dtype / reduce_dtype (#2491) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix disagg: thread kv_transfer_params into engine for /inference/v1/generate (#2490)
Patch vLLM layerwise reload alias-buffer handling (#2482)
* Patch vLLM layerwise alias-buffer reload
* Remove layerwise buffer copy-back guard
* Skip alias buffers during layerwise copy-back
* ruff
* Skip absent buffers during layerwise copy-back
* Precompute layerwise alias storage pointers
* ruff
* simpler fix for layerwise alias buffer reload (#2486)
* meow
* don't think there's a need for this singletoning behavior?
* Simplify layerwise alias-buffer patch to copy-back-only

  Drop the capture-time skip and the absent-buffer guard; detect aliasing at copy-back via parameter storage pointers (recursing into submodules so views like mixer.conv_weights → mixer.conv1d.weight are caught). _place_kernel_tensors then restores the original aliased buffer, which trivially reflects the parameter's freshly-loaded storage.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Apply suggestion from @Jackmin801

  Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
---------
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
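The aliasing check this commit describes can be sketched as follows: a buffer counts as aliased when it shares an untyped storage with some parameter, detected by comparing storage data pointers. The function and module names here are illustrative assumptions, not the actual patch code.

```python
import torch

def aliased_buffer_names(module: torch.nn.Module) -> set[str]:
    # Collect the storage base pointers of every parameter in the module tree.
    param_ptrs = {p.untyped_storage().data_ptr() for p in module.parameters()}
    return {
        name
        # named_buffers() recurses into submodules by default, so a view such
        # as mixer.conv_weights aliasing mixer.conv1d.weight is caught too.
        for name, buf in module.named_buffers()
        if buf.untyped_storage().data_ptr() in param_ptrs
    }

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.zeros(4, 4))
        self.register_buffer("alias", self.w.detach()[:2])  # view of w's storage
        self.register_buffer("own", torch.zeros(2))         # independent storage
```

With this toy module, `aliased_buffer_names(M())` flags only `"alias"`, so a copy-back loop could skip it (or restore it from the freshly loaded parameter) instead of clobbering the shared storage.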
feat(orchestrator): drop groups after max_error_reschedule_attempts (#2459)
* feat(orchestrator): drop groups after max_error_reschedule_attempts

  Adds a per-group retry cap to the rollout scheduler. Each `GroupState` now tracks a `failed_attempts` counter, incremented whenever a batch of rollouts comes back with at least one errored or empty trajectory. When the counter reaches `orchestrator.max_error_reschedule_attempts`, the group is dropped from the current step's batch (cancelling any in-flight rollouts for that group) and the rest of the batch proceeds.

  Default is `None` (retry indefinitely, current behavior). Set a value to unblock single-example hangs in agent envs (e.g. a sandbox poll that times out at 60s on every retry; that loop was previously infinite because the AgentError surfaces as `rollout["error"]`, which the scheduler treats as a normal "reschedule the group" signal with no give-up condition).

  Minimal local re-implementation of the abandoned PR #2076: keeps the retry-cap behavior without the larger GeneratedBatch / variable group-size / deferred-scoring refactor in that PR.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(orchestrator): report and clear dropped_groups_by_env in metrics

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(orchestrator): off-by-one in max_error_reschedule_attempts check

  failed_attempts is incremented on the initial failure, so `>=` caused the group to be dropped without any reschedule. Use `>` so max_attempts matches the configured number of retries.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(orchestrator): include last failure reason in drop log

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(orchestrator): count failed rounds, not failed rollouts, for drop cap

  For non-group-scoring envs, each individual rollout returns as a separate task, so a single failed dispatch round could increment failed_attempts up to rollouts_per_example times and trip the cap before any retry happened. Tag in-flight requests with a dispatch round and dedupe failures by round. Also switch the drop check to `>=` and reword the config description so the field's semantics match: the group is dropped once this many of its dispatch rounds have returned errored/empty rollouts.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>