Tags: PrimeIntellect-ai/prime-rl
feat: opt-in renderer-based tokenization for SFT (#2496) Adds `use_renderer` flag to SFTConfig, mirroring the RL path. When enabled, SFTDataset tokenizes via `renderers.base.build_training_sample` instead of the incremental Jinja `apply_chat_template` path, fixing multiturn sample drops and position-dependent template crashes. Co-authored-by: Eric Botti <ericjbotti@gmail.com>
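A minimal self-contained sketch of the opt-in switch this commit describes. Only the `use_renderer` flag name comes from the commit; the dataclass, helper names, and string outputs below are illustrative stand-ins, not prime-rl's actual API.

```python
from dataclasses import dataclass

@dataclass
class SFTConfig:
    # Off by default: renderer-based tokenization is opt-in, mirroring the RL path.
    use_renderer: bool = False

def tokenize_sample(cfg: SFTConfig, messages: list[dict]) -> str:
    joined = "|".join(m["content"] for m in messages)
    if cfg.use_renderer:
        # Renderer path: build the training sample from the whole conversation
        # at once, so multi-turn samples are not dropped turn-by-turn and
        # position-dependent template behavior never triggers.
        return "renderer:" + joined
    # Legacy path: incremental Jinja apply_chat_template per turn (stubbed here).
    return "jinja:" + joined

msgs = [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "yo"}]
tokenize_sample(SFTConfig(use_renderer=True), msgs)  # → 'renderer:hi|yo'
```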
feat: opt-in renderer-based tokenization for SFT (#2493)
chore: tell agents not to edit optimization_dtype / reduce_dtype (#2491) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix disagg: thread kv_transfer_params into engine for /inference/v1/generate (#2490)
Patch vLLM layerwise reload alias-buffer handling (#2482)
* Patch vLLM layerwise alias-buffer reload
* Remove layerwise buffer copy-back guard
* Skip alias buffers during layerwise copy-back
* ruff
* Skip absent buffers during layerwise copy-back
* Precompute layerwise alias storage pointers
* ruff
* simpler fix for layerwise alias buffer reload (#2486)
* meow
* don't think there's a need for this singletoning behavior?
* Simplify layerwise alias-buffer patch to copy-back-only

  Drop the capture-time skip and the absent-buffer guard; detect aliasing at copy-back via parameter storage pointers (recursing into submodules so views like mixer.conv_weights → mixer.conv1d.weight are caught). _place_kernel_tensors then restores the original aliased buffer, which trivially reflects the parameter's freshly-loaded storage.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Apply suggestion from @Jackmin801

  Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
---------
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
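The aliasing check this commit describes can be sketched as follows: a buffer counts as aliased when it shares an untyped storage with some parameter, detected by comparing storage data pointers. The function and module names here are illustrative assumptions, not the actual patch code.

```python
import torch

def aliased_buffer_names(module: torch.nn.Module) -> set[str]:
    # Collect the storage base pointers of every parameter in the module tree.
    param_ptrs = {p.untyped_storage().data_ptr() for p in module.parameters()}
    return {
        name
        # named_buffers() recurses into submodules by default, so a view such
        # as mixer.conv_weights aliasing mixer.conv1d.weight is caught too.
        for name, buf in module.named_buffers()
        if buf.untyped_storage().data_ptr() in param_ptrs
    }

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.zeros(4, 4))
        self.register_buffer("alias", self.w.detach()[:2])  # view of w's storage
        self.register_buffer("own", torch.zeros(2))         # independent storage
```

With this toy module, `aliased_buffer_names(M())` flags only `"alias"`, so a copy-back loop could skip it (or restore it from the freshly loaded parameter) instead of clobbering the shared storage.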
feat(orchestrator): drop groups after max_error_reschedule_attempts (#2459)
* feat(orchestrator): drop groups after max_error_reschedule_attempts

  Adds a per-group retry cap to the rollout scheduler. Each `GroupState` now tracks a `failed_attempts` counter, incremented whenever a batch of rollouts comes back with at least one errored or empty trajectory. When the counter reaches `orchestrator.max_error_reschedule_attempts`, the group is dropped from the current step's batch (cancelling any in-flight rollouts for that group) and the rest of the batch proceeds.

  Default is `None` (retry indefinitely, current behavior). Set a value to unblock single-example hangs in agent envs (e.g. a sandbox poll that times out at 60s on every retry; that loop was previously infinite because the AgentError surfaces as `rollout["error"]`, which the scheduler treats as a normal "reschedule the group" signal with no give-up condition).

  Minimal local re-implementation of the abandoned PR #2076: keeps the retry-cap behavior without the larger GeneratedBatch / variable group-size / deferred-scoring refactor in that PR.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(orchestrator): report and clear dropped_groups_by_env in metrics

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(orchestrator): off-by-one in max_error_reschedule_attempts check

  failed_attempts is incremented on the initial failure, so `>=` caused the group to be dropped without any reschedule. Use `>` so max_attempts matches the configured number of retries.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(orchestrator): include last failure reason in drop log

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(orchestrator): count failed rounds, not failed rollouts, for drop cap

  For non-group-scoring envs, each individual rollout returns as a separate task, so a single failed dispatch round could increment failed_attempts up to rollouts_per_example times and trip the cap before any retry happened. Tag in-flight requests with a dispatch round and dedupe failures by round. Also switch the drop check to `>=` and reword the config description so the field's semantics match: the group is dropped once this many of its dispatch rounds have returned errored/empty rollouts.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>