
Tags: PrimeIntellect-ai/prime-rl

v0.5.1.dev178

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
log masked advantage sign rates (#2481)

v0.5.1.dev177

feat: opt-in renderer-based tokenization for SFT (#2496)

Adds `use_renderer` flag to SFTConfig, mirroring the RL path. When enabled,
SFTDataset tokenizes via `renderers.base.build_training_sample` instead of
the incremental Jinja `apply_chat_template` path, fixing multiturn sample
drops and position-dependent template crashes.
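
The flag-gated switch described above can be sketched as follows. Only `use_renderer` and `build_training_sample` come from the commit text; the shape of `SFTConfig` and the toy tokenizers here are purely illustrative stand-ins, not the real implementations:

```python
from dataclasses import dataclass

@dataclass
class SFTConfig:
    use_renderer: bool = False  # opt-in; defaults to the legacy path

# Toy stand-ins for the two tokenization paths.
def build_training_sample(messages):
    # renderer path: render the whole conversation in one pass
    text = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    return list(text.encode())

def apply_chat_template_incremental(messages):
    # legacy path: render turn by turn, which is where multiturn sample
    # drops and position-dependent template crashes came from
    ids = []
    for m in messages:
        ids.extend(f"{m['role']}: {m['content']}\n".encode())
    return ids

def tokenize_sample(config, messages):
    if config.use_renderer:
        return build_training_sample(messages)
    return apply_chat_template_incremental(messages)
```

With `use_renderer = False` nothing changes, which is what makes the flag safe to land as opt-in.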

Co-authored-by: Eric Botti <ericjbotti@gmail.com>

v0.5.1.dev176

Revert "feat: opt-in renderer-based tokenization for SFT (#2493)" (#2495)

This reverts commit b5f196a.

v0.5.1.dev175

feat: opt-in renderer-based tokenization for SFT (#2493)

v0.5.1.dev174

chore: tell agents not to edit optimization_dtype / reduce_dtype (#2491)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

v0.5.1.dev173

Fix disagg: thread kv_transfer_params into engine for /inference/v1/generate (#2490)

v0.5.1.dev172

Patch vLLM layerwise reload alias-buffer handling (#2482)

* Patch vLLM layerwise alias-buffer reload

* Remove layerwise buffer copy-back guard

* Skip alias buffers during layerwise copy-back

* ruff

* Skip absent buffers during layerwise copy-back

* Precompute layerwise alias storage pointers

* ruff

* simpler fix for layerwise alias buffer reload (#2486)

* meow

* don't think there's a need for this singletoning behavior?

* Simplify layerwise alias-buffer patch to copy-back-only

Drop the capture-time skip and the absent-buffer guard; detect aliasing
at copy-back via parameter storage pointers (recursing into submodules
so views like mixer.conv_weights → mixer.conv1d.weight are caught).
_place_kernel_tensors then restores the original aliased buffer, which
trivially reflects the parameter's freshly-loaded storage.
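
A torch-free sketch of that detection idea, using numpy views in place of tensor storages (the name `mixer.conv_weights` mirrors the commit text; everything else here is illustrative, and the real patch compares parameter storage pointers rather than numpy memory):

```python
import numpy as np

# Illustrative "parameters" and "buffers": mixer.conv_weights is a view of
# mixer.conv1d.weight, mimicking the aliasing case described above.
params = {"mixer.conv1d.weight": np.zeros((4, 4, 3))}
buffers = {
    "mixer.conv_weights": params["mixer.conv1d.weight"].reshape(-1),  # alias
    "rope.inv_freq": np.ones(8),  # independent buffer, no alias
}

def alias_buffer_names(params, buffers):
    # a buffer aliases a parameter if they share underlying storage
    return [name for name, buf in buffers.items()
            if any(np.shares_memory(buf, p) for p in params.values())]
```

Skipping the names this function returns during copy-back is enough: the aliased buffer already reflects the parameter's freshly loaded storage.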

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Apply suggestion from @Jackmin801

Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>

---------

Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

v0.5.1.dev171

add poolside laguna custom model (#2479)

v0.5.1.dev170

feat(orchestrator): drop groups after max_error_reschedule_attempts (#2459)

* feat(orchestrator): drop groups after max_error_reschedule_attempts

Adds a per-group retry cap to the rollout scheduler. Each `GroupState`
now tracks a `failed_attempts` counter, incremented whenever a batch of
rollouts comes back with at least one errored or empty trajectory. When
the counter reaches `orchestrator.max_error_reschedule_attempts`, the
group is dropped from the current step's batch (cancelling any
in-flight rollouts for that group) and the rest of the batch proceeds.

Default is `None` (retry indefinitely, current behavior). Set a value
to unblock single-example hangs in agent envs (e.g. a sandbox poll
that times out at 60s on every retry — that loop was previously
infinite because the AgentError surfaces as `rollout["error"]`, which
the scheduler treats as a normal "reschedule the group" signal with no
give-up condition).

Minimal local re-implementation of the abandoned PR #2076 — keeps the
retry-cap behavior without the larger GeneratedBatch / variable
group-size / deferred-scoring refactor in that PR.
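
A minimal sketch of that drop decision, assuming simplified field names: only `GroupState`, `failed_attempts`, and `max_error_reschedule_attempts` come from the commit text, and the real scheduler's bookkeeping is more involved.

```python
from dataclasses import dataclass

@dataclass
class GroupState:
    failed_attempts: int = 0  # failed dispatch rounds so far

def on_round_result(group, rollouts, max_error_reschedule_attempts):
    """Return 'keep', 'reschedule', or 'drop' after one dispatch round."""
    # a round fails if any rollout errored or came back empty
    if not any(r.get("error") or not r.get("trajectory") for r in rollouts):
        return "keep"
    group.failed_attempts += 1
    if (max_error_reschedule_attempts is not None
            and group.failed_attempts >= max_error_reschedule_attempts):
        return "drop"  # cancel in-flight rollouts; rest of the batch proceeds
    return "reschedule"  # a cap of None retries indefinitely
```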

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(orchestrator): report and clear dropped_groups_by_env in metrics

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(orchestrator): off-by-one in max_error_reschedule_attempts check

failed_attempts is incremented on the initial failure, so >= caused the
group to be dropped without any reschedule. Use > so max_attempts
matches the configured number of retries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(orchestrator): include last failure reason in drop log

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(orchestrator): count failed rounds, not failed rollouts, for drop cap

For non-group-scoring envs, each individual rollout returns as a
separate task, so a single failed dispatch round could increment
failed_attempts up to rollouts_per_example times and trip the cap before
any retry happened. Tag in-flight requests with a dispatch round and
dedupe failures by round.

Also switch the drop check to `>=` and reword the config description so
the field's semantics match — the group is dropped once this many of
its dispatch rounds have returned errored/empty rollouts.
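
The round-based dedupe might look like this sketch; the dispatch-round tagging is from the commit text, while the method names here are illustrative:

```python
class GroupState:
    def __init__(self):
        self.dispatch_round = 0
        self.failed_rounds: set[int] = set()

    def dispatch(self, n_rollouts):
        # tag every in-flight request with the round it belongs to
        self.dispatch_round += 1
        return [(self.dispatch_round, i) for i in range(n_rollouts)]

    def record_failure(self, round_tag):
        # dedupe by round: N failed rollouts from one round count once,
        # so the cap counts failed rounds rather than failed rollouts
        self.failed_rounds.add(round_tag)
        return len(self.failed_rounds)
```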

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

v0.5.1.dev169

bump prime cli to 0.6.4 (#2480)