
Fix two-stage weight update server lookup #10

Draft
cursor[bot] wants to merge 1 commit into workingbranch from cursor/critical-bug-inspection-a4c8

cursor[bot] commented Apr 27, 2026

Summary

  • Fix the TwoStagevLLMRollout control-method actor lookup so it uses the upstream vLLM server actor name.
  • Add a regression test covering the first-time abort/resume server calls in the weight-update control flow.

Validation

  • python3 -m unittest tests.test_two_stage_vllm_rollout
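The regression test itself is not reproduced in this description. As a rough sketch of the pattern it describes, assuming the actor lookup can be injected and mocked, a first-call test might look like the following. Rollout, server_actor_name, the vllm_server_{rank} naming scheme, and the abort()/resume() method names are hypothetical stand-ins, not the project's actual code.

```python
import unittest
from unittest import mock


def server_actor_name(rank: int) -> str:
    # Hypothetical naming scheme; the real upstream vLLM server actor name may differ.
    return f"vllm_server_{rank}"


class Rollout:
    """Hypothetical rollout: update_weights() aborts servers, syncs, then resumes them."""

    def __init__(self, lookup, ranks):
        self._lookup = lookup  # injected actor lookup (stands in for a named-actor getter)
        self._ranks = ranks
        self.synced = False

    def update_weights(self) -> None:
        # Resolve every server by the same name it was registered under.
        servers = [self._lookup(server_actor_name(rank)) for rank in self._ranks]
        for server in servers:
            server.abort()
        self.synced = True  # placeholder for the actual weight sync
        for server in servers:
            server.resume()


class TestFirstWeightUpdate(unittest.TestCase):
    def test_first_update_aborts_and_resumes(self):
        server = mock.Mock()
        lookup = mock.Mock(return_value=server)
        rollout = Rollout(lookup, ranks=[0])

        rollout.update_weights()  # the very first call must not raise

        lookup.assert_called_once_with("vllm_server_0")
        self.assertTrue(rollout.synced)
        self.assertEqual(server.method_calls, [mock.call.abort(), mock.call.resume()])
```

Injecting the lookup keeps a test like this independent of a live Ray cluster; under these assumptions it runs with python3 -m unittest against whichever module it is saved in.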

Bug and impact

The recently added pre-weight-update abort path called _get_server_name_prefix(), a method that does not exist on TwoStagevLLMRollout. On rank 0, the first update_weights() call would therefore crash before syncing rollout weights, interrupting training at the very first weight update.
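To make the failure mode concrete, here is a minimal Python sketch that uses a plain dictionary in place of a real named-actor registry such as Ray's. Every name in it (FakeServer, REGISTRY, server_actor_name, the vllm_server_{rank} format, abort/resume) is hypothetical; only _get_server_name_prefix comes from the description above. The point is the contrast: the broken path fails on its very first call, while the fixed path resolves each actor by the name it was registered under.

```python
from typing import Dict, List


class FakeServer:
    """Hypothetical server actor that records the control calls it receives."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def abort(self) -> None:
        self.calls.append("abort")

    def resume(self) -> None:
        self.calls.append("resume")


# Hypothetical stand-in for a named-actor registry (name -> actor handle).
REGISTRY: Dict[str, FakeServer] = {}


def server_actor_name(rank: int) -> str:
    # Hypothetical naming scheme; the real upstream vLLM server actor name may differ.
    return f"vllm_server_{rank}"


class BrokenRollout:
    def abort_servers(self, ranks: List[int]) -> None:
        for rank in ranks:
            # Bug: _get_server_name_prefix is never defined on this class, so the
            # first call raises AttributeError before any server is contacted.
            name = self._get_server_name_prefix() + str(rank)
            REGISTRY[name].abort()


class FixedRollout:
    def _call_servers(self, ranks: List[int], method: str) -> None:
        for rank in ranks:
            # Fix: look each actor up by the name the server was registered under.
            getattr(REGISTRY[server_actor_name(rank)], method)()

    def abort_servers(self, ranks: List[int]) -> None:
        self._call_servers(ranks, "abort")

    def resume_servers(self, ranks: List[int]) -> None:
        self._call_servers(ranks, "resume")


if __name__ == "__main__":
    REGISTRY[server_actor_name(0)] = FakeServer()
    try:
        BrokenRollout().abort_servers([0])
    except AttributeError as exc:
        print("broken rollout failed:", exc)
    rollout = FixedRollout()
    rollout.abort_servers([0])
    rollout.resume_servers([0])
    print(REGISTRY[server_actor_name(0)].calls)  # ['abort', 'resume']
```

Routing both abort and resume through one lookup helper means the name used to find a server cannot drift from the name it was registered under.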

Co-authored-by: XU Mingshi <mxuax@users.noreply.github.com>