@RoshaniN RoshaniN commented Mar 17, 2025

  1. Introduced PathwaysJob status tracking that maps 1:1 to JobSet status tracking.
  2. Switched from GKE accelerator type to machine type and updated the topology validations accordingly.
  3. Simplified the deployment models: the Pathways RM, proxy, and user job are now bundled into one pod called the Pathways head. In colocate mode the Pathways head is deployed on TPUs; in default mode it is deployed on CPUs (controlled using affinity and tolerations).
  4. Extended the status tracking to retain a timeline of conditions, and persisted the changes.
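
As a rough illustration of the status tracking in items 1 and 4, a condition timeline that mirrors JobSet conditions could look something like the sketch below. This is not the code in pathwaysjob_controller.go: `Condition` is a trimmed, hypothetical stand-in for `metav1.Condition`, `PathwaysJobStatus` and `mirrorJobSetCondition` are made-up names, and the condition types and reasons are illustrative values only.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a simplified, hypothetical stand-in for metav1.Condition,
// trimmed to the fields this sketch needs.
type Condition struct {
	Type               string
	Status             string // "True" or "False"
	Reason             string
	LastTransitionTime time.Time
}

// PathwaysJobStatus is an assumed status shape, not the real type
// from pathwaysjob_types.go.
type PathwaysJobStatus struct {
	Conditions []Condition
}

// mirrorJobSetCondition appends a condition observed on the JobSet to the
// PathwaysJob status. Earlier entries are kept, so the slice reads as a
// timeline. It returns true when the status changed, which is the point
// where a real controller would write the status back to the API server.
func mirrorJobSetCondition(status *PathwaysJobStatus, observed Condition) bool {
	if n := len(status.Conditions); n > 0 {
		last := status.Conditions[n-1]
		// Skip repeats of the latest entry so reconcile loops do not
		// grow the timeline without a real transition.
		if last.Type == observed.Type && last.Status == observed.Status {
			return false
		}
	}
	status.Conditions = append(status.Conditions, observed)
	return true
}

func main() {
	status := &PathwaysJobStatus{}
	start := time.Now()
	// Illustrative sequence of conditions seen on the underlying JobSet.
	observed := []Condition{
		{Type: "Suspended", Status: "True", Reason: "ExampleSuspended", LastTransitionTime: start},
		{Type: "Suspended", Status: "True", Reason: "ExampleSuspended", LastTransitionTime: start.Add(time.Second)},
		{Type: "Completed", Status: "True", Reason: "ExampleCompleted", LastTransitionTime: start.Add(2 * time.Second)},
	}
	for _, c := range observed {
		if mirrorJobSetCondition(status, c) {
			fmt.Printf("recorded %s=%s (%s)\n", c.Type, c.Status, c.Reason)
		}
	}
	fmt.Printf("timeline length: %d\n", len(status.Conditions))
}
```

Appending instead of overwriting is what keeps the history; the actual controller may deduplicate or cap the list differently.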

Major focus area: the code in pathwaysjob_controller.go and pathwaysjob_types.go (the other files are modified automatically).
Verification - b/400592364
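
For reviewers picturing the head pod placement from item 3, here is a minimal sketch of how the two modes might differ. It is an assumption-laden illustration, not the controller's logic: `Placement`, `Toleration`, and `headPlacement` are hypothetical names, a plain node selector stands in for the affinity rules the PR uses, the `google.com/tpu` taint key and the `node.kubernetes.io/instance-type` label are assumed to be what the nodes carry, and the machine types are example values.

```go
package main

import "fmt"

// Toleration and Placement are simplified, hypothetical stand-ins for the
// corresponding corev1 pod spec fields.
type Toleration struct {
	Key      string
	Operator string
	Effect   string
}

type Placement struct {
	NodeSelector map[string]string
	Tolerations  []Toleration
}

// instanceTypeLabel is the well-known Kubernetes node label for the
// machine type; using it here is an assumption of this sketch.
const instanceTypeLabel = "node.kubernetes.io/instance-type"

// headPlacement returns scheduling constraints for the Pathways head pod.
// In colocate mode the head targets a TPU machine type and tolerates the
// assumed TPU taint; in default mode it targets a CPU machine type and
// carries no tolerations, so it stays off tainted TPU nodes.
func headPlacement(colocate bool, machineType string) Placement {
	p := Placement{NodeSelector: map[string]string{instanceTypeLabel: machineType}}
	if colocate {
		p.Tolerations = []Toleration{
			{Key: "google.com/tpu", Operator: "Exists", Effect: "NoSchedule"},
		}
	}
	return p
}

func main() {
	// Example machine types only; substitute the ones used in the cluster.
	fmt.Printf("colocate: %+v\n", headPlacement(true, "ct5lp-hightpu-4t"))
	fmt.Printf("default:  %+v\n", headPlacement(false, "n2-standard-64"))
}
```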

@RoshaniN RoshaniN changed the title from "PathwaysJob status tracking" to "[DRAFT] PathwaysJob status tracking" Mar 17, 2025
@RoshaniN RoshaniN force-pushed the tests branch 2 times, most recently from 1a5ee63 to 99695f6 March 18, 2025 06:05
@RoshaniN RoshaniN changed the title from "[DRAFT] PathwaysJob status tracking" to "PathwaysJob status tracking, implementing Pathways head pod." Mar 19, 2025

@SujeethJinesh SujeethJinesh left a comment

Thank you! This looks good to me and nothing specifically seems wrong, but I definitely think Shaurya's review will be good to have.

@shauryagup shauryagup left a comment

Thanks Roshani. Mostly just minor comments here. Could you generate a YAML for each of the deployment modes and share them with me offline, so I can also spot-check them for any possible issues?

@RoshaniN RoshaniN commented

> Thanks Roshani. Mostly just minor comments here. Could you generate a YAML for each of the deployment modes and share them with me offline, so I can also spot-check them for any possible issues?

Thanks! Absolutely, shared the YAMLs on the bug as well. PLMK if you spot any issues!

@RoshaniN RoshaniN merged commit 17311cd into main Mar 26, 2025
1 check passed
@RoshaniN RoshaniN deleted the tests branch March 26, 2025 00:31