Skip to content

Conversation

@JesseStutler
Copy link
Member

@JesseStutler JesseStutler commented Jul 14, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Inspired by #4426, in the scenario of hierarchical queues, it is hoped that the following capabilities can be supported:
The parent queue is associated with the nodegroup, and the jobs submitted by the child queue are automatically associated with the nodegroup associated with the parent queue. Therefore we can implement it in this way: if a child queue has no affinity, it will inherit the affinity configuration from the nearest ancestor queue

Which issue(s) this PR fixes:

Fixes #4426

Special notes for your reviewer:

Validation:
We have such queues: [root, q1, q2, q2-sub]
The parent of q1 and q2 is the root queue. Neither q1 nor q2-sub specifies affinity. The affinity specified by the root queue is:

    affinity: 
      nodeGroupAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - groupname1
        - groupname2
        preferredDuringSchedulingIgnoredDuringExecution:
        - groupname1
      nodeGroupAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - groupname3
        - gropuname4
        preferredDuringSchedulingIgnoredDuringExecution:
        - groupname3

, and the affinity and antiaffinity of q2 are completely opposite to those of the root. You can see the printout when the nodegroup plugin OnSessionOpen is:
, q1 inherits the configuration of the root, while q2-sub inherits the configuration of q2
image

Besides, this PR should also be merged: volcano-sh/apis#186, otherwise the root queue will miss the affinity field even if I add it

Does this PR introduce a user-facing change?

The nodegroup plugin adds the `enableHierarchy` configuration. If enableHierarchy is set to true, the hierarchical queue configuration will be enabled. A child queue without a specified affinity will inherit the affinity configuration of the nearest ancestor queue.

Co-authored-by: googs1025 <googs1025@gmail.com>
Signed-off-by: JesseStutler <chenzicong4@huawei.com>
@volcano-sh-bot volcano-sh-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 14, 2025
@volcano-sh-bot volcano-sh-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 14, 2025
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @JesseStutler, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the nodegroup scheduler plugin by implementing comprehensive support for hierarchical queues. This feature enables child queues to automatically inherit node group affinity settings from their parent or nearest ancestor queues when they lack explicit configurations, streamlining the management of complex queue structures and ensuring efficient resource allocation based on inherited policies. The changes involve core logic updates, new configuration options, and improved testing utilities.

Highlights

  • Hierarchical Queue Support: Introduced the capability for the nodegroup plugin to support hierarchical queues, allowing child queues to inherit node group affinity configurations from their nearest ancestor if they do not specify their own.
  • Configuration Option: Added a new enableHierarchy configuration option to the nodegroup plugin. When set to true, this option activates the hierarchical queue functionality, enabling affinity inheritance.
  • Refactored Affinity Logic: The internal representation and processing of queue group affinities within the nodegroup plugin have been significantly refactored. This includes new data structures (queueAttr, queueGroupAffinity) and helper functions (initQueueAttrs, buildAncestors, updateAffinityFromAncestor) to manage and apply hierarchical affinity rules.
  • Centralized Hierarchy Check: A new HierarchyEnabled method was added to the framework.Session to centralize the check for whether a specific plugin has hierarchical queue support enabled. This promotes consistency and reusability across plugins.
  • Improved Error Handling: Dedicated error constants (errNodeGroupAffinityNotFound, errNodeGroupLabelNotFound, errNodeUnsatisfied) and a helper function (newFitErr) were introduced in a new errors.go file within the nodegroup plugin. This provides more specific and consistent error messages for node group affinity failures.
  • Enhanced Testing Utilities: A new QueueWrapper fluent builder was added to pkg/scheduler/util/test_utils.go. This utility simplifies the creation and configuration of schedulingv1beta1.Queue objects in tests, improving test readability and maintainability, and enabling comprehensive testing of the new hierarchical features.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces hierarchical queue support for the nodegroup plugin. The changes are well-structured, but there are two main issues that should be addressed: a critical logic error in the predicate function that could lead to incorrect scheduling, and a high-severity issue regarding the lack of cycle detection in the recursive hierarchy-building logic, which could cause the scheduler to crash.

Comment on lines 183 to 219
func (np *nodeGroupPlugin) buildAncestors(ssn *framework.Session, attr *queueAttr) {
if attr.queueID == rootQueueID {
return
}

queueInfo := ssn.Queues[attr.queueID]
parentID := api.QueueID(rootQueueID)
if queueInfo.Queue.Spec.Parent != "" {
parentID = api.QueueID(queueInfo.Queue.Spec.Parent)
}

parentInfo := ssn.Queues[parentID]
if parentInfo == nil {
klog.Warningf("Parent queue %s not found for queue %s, unable to build hierarchy.", queueInfo.Queue.Spec.Parent, queueInfo.Name)
return
}

parentAttr, found := np.queueAttrs[parentInfo.UID]
if !found {
klog.Warningf("Parent queue attribute not found for %s, which should not happen.", parentInfo.Name)
return
}
if _, ok := task.Pod.Annotations[batch.QueueNameKey]; ok {
return task.Pod.Annotations[batch.QueueNameKey]

// Recursively build ancestors for the parent queue.
np.buildAncestors(ssn, parentAttr)

// A queue's ancestors are its parent plus all the parent's ancestors.
attr.ancestors.PushBack(parentAttr)
attr.ancestors.PushBackList(parentAttr.ancestors)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The buildAncestors function uses recursion to build the queue hierarchy. If a cyclic dependency exists in the queue configuration (e.g., queue 'A' is the parent of 'B', and 'B' is the parent of 'A'), this will result in infinite recursion and a stack overflow, potentially crashing the scheduler. To prevent this, add cycle detection by tracking visited queues during recursion. This can be achieved by passing a map to track the queues in the current recursion path and checking for cycles before recursing.

func (np *nodeGroupPlugin) buildAncestors(ssn *framework.Session, attr *queueAttr, visited map[api.QueueID]bool) {
	if visited == nil {
		visited = make(map[api.QueueID]bool)
	}
	if _, ok := visited[attr.queueID]; ok {
		klog.Warningf("Cycle detected in queue hierarchy at queue %s, skipping ancestor building.", attr.queueID)
		return
	}

	visited[attr.queueID] = true
	defer delete(visited, attr.queueID)

	if attr.queueID == rootQueueID {
		return
	}

	queueInfo := ssn.Queues[attr.queueID]
	parentID := api.QueueID(rootQueueID)
	if queueInfo.Queue.Spec.Parent != "" {
		parentID = api.QueueID(queueInfo.Queue.Spec.Parent)
	}

	parentInfo := ssn.Queues[parentID]
	if parentInfo == nil {
		klog.Warningf("Parent queue %s not found for queue %s, unable to build hierarchy.", queueInfo.Queue.Spec.Parent, queueInfo.Name)
		return
	}

	parentAttr, found := np.queueAttrs[parentInfo.UID]
	if !found {
		klog.Warningf("Parent queue attribute not found for %s, which should not happen.", parentInfo.Name)
		return
	}

	// Recursively build ancestors for the parent queue.
	np.buildAncestors(ssn, parentAttr, visited)

	// A queue's ancestors are its parent plus all the parent's ancestors.
	attr.ancestors.PushBack(parentAttr)
	attr.ancestors.PushBackList(parentAttr.ancestors)
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's a good suggestion, although potentially users won't configure the cyclic dependency queues

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@JesseStutler
Copy link
Member Author

Besides, this PR should also be merged: volcano-sh/apis#186, otherwise the root queue will miss the affinity field even if I add it

Signed-off-by: JesseStutler <chenzicong4@huawei.com>
@JesseStutler JesseStutler force-pushed the nodegroup_hierarchy branch from 318458a to 268703a Compare July 15, 2025 12:53
@JesseStutler
Copy link
Member Author

/cc @Monokaix @wangyang0616

@JesseStutler
Copy link
Member Author

/cc @kingeasternsun

func GetPodQueue(task *api.TaskInfo) string {
if _, ok := task.Pod.Labels[batch.QueueNameKey]; ok {
return task.Pod.Labels[batch.QueueNameKey]
func (np *nodeGroupPlugin) buildAncestors(ssn *framework.Session, attr *queueAttr, visited map[api.QueueID]struct{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m wondering if it’s really necessary to maintain Ancestors eagerly. We could just perform a recursive lookup when needed, since the queue hierarchy is unlikely to be very deep.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because when the Child queue has affinity , just use it's own affinity (no need inheritance).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's more simple in this way, I think this can also be accepted.

#### Inheritance Rules
1. **Direct Inheritance**: A child queue without affinity configuration inherits from its immediate parent queue.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“most recent ancestor” is better.

@Monokaix
Copy link
Member

Will fire another pr to fix some minor nits, we can merge this first, as @JesseStutler is busy for other work. @kingeasternsun

@Monokaix
Copy link
Member

/approve

@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Monokaix

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 23, 2025
@kingeasternsun
Copy link
Contributor

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Sep 23, 2025
@kingeasternsun
Copy link
Contributor

Will fire another pr to fix some minor nits, we can merge this first, as @JesseStutler is busy for other work. @kingeasternsun

ok

@volcano-sh-bot volcano-sh-bot merged commit 321389d into volcano-sh:master Sep 23, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Hierarchical queues compatible with Nodegroup

4 participants