KAI Scheduler is a robust, efficient, and scalable Kubernetes scheduler that optimizes GPU resource allocation for AI and machine learning workloads.
Designed to manage large-scale GPU clusters with thousands of nodes and a high throughput of workloads, KAI Scheduler is well suited to extensive and demanding environments. It lets Kubernetes cluster administrators dynamically allocate GPU resources to workloads.
KAI Scheduler supports the entire AI lifecycle, from small, interactive jobs that require minimal resources to large-scale training and inference workloads, all within the same cluster. It ensures optimal resource allocation while maintaining fairness between the different consumers, and it can run alongside other schedulers installed on the cluster.
- [2025/11] KubeCon NA 2025 Talk: Watch the recording of the presentation "Lightning Talk: Mind the Topology: Smarter Scheduling for AI Workloads on Kubernetes" to learn how KAI's Topology-Aware Scheduling (TAS) optimizes placement for modern disaggregated serving architectures.
- [2025/11] Integration with Grove & Dynamo: KAI's Topology-Aware and Hierarchical Gang Scheduling capabilities are integrated with Grove to orchestrate complex, multi-component workloads like disaggregated serving and agentic pipelines at scale. Read the blog post for more details.
- [2025/10] v0.10.0 Release: Major features released, including Topology-Aware Scheduling (TAS), Hierarchical PodGroups, and Time-aware Fairness.
- [2025/10] KubeRay Integration: KAI Scheduler is now natively integrated with KubeRay for scheduling Ray workloads on Kubernetes.
- [2025/08] Time-Aware Fair-Sharing: The time-aware fair-sharing proposal was discussed at the Kubernetes batch working group (batch-wg). Watch the recording.
- [2025/04] Project Introduction: Recording of the KAI Scheduler introduction presented at the batch-wg meeting.
- Batch Scheduling: Ensure all pods in a group are scheduled simultaneously or not at all.
- Bin Packing & Spread Scheduling: Optimize node usage either by minimizing fragmentation (bin-packing) or increasing resiliency and load balancing (spread scheduling).
- Workload Priority: Prioritize workloads effectively within queues.
- Hierarchical Queues: Manage workloads with two-level queue hierarchies for flexible organizational control.
- Resource Distribution: Customize quotas, over-quota weights, limits, and priorities per queue (see the queue sketch after this list).
- Fairness Policies: Ensure equitable resource distribution using Dominant Resource Fairness (DRF) and resource reclamation across queues.
- Workload Consolidation: Reallocate running workloads intelligently to reduce fragmentation and increase cluster utilization.
- Elastic Workloads: Dynamically scale workloads within defined minimum and maximum pod counts.
- Dynamic Resource Allocation (DRA): Support vendor-specific hardware resources through Kubernetes ResourceClaims (e.g., GPUs from NVIDIA or AMD).
- GPU Sharing: Allow multiple workloads to efficiently share single or multiple GPUs, maximizing resource utilization (see the pod sketch after this list).
- Cloud & On-premise Support: Fully compatible with dynamic cloud infrastructures (including auto-scalers like Karpenter) as well as static on-premise deployments.
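To make the queue-related features above concrete, here is a minimal sketch of a two-level queue hierarchy with per-queue quotas, limits, and over-quota weights. The `scheduling.run.ai/v2` API group, the `Queue` kind, the `parentQueue` field, and the resource field names are assumptions based on the project's quickstart examples; verify them against the queues documentation for your installed version.

```yaml
# Hypothetical two-level hierarchy: a parent "department" queue and a child
# "team" queue with its own GPU quota, limit, and over-quota weight.
# API group, kind, and field names are assumed from the quickstart examples
# and may differ between releases.
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: department-a
spec:
  resources:
    gpu:
      quota: -1            # -1 = no deserved quota enforced at this level
      limit: -1            # -1 = no hard cap
      overQuotaWeight: 1
    cpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    memory:
      quota: -1
      limit: -1
      overQuotaWeight: 1
---
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: team-a
spec:
  parentQueue: department-a  # second level of the two-level hierarchy
  resources:
    gpu:
      quota: 8               # deserved GPUs for this team
      limit: 16              # hard cap on GPUs
      overQuotaWeight: 2     # relative share of idle GPUs beyond quota
    cpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1
    memory:
      quota: -1
      limit: -1
      overQuotaWeight: 1
```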
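Similarly, here is a rough sketch of a workload that shares a GPU: the pod opts into KAI Scheduler through `spec.schedulerName`, is assigned to a queue through a label, and requests a fraction of a GPU through an annotation. The `kai.scheduler/queue` label key and the `gpu-fraction` annotation are assumptions taken from the quickstart examples; check the GPU-sharing documentation before relying on them.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: half-gpu-inference
  labels:
    kai.scheduler/queue: team-a   # queue assignment; label key assumed from the quickstart
  annotations:
    gpu-fraction: "0.5"           # request half of a single GPU; annotation key assumed
spec:
  schedulerName: kai-scheduler    # hand this pod to KAI Scheduler instead of the default scheduler
  containers:
    - name: main
      image: ubuntu               # placeholder image for illustration
      command: ["sleep", "infinity"]
```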
Note
KAI Scheduler builds on kube-batch.
Before installing KAI Scheduler, ensure you have:
- A running Kubernetes cluster
- Helm CLI installed
- NVIDIA GPU Operator installed (required to schedule workloads that request GPU resources)
KAI Scheduler will be installed in the kai-scheduler namespace.
⚠️ When submitting workloads, make sure to use a dedicated namespace. Do not use the kai-scheduler namespace for workload submission.
KAI Scheduler can be installed:
- From Production (Recommended)
- From Source (Build it Yourself)
Locate the latest release version on the releases page.
Run the following command after replacing <VERSION> with the desired release version:
helm upgrade -i kai-scheduler oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler -n kai-scheduler --create-namespace --version <VERSION>

To build and install from source, follow the instructions here.
If a GPU Operator version earlier than v25.10.0 is installed, add the following flag to the installation command:
--set admission.gpuPodRuntimeClassName=null
For details on our release lifecycle, LTS versions, and supported releases, see the Support Policy.
Refer to the Breaking Changes doc for more information.
To start scheduling workloads with KAI Scheduler, please continue to the Quick Start example.
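As a preview of what the Quick Start walks through, a minimal submission looks roughly like the pod below: it names kai-scheduler as its scheduler, points at a queue via a label, and requests one whole GPU. The namespace and queue name are placeholders, and the `kai.scheduler/queue` label key is assumed from the quickstart examples.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: single-gpu-job
  namespace: my-workloads          # a dedicated namespace, not kai-scheduler
  labels:
    kai.scheduler/queue: team-a    # placeholder queue; label key assumed from the quickstart
spec:
  schedulerName: kai-scheduler     # schedule this pod with KAI Scheduler
  restartPolicy: Never
  containers:
    - name: main
      image: ubuntu                # placeholder image
      command: ["sleep", "60"]
      resources:
        limits:
          nvidia.com/gpu: 1        # one whole GPU, exposed by the NVIDIA GPU Operator
```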
- Refactor the codebase to enhance vendor neutrality #134
- Support Scheduling Gates #63
- Research on possible integration with Kueue #68
- Add Topology-Aware Scheduling support for pod-groups #66
- Support Min Run Time per workload #136
- Support Max Run Time per workload (with delayed requeue)
- Add more PriorityClasses as part of the default KAI install
- Support JobSet
- Support LWS (LeaderWorkerSet) #124
- Add metrics for pod and pod-group preemptions
- Decouple Priority and Preemption
- Support per queue time decay
- Hyperscale improvements
- Support Consolidation of Inference workloads for cluster defragmentation
- Support n-levels of hierarchical queues
- Graceful rollout of Inference workloads (new revision update using queue temporary over-quota)
We’d love to hear from you! Here are the best ways to connect:
Contributions are encouraged and appreciated! Please have a look at KAI Scheduler's contribution guide before submitting PRs.
Join the CNCF Slack first and visit the #kai-scheduler channel.
When: Every other Monday at 17:00 CEST
Convert to your time zone | Add to your calendar | Meeting notes & agenda
Join the kai-scheduler mailing list to receive updates on biweekly meetings.
Please open a GitHub issue for bugs, feature suggestions, or technical help. This helps us keep track of requests and respond effectively.