CoHDI

Composable Hardware Disaggregated Infrastructure

CoHDI Project: Vision Statement

The objective is to foster a community-driven, standards-based open ecosystem for next-generation architectures and frameworks built on Composable Hardware Disaggregated Infrastructure (CoHDI, pronounced "Cody"), also known as Composable Disaggregated Infrastructure.
CoHDI enables data center operators to realize the benefits of cost efficiency, high availability, and sustainability through a disaggregated computing system. However, a gap still exists between Kubernetes and CoHDI, which makes it hard to achieve dynamic composability within the Kubernetes cloud-native environment. The Dynamic Device Scaler (DDS) project and the Composable Resource Operator project aim to bridge this gap by collaborating with the Dynamic Resource Allocation (DRA), sig-node, sig-autoscaling, and sig-scheduling projects.

How it works

The CoHDI system consists of a hardware-disaggregated resource pool and the Composable Manager (CoHDI Manager) software. Within the resource pool, all components are interconnected via PCIe or CXL switches. The CoHDI Manager orchestrates these switches to dynamically compose bare-metal hardware servers through software-defined configurations. It provides a Composable Resource API, which can be accessed by either the Composable Resource Operator or Kubernetes API.
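
To make the composition idea concrete, here is a minimal in-memory sketch in Go. It is not the Composable Resource API: the `Device` and `Manager` types, the `Attach` and `Detach` methods, and the device IDs and model name are all hypothetical, chosen only to show a device moving between the resource pool and a node.

```go
package main

import (
	"errors"
	"fmt"
)

// Device is a hypothetical record for one disaggregated component (e.g. a GPU).
type Device struct {
	ID    string
	Model string
}

// Manager is a toy stand-in for the CoHDI Manager. The real manager
// reconfigures PCIe or CXL switches; this sketch only tracks bookkeeping.
type Manager struct {
	free     []Device            // devices still in the resource pool
	attached map[string][]Device // node name -> devices composed into that node
}

// ErrNoFreeDevice is returned when the pool has no device of the requested model.
var ErrNoFreeDevice = errors.New("no free device of requested model")

// NewManager copies the initial pool so callers cannot mutate it afterwards.
func NewManager(pool []Device) *Manager {
	return &Manager{
		free:     append([]Device(nil), pool...),
		attached: map[string][]Device{},
	}
}

// Attach moves the first free device of the given model from the pool to the node.
func (m *Manager) Attach(node, model string) (Device, error) {
	for i, d := range m.free {
		if d.Model == model {
			m.free = append(m.free[:i], m.free[i+1:]...)
			m.attached[node] = append(m.attached[node], d)
			return d, nil
		}
	}
	return Device{}, ErrNoFreeDevice
}

// Detach returns a device from the node to the pool.
func (m *Manager) Detach(node, id string) error {
	devs := m.attached[node]
	for i, d := range devs {
		if d.ID == id {
			m.attached[node] = append(devs[:i], devs[i+1:]...)
			m.free = append(m.free, d)
			return nil
		}
	}
	return fmt.Errorf("device %s is not attached to node %s", id, node)
}

func main() {
	m := NewManager([]Device{
		{ID: "gpu-0", Model: "example-gpu"},
		{ID: "gpu-1", Model: "example-gpu"},
	})
	d, err := m.Attach("worker-1", "example-gpu")
	fmt.Println("attached:", d.ID, "error:", err, "free left:", len(m.free))
	fmt.Println("detach error:", m.Detach("worker-1", d.ID), "free left:", len(m.free))
}
```

The sketch keeps the pool and the per-node device lists in one struct so that a device is always in exactly one place, which mirrors the idea that a pooled device is either free or composed into a single server.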

image

K8s Internal Operation

How Dynamic Device Scaler Works

  • With DRA as it works today, every device already attached to a worker node is listed in a ResourceSlice. (1)
  • We introduce a new kind of ResourceSlice for free devices (e.g. GPUs) in the resource pool. The composable-dra-driver checks for free devices in the resource pool and lists them in that ResourceSlice. (1)
  • Now assume a user creates a new Pod that requests a GPU not yet present on any worker node. (2)
  • When the scheduler tries to schedule the Pod and finds a matching GPU available in the resource pool's ResourceSlice, it holds the Pod instead of scheduling it. (3-1, 3-2, 4)
  • When the dynamic-device-scaler detects this situation, it requests that a GPU be attached by creating a composable-resource-operator custom resource. (5-1, 5-2)
  • The composable-resource-operator sends the GPU attachment request to the REST API of the CDI system. (6-1)
  • The Composable Hardware Disaggregated Infrastructure Manager then controls the PCIe switch and attaches a GPU to a worker node. (6-2)
  • Once the GPU is attached, the vendor DRA plugin adds it to the node's ResourceSlice. (1)
  • Finally, the Pod is scheduled using the attached GPU.
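
The scaler's decision step in the flow above can be sketched as a pure function. This is an illustrative sketch in Go, not code from the dynamic-device-scaler repository: `PendingClaim`, `AttachRequest`, and `PlanAttachments` are hypothetical names, and the real component works through Kubernetes resources rather than plain structs.

```go
package main

import "fmt"

// PendingClaim is a hypothetical summary of a Pod that the scheduler is
// holding because its device exists only in the resource pool.
type PendingClaim struct {
	Pod   string
	Node  string // node the device should be attached to
	Model string // requested device model
}

// AttachRequest is what this sketch would hand to the operator.
type AttachRequest struct {
	Node  string
	Model string
}

// PlanAttachments emits one attach request per pending claim while free
// devices of the requested model remain. Claims that cannot be satisfied
// are returned by Pod name. The input map is not modified.
func PlanAttachments(pending []PendingClaim, freeByModel map[string]int) (plan []AttachRequest, unsatisfied []string) {
	remaining := make(map[string]int, len(freeByModel))
	for model, n := range freeByModel {
		remaining[model] = n
	}
	for _, c := range pending {
		if remaining[c.Model] > 0 {
			remaining[c.Model]--
			plan = append(plan, AttachRequest{Node: c.Node, Model: c.Model})
		} else {
			unsatisfied = append(unsatisfied, c.Pod)
		}
	}
	return plan, unsatisfied
}

func main() {
	pending := []PendingClaim{
		{Pod: "pod-a", Node: "worker-1", Model: "example-gpu"},
		{Pod: "pod-b", Node: "worker-2", Model: "example-gpu"},
	}
	plan, unsatisfied := PlanAttachments(pending, map[string]int{"example-gpu": 1})
	fmt.Println("attach requests:", plan)
	fmt.Println("still waiting:", unsatisfied)
}
```

Keeping the planning step free of side effects makes it easy to test in isolation; the actual attachment would then happen in a separate step, as in (5-1) through (6-2).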

For more detailed information on each component, please refer to its respective repository in the CoHDI project.

See also KEP-5007.

How CoHDI works:

how cohdi works

GPU Hot-Add Demonstration: A pod request triggers an increase in the number of GPUs attached to a node, from 1 to 2:

demo_hotadd

GPU Hot-Remove Demonstration: Pod deletion triggers a decrease in the number of GPUs attached to a node, from 2 to 1:

demo_hodremove

Related Information

These are the enhancement descriptions for the Kubernetes scheduler.

For alpha release: KEP-5007

For beta release: KEP-5007

Adopters

CoHDI Adopters

Slack Channel

CoHDI Slack Channel

CoHDI Repositories

Composable DRA Driver

Dynamic Device Scaler

Composable Resource Operator

.github

Popular repositories

  1. composable-resource-operator (Go): Proof of concept showcasing composable GPUs in Kubernetes

  2. .github: CoHDI - Composable Hardware Disaggregated Infrastructure

  3. dynamic-device-scaler (Go)

  4. composable-dra-driver (Go)

  5. cohdi-manager-mock (Python)
