[Core feature] Greater flexibility for GPU-accelerated workloads #6743

@Sovietaced

Description

Motivation: Why do you think this is important?

Right now the configuration for GPU-accelerated workloads is quite rigid. Flyte propeller only allows a single, global GPU resource name to be configured, which makes it impossible to use Flyte with data planes that have heterogeneous GPU vendors and resource names.

Additionally, it is assumed that all GPU-accelerated compute nodes require a single node selector label or a single taint key/value. In our environment, compute nodes carry multiple taints.
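To illustrate why a single taint key/value is not enough, here is a simplified sketch of Kubernetes taint/toleration matching in Python (it ignores details such as tolerationSeconds). The taint keys nvidia.com/gpu and example.com/dedicated are assumed examples, not anything Flyte configures. The point is that a pod is only schedulable on a node if every hard (NoSchedule/NoExecute) taint on that node is tolerated, so a lone GPU toleration fails on a node that carries a second taint.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key with operator "Exists" matches every taint
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified Kubernetes rule for whether one toleration matches one taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if not tol.key:
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    return tol.operator == "Exists" or tol.value == taint.value


def schedulable(node_taints: Iterable[Taint], tolerations: Iterable[Toleration]) -> bool:
    """A pod fits a node only if every hard taint on that node is tolerated."""
    tolerations = list(tolerations)
    for taint in node_taints:
        if taint.effect == "PreferNoSchedule":
            continue  # soft taint: influences scoring, not feasibility
        if not any(tolerates(t, taint) for t in tolerations):
            return False
    return True


if __name__ == "__main__":
    # Hypothetical GPU node carrying two taints.
    node = [
        Taint("nvidia.com/gpu", "present", "NoSchedule"),
        Taint("example.com/dedicated", "ml", "NoSchedule"),
    ]
    gpu_only = [Toleration("nvidia.com/gpu", "Equal", "present", "NoSchedule")]
    both = gpu_only + [Toleration("example.com/dedicated", "Exists", effect="NoSchedule")]
    print(schedulable(node, gpu_only))  # False
    print(schedulable(node, both))      # True
```

A configuration that can only express one key/value pair can produce the first toleration list but not the second.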

We have worked around some of these issues using pod templates, but we hit a blocker when we needed to support fractionalized GPUs, which use the nvidia.com/gpu.shared resource name.

Goal: What should the final outcome look like, ideally?

A more flexible configuration for GPU-accelerated workloads, in which custom resource names, node selectors, and tolerations can be specified.
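One possible shape for such a configuration, sketched in Python against a plain pod-spec dictionary. Everything named here is hypothetical and illustrative, not existing Flyte configuration or API: AcceleratorProfile, apply_accelerator, the profile keys "a100" and "a100-shared", and the example.com/* labels and taint keys. Each profile carries its own resource name, node selector, and list of tolerations instead of relying on one global setting.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AcceleratorProfile:
    """Hypothetical per-accelerator settings (not an existing Flyte type)."""
    resource_name: str  # extended resource to request, e.g. "nvidia.com/gpu.shared"
    node_selector: Dict[str, str] = field(default_factory=dict)
    tolerations: List[dict] = field(default_factory=list)


def apply_accelerator(pod_spec: dict, container_name: str,
                      profile: AcceleratorProfile, count: int) -> dict:
    """Return a copy of pod_spec with the profile's GPU limit, node selector and tolerations."""
    spec = copy.deepcopy(pod_spec)  # never mutate the caller's spec
    for container in spec["containers"]:
        if container["name"] == container_name:
            limits = container.setdefault("resources", {}).setdefault("limits", {})
            # Extended resources are set in limits; an omitted request defaults to the limit.
            limits[profile.resource_name] = str(count)
            break
    else:
        raise KeyError(f"no container named {container_name!r}")
    spec.setdefault("nodeSelector", {}).update(profile.node_selector)
    # Copy so later edits to the pod spec cannot leak back into the shared profile.
    spec.setdefault("tolerations", []).extend(copy.deepcopy(profile.tolerations))
    return spec


# Hypothetical profiles for a mixed data plane; keys and label names are illustrative.
GPU_TOLERATION = {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
PROFILES = {
    "a100": AcceleratorProfile(
        resource_name="nvidia.com/gpu",
        node_selector={"example.com/gpu-type": "a100"},
        tolerations=[GPU_TOLERATION],
    ),
    "a100-shared": AcceleratorProfile(
        resource_name="nvidia.com/gpu.shared",
        node_selector={"example.com/gpu-type": "a100", "example.com/gpu-sharing": "time-sliced"},
        tolerations=[
            GPU_TOLERATION,
            {"key": "example.com/dedicated", "operator": "Equal", "value": "ml", "effect": "NoSchedule"},
        ],
    ),
}

if __name__ == "__main__":
    base = {"containers": [{"name": "primary", "image": "example/image:latest"}]}
    print(apply_accelerator(base, "primary", PROFILES["a100-shared"], 1))
```

The GPU count goes into limits because Kubernetes requires extended resources such as nvidia.com/gpu to be specified there (a request, if given, must equal the limit). Keeping these settings as per-profile data is what would let a single data plane mix vendors, shared GPUs, and differently tainted node pools.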

Describe alternatives you've considered

None

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes

Metadata

Labels: enhancement (New feature or request)

Project status: Backlog

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests