Description
Motivation: Why do you think this is important?
Right now the configuration around GPU-accelerated workloads is quite rigid. Flyte propeller only allows a single, global GPU resource name to be configured, which makes it impossible to use Flyte with data planes that have heterogeneous GPU resource vendors or names.
Additionally, it is assumed that every GPU-accelerated compute node requires only a single node selector label or taint key/value pair. In our environment, compute nodes carry multiple taints.
We have worked around some of these issues using pod templates, but we hit a blocker when we needed to support fractionalized GPUs with nvidia.com/gpu.shared.
Goal: What should the final outcome look like, ideally?
A more flexible configuration for GPU-accelerated workloads, where custom resource names, node selectors, and tolerations can each be configured.
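To make the request more concrete, here is a rough sketch of the shape such a configuration could take. This is not Flyte's API or an existing propeller option: `AcceleratorProfile`, `Toleration`, `PROFILES`, and `pod_spec_fragment` are hypothetical names, and the `example.com/...` label and taint keys are placeholders. Only `nvidia.com/gpu.shared` comes from our setup; `amd.com/gpu` is included just to illustrate a second vendor. The idea is that each profile carries its own resource name, node selector, and any number of tolerations, instead of one global value for each.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Toleration:
    """Hypothetical stand-in for a Kubernetes toleration entry."""
    key: str
    value: str = ""
    operator: str = "Equal"
    effect: str = "NoSchedule"


@dataclass(frozen=True)
class AcceleratorProfile:
    """Hypothetical per-accelerator settings (not an existing Flyte type)."""
    resource_name: str
    node_selector: dict = field(default_factory=dict)
    tolerations: tuple = ()


# Placeholder profiles; label and taint keys are made up for illustration.
PROFILES = {
    "nvidia-shared": AcceleratorProfile(
        resource_name="nvidia.com/gpu.shared",
        node_selector={"example.com/gpu-pool": "shared"},
        tolerations=(
            Toleration(key="example.com/gpu", value="true"),
            Toleration(key="example.com/team", value="ml"),
        ),
    ),
    "amd": AcceleratorProfile(
        resource_name="amd.com/gpu",
        node_selector={"example.com/gpu-pool": "amd"},
        tolerations=(Toleration(key="example.com/gpu", value="true"),),
    ),
}


def pod_spec_fragment(profile: AcceleratorProfile, count: int) -> dict:
    """Build the pod-spec pieces implied by a profile and a device count."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "nodeSelector": dict(profile.node_selector),
        "tolerations": [
            {"key": t.key, "operator": t.operator, "value": t.value, "effect": t.effect}
            for t in profile.tolerations
        ],
        "resources": {"limits": {profile.resource_name: str(count)}},
    }
```

With something along these lines, a task could name a profile (for example `"nvidia-shared"`) and the backend would derive the resource limit, node selector, and all tolerations from it, rather than relying on a pod template per case.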
Describe alternatives you've considered
None
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
- Yes
Have you read the Code of Conduct?
- Yes