[FG:InPlacePodVerticalScaling] Metrics #131648

@natasha41575

Description

We need to propose and instrument metrics for https://kep.k8s.io/1287. The API server already gives us the total resize request count (apiserver_request_total{resource=pods,subresource=resize}) for free, but some additional metrics might be useful, such as:

  • resize requests at the pod level (aggregated across all containers), broken down by resource (CPU or memory) and by direction (increase or decrease)
  • the same breakdown, but at the container level
  • the latency between when a resize is marked as in progress and when it completes
  • the reason a resize is infeasible, when it is
  • resize actuation error count / error rate (to be decided)
  • after #131612 ([FG:InPlacePodVerticalScaling] Move resize allocation logic out of the sync loop) merges, how often a resize request is accepted through the periodic retry rather than being explicitly signaled (each such acceptance indicates that we missed a case that should have triggered a retry, adding unnecessary latency while the deferred resizes wait to be retried)

Metadata

Labels

  • kind/feature: Categorizes issue or PR as related to a new feature.
  • priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
  • sig/instrumentation: Categorizes an issue or PR as relevant to SIG Instrumentation.
  • sig/node: Categorizes an issue or PR as relevant to SIG Node.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Projects

Status

Done
