[FG:InPlacePodVerticalScaling] Handle systemd cgroup driver by using libcontainer for updating pod cgroup values #124216
Conversation
|
Skipping CI for Draft Pull Request. |
|
Please note that we're already in Test Freeze. Fast forwards are scheduled to happen every 6 hours; the most recent run was: Sun Apr 7 07:41:51 UTC 2024. |
|
/test pull-kubernetes-conformance-kind-ga-only-parallel
/test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2 |
|
/test pull-kubernetes-node-e2e-containerd |
|
/test ? |
|
@iholder101: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/cc @tallclair @vinaykul |
|
/triage accepted |
Force-pushed from f428c16 to 1622026 (Compare)
|
/test pull-kubernetes-node-e2e-containerd |
Nice catch! |
|
As I tried this fix locally, the CPU quota of a pod cgroup was updated to max when only the CPU shares were resized. It looks like libcontainer updates the quota to max when CPUQuota is not set in the resource config.
So, we should also set CPUQuota here:
kubernetes/pkg/kubelet/kuberuntime/kuberuntime_manager.go Lines 677 to 684 in 1bbe775
Otherwise, CPUQuota will be updated to max when CPUShares is updated while resizing a pod cgroup.
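For illustration, a minimal sketch of that idea (ResourceConfig mirrors the kubelet's struct; the helper and the milliCPU conversions are illustrative assumptions, not the PR's code): when building the config for a pod-level cgroup resize, populate CPUQuota and CPUPeriod alongside CPUShares so a shares-only update cannot let the quota fall back to max.

```go
// Sketch only: when resizing a pod-level cgroup, carry the CPU quota (and
// period) alongside the shares so libcontainer does not reset the quota to
// "max". ResourceConfig mirrors the kubelet's struct; the helper and the
// milliCPU conversions below are illustrative, not the PR's code.
package main

import "fmt"

type ResourceConfig struct {
	CPUShares *uint64 // relative weight (cgroup v1 shares)
	CPUQuota  *int64  // hard cap, in microseconds per period
	CPUPeriod *uint64 // enforcement period, in microseconds
}

func podCPUResizeConfig(requestMilliCPU, limitMilliCPU int64) *ResourceConfig {
	shares := uint64(requestMilliCPU) * 1024 / 1000 // stand-in for MilliCPUToShares
	cfg := &ResourceConfig{CPUShares: &shares}

	// Also derive the quota from the CPU limit. If it is left nil, a
	// shares-only update ends up writing "max" for the quota, silently
	// dropping the existing cap (the behavior observed above).
	if limitMilliCPU > 0 {
		period := uint64(100000) // 100ms, the default CFS period
		quota := limitMilliCPU * int64(period) / 1000
		cfg.CPUQuota = &quota
		cfg.CPUPeriod = &period
	}
	return cfg
}

func main() {
	cfg := podCPUResizeConfig(500, 1000)
	fmt.Println(*cfg.CPUShares, *cfg.CPUQuota, *cfg.CPUPeriod) // 512 100000 100000
}
```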
|
Force-pushed from a3a2a11 to e80235b (Compare)
Signed-off-by: Itamar Holder <iholder@redhat.com>
libcontainer's cgroup manager is agnostic to both the cgroup version and whether systemd is used. This way, if systemd is used, the cgroup manager can update resources properly, so the changes are not reverted if the daemon is restarted. Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Force-pushed from e80235b to 590cc46 (Compare)
Great catch! Thanks a lot @hshiina! |
Signed-off-by: Itamar Holder <iholder@redhat.com>
Force-pushed from 590cc46 to f21473b (Compare)
}
}
if resourceConfig.CPUShares != nil {
	cpuWeight := cpuSharesToCPUWeight(*resourceConfig.CPUShares)
I think we need to preserve this conversion. This can probably just be done in the common path, so both CPU shares and CPU weight are set, and libcontainer uses the correct one depending on whether it's cgroups v1 or v2.
// NOTE: .CpuShares is not used here. Conversion is the caller's responsibility.
Hey @tallclair!
I believe this conversion already takes place in the current state of this PR.
The setCgroupCPUConfig is now unified here: https://github.com/kubernetes/kubernetes/pull/124216/files#diff-2f332b11d91d53583a6eeca75fc6e124aee6bc547a264e6987b89b1c786e0674R257.
In turn, the Update() function uses libctCgroupConfig() to convert the cgroup config to a runc struct:
kubernetes/pkg/kubelet/cm/cgroup_manager_linux.go
Lines 358 to 364 in b845137
func (m *cgroupCommon) Update(cgroupConfig *CgroupConfig) error {
	start := time.Now()
	defer func() {
		metrics.CgroupManagerDuration.WithLabelValues("update").Observe(metrics.SinceInSeconds(start))
	}()
	libcontainerCgroupConfig := m.libctCgroupConfig(cgroupConfig, true)
And indeed this function handles the conversion from CPU shares to CPU weight:
kubernetes/pkg/kubelet/cm/cgroup_manager_linux.go
Lines 199 to 204 in b845137
func (m *cgroupCommon) libctCgroupConfig(in *CgroupConfig, needResources bool) *libcontainerconfigs.Cgroup {
	config := &libcontainerconfigs.Cgroup{
		Systemd: m.useSystemd,
	}
	if needResources {
		config.Resources = m.toResources(in.ResourceParameters)
kubernetes/pkg/kubelet/cm/cgroup_manager_linux.go
Lines 274 to 291 in b845137
func (m *cgroupCommon) toResources(resourceConfig *ResourceConfig) *libcontainerconfigs.Resources {
	resources := &libcontainerconfigs.Resources{
		SkipDevices:     true,
		SkipFreezeOnSet: true,
	}
	if resourceConfig == nil {
		return resources
	}
	if resourceConfig.Memory != nil {
		resources.Memory = *resourceConfig.Memory
	}
	if resourceConfig.CPUShares != nil {
		if libcontainercgroups.IsCgroup2UnifiedMode() {
			resources.CpuWeight = getCPUWeight(resourceConfig.CPUShares)
		} else {
			resources.CpuShares = *resourceConfig.CPUShares
		}
	}
kubernetes/pkg/kubelet/cm/cgroup_manager_linux.go
Lines 259 to 267 in b845137
func getCPUWeight(cpuShares *uint64) uint64 {
	if cpuShares == nil {
		return 0
	}
	if *cpuShares >= 262144 {
		return 10000
	}
	return 1 + ((*cpuShares-2)*9999)/262142
}
WDYT?
Ah, thanks. I missed that. Looks good then.
|
/lgtm |
|
LGTM label has been added. Git tree hash: fb92c006d96019ec5848b889171e8a812c0882e4
|
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: iholder101, tallclair. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment. |
What type of PR is this?
/kind feature
/sig node
What this PR does / why we need it:
After this PR, libcontainer's cgroup manager is used to update pod resources instead of writing directly to cgroupfs.
In addition to solving #113857 by preventing pod cgroup updates from being reverted when the systemd daemon reloads, this simplifies the code and delegates the heavy lifting to libcontainer's cgroup manager.
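Roughly, the change can be pictured like this. This is a sketch only, assuming runc's cgroup manager package (github.com/opencontainers/runc/libcontainer/cgroups/manager); exact signatures differ between runc versions, and the paths and helper names are illustrative, not the kubelet's actual code:

```go
// Sketch: direct cgroupfs write vs. going through libcontainer's manager.
// Assumes runc's cgroups/manager API (New/Set); not the kubelet's actual code.
package cgroupsketch

import (
	"os"
	"path/filepath"
	"strconv"

	"github.com/opencontainers/runc/libcontainer/cgroups/manager"
	"github.com/opencontainers/runc/libcontainer/configs"
)

// Before: write the value straight into cgroupfs. If the systemd driver is in
// use, a later daemon reload can revert the change, because systemd still
// remembers the old unit properties (the problem described in #113857).
func setCPUSharesDirect(podCgroupPath string, shares uint64) error {
	return os.WriteFile(
		filepath.Join(podCgroupPath, "cpu.shares"),
		[]byte(strconv.FormatUint(shares, 10)), 0o644)
}

// After: delegate to libcontainer's cgroup manager, which is agnostic to the
// cgroup version and to whether systemd is the driver, so the new values are
// also recorded by systemd and survive a daemon reload. Note that the desired
// resources should be fully populated; unset fields (e.g. the quota) may be
// reset to their defaults by the update, as discussed above.
func setResourcesViaLibcontainer(cg *configs.Cgroup, desired *configs.Resources) error {
	mgr, err := manager.New(cg) // picks the fs, fs2, or systemd manager as appropriate
	if err != nil {
		return err
	}
	return mgr.Set(desired)
}
```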
Which issue(s) this PR fixes:
Fixes #113857
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: