Skip to content

Conversation

@vinaykul
Copy link
Member

@vinaykul vinaykul commented Mar 4, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it: In-place pod resize checkpoint store code panics when standalone kubelet attempts to start. The current code has nil pointer access bug. This PR fixes the issue.

Which issue(s) this PR fixes: #116262

Fixes #116262

Special notes for your reviewer:

Testing done:

  1. Hacky way but not initializing checpoint manager
  2. Pods come up normally with local cluster.
  3. Verified local cluster pod resize E2E tests work.
  4. Unit tests

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


…tion checkpoint manager in standalone kubelet scenario
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 4, 2023
@k8s-ci-robot k8s-ci-robot requested review from bart0sh and yujuhong March 4, 2023 08:17
@vinaykul
Copy link
Member Author

vinaykul commented Mar 4, 2023

otherPods = append(otherPods, op)
attrs.OtherPods = otherPods
} else {
klog.ErrorS(nil, "pod resource allocation checkpoint manager is not initialized.")
Copy link
Member

@liggitt liggitt Mar 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If checkpoint state is nil in non-error cases (like standalone kubelet) we can't log errors on every pod sync. We'll flood the error log with these otherwise.

Is there a way we can make the VPA feature completely inactive/inert for standalone kubelets, rather than sprinkling nil checks throughout, since a kubelet running that way won't be getting pod updates from the API that change resources?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review and catching this issue. I think I can add a couple of Get interfaces to status manager (similar pattern to GetPodStatus) and avoid these nil checks. I'll work on it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, how did you find this issue? Is this it: https://github.com/kelseyhightower/standalone-kubelet-tutorial ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did you find this issue?

we (GKE) run standalone kubelets, and our CI started failing once the VPA change merged to master

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can add a couple of Get interfaces to status manager (similar pattern to GetPodStatus) and avoid these nil checks

Does the VPA feature make sense for a standalone kubelet? I don't think it does. It would be easier to reason about if standalone kubelets didn't ever hit VPA code paths.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe something like this:

mkdir -p /var/run/kubernetes/static-pods

echo "
kind: Pod
apiVersion: v1
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: busybox
" > /var/run/kubernetes/static-pods/busybox.yaml

START_MODE=kubeletonly FEATURE_GATES=InPlacePodVerticalScaling=true hack/local-up-cluster.sh

but I bet local-up-cluster.sh has atrophied and START_MODE=kubeletonly won't work quite right

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

START_MODE=kubeletonly FEATURE_GATES=InPlacePodVerticalScaling=true hack/local-up-cluster.sh


but I bet local-up-cluster.sh has atrophied and START_MODE=kubeletonly won't work quite right

Thanks, that works. Needs one additional envvar:
POD_MANIFEST_PATH=/var/run/kubernetes/static-pods START_MODE=kubeletonly FEATURE_GATES=InPlacePodVerticalScaling=true hack/local-up-cluster.sh

Anyways, I have another commitment I need to get to. I'll verify this with the above suggested optimization later in the day today.

Copy link
Member

@bobbypage bobbypage Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Jordan for raising this, AFAIU I don't think we have good coverage of standalone kubelet in CI today. I think we could think about extending node e2e to also support a mode where we spin up kubelet in standalone mode specifically and only run static pods as a few tests.

In general, it may help as well to have an environment we can dedicate to static pod related tests since we have a few issues there that @smarterclayton and others been trying to tackle recently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking more closely, kubeletonly mode conveniently brings up just kubelet (POD_MANIFEST_PATH default works) but kubeClient is created. But, I was able to set kubeDeps.KubeClient = nil just before NewMainKubelet and get a repro to verify the before & after.

We don't need a new check for kubeClient==nil before invoking handlePodResourcesResize because !kubetypes.IsStaticPod(pod) check achieves the same goal.

@bart0sh
Copy link
Contributor

bart0sh commented Mar 5, 2023

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 5, 2023
@valaparthvi
Copy link

/cc

@k8s-ci-robot k8s-ci-robot requested a review from valaparthvi March 6, 2023 05:38
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 6, 2023
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 6, 2023
…d Resize values, remove error logging for valid standalone kubelet scenario
@vinaykul vinaykul force-pushed the restart-free-pod-vertical-scaling-kubelet-panic-fix branch from 531263c to b0dce92 Compare March 6, 2023 09:51
@vinaykul
Copy link
Member Author

vinaykul commented Mar 6, 2023

/test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2


// GetContainerResourceAllocation returns the last checkpointed ResourcesAllocated values
// If checkpoint manager has not been initialized, it returns nil, false
func (m *manager) GetContainerResourceAllocation(podUID string, containerName string) (v1.ResourceList, bool) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check nil and return not found before locking?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state will still be nil in all of these functions if the feature gate is disabled.

sweeping all the callers, it's pretty twisty to tell that all of the call sites are guarded by feature gate checks (these are sometimes called from functions which are only called under feature gate guard, or are called from declared closures which are then invoked inside a feature gate guard)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR is an improvement on current state of master, but this still seems like a sharp edge to improve before 1.27 cut

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I can add feature gates check in these functions as a conservative check that makes it easier for future reviews vs. adding a nil check and forgetting to remove at GA. It is one extra compare and branch instruction we don't need on every pod sync (if we can count on pull & CI tests).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR is an improvement on current state of master, but this still seems like a sharp edge to improve before 1.27 cut

What other things do you want to see fixed? I initially wanted to move away from node local checkpoint in favor of relying on values persisted in PodStatus (if it is legit use of KEP 2527), but that wouldn't fly in standalone kubelet scenario. I feel kubelet could use a more generic KV abstraction that better decouples the checkpointing implementation details, but that is a bit larger scope work.. jmho.

Copy link
Member

@liggitt liggitt Mar 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not leaving those manager methods NPE landmines if the feature-gate is disabled? either doing nil checks in those functions, or assigning a dummy/no-op impl to state when the feature gate is off

but a follow-up is fine... I'm more concerned with getting alpha standalone kubelets green again at this point

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no-op is a good idea. Since this fix is urgent and already LTGM'd, I'll create a separate PR follow-up or tack it on to an existing follow-up PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does PR #116351 get it done?

@pacoxu
Copy link
Member

pacoxu commented Mar 7, 2023

/test pull-kubernetes-e2e-gce-cos-alpha-features

@vinaykul
Copy link
Member Author

vinaykul commented Mar 7, 2023

/test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2

@liggitt
Copy link
Member

liggitt commented Mar 7, 2023

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 7, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: f1154edb10756ad19d1cfe4a9bf49248c16118fa

@liggitt
Copy link
Member

liggitt commented Mar 7, 2023

/hold for testing if desired

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 7, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, vinaykul

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 7, 2023
@vinaykul
Copy link
Member Author

vinaykul commented Mar 7, 2023

@liggitt
Copy link
Member

liggitt commented Mar 7, 2023

/hold cancel

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 7, 2023
@liggitt liggitt added this to the v1.27 milestone Mar 7, 2023
@k8s-ci-robot k8s-ci-robot merged commit 6bce018 into kubernetes:master Mar 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

VPA panics in standalone kubelets

10 participants