Skip to content

Conversation

@pacoxu
Copy link
Member

@pacoxu pacoxu commented Mar 17, 2023

What type of PR is this?

/kind flake

for data race

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #116696

Special notes for your reviewer:

Does this PR introduce a user-facing change?

None

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/flake Categorizes issue or PR as related to a flaky test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 17, 2023
@pacoxu pacoxu force-pushed the deflake-kubemark-data-race branch from 1028c12 to 7dcc5ad Compare March 17, 2023 06:02
@pacoxu
Copy link
Member Author

pacoxu commented Mar 17, 2023

/cc @bobbypage @liggitt @vinaykul

@pacoxu
Copy link
Member Author

pacoxu commented Mar 17, 2023

13m40s: 1048 runs so far, 0 failures
32m55s: 2529 runs so far, 0 failures

I stress in my env.

@pacoxu
Copy link
Member Author

pacoxu commented Mar 17, 2023

/priority critical-urgent
/triage accepted

@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 17, 2023
@pacoxu
Copy link
Member Author

pacoxu commented Mar 17, 2023

/assign @smarterclayton

@pacoxu pacoxu force-pushed the deflake-kubemark-data-race branch from 7dcc5ad to 7bd7523 Compare March 17, 2023 07:45
@mborsz
Copy link
Member

mborsz commented Mar 17, 2023

/lgtm

Thank you!

@liggitt liggitt changed the title kubelet: use filepath.Clean before init, validate it in setupDataDirs kubelet: fix data races Mar 17, 2023
@pacoxu pacoxu force-pushed the deflake-kubemark-data-race branch from f1b045e to 34d38b0 Compare March 17, 2023 13:02
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 17, 2023
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vinaykul

Follow-up for a separate PR, but if the statusManager.SetPodResizeStatus call fails, it looks like we're still assigning to the pod (pod.Status.Resize = resizeStatus) and passing the updated pod to podManager (kl.podManager.UpdatePod(pod)). Is that right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It ends up doing nothing (resize fails) last I tried simulating failure. I updated PR #116702 to explicitly return for now so it is more obvious. There's a TODO to investigate if this can be handled better. I have some ideas but I need to really understand Clayton's plans to see if we can work something out to make it more resilient.
/cc @smarterclayton

@liggitt
Copy link
Member

liggitt commented Mar 17, 2023

lgtm, will let kubelet approver ack

/retest

Signed-off-by: Paco Xu <paco.xu@daocloud.io>
@pacoxu pacoxu force-pushed the deflake-kubemark-data-race branch from 34d38b0 to 5134520 Compare March 17, 2023 13:30
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 17, 2023
@pacoxu
Copy link
Member Author

pacoxu commented Mar 17, 2023

@vinaykul
I drop your commit from #116702 to fix #116694 according to #116696 (comment).

Your PR fixes the pod object update race. And this PR will fix race in kubemark TestHollowNode/kubelet

@pacoxu
Copy link
Member Author

pacoxu commented Mar 17, 2023

/cc @SergeyKanzhelev @mrunalp

Copy link
Contributor

@xmcqueen xmcqueen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 17, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: a86a0a13431bc6582c2950e0f203248967c06cb5

@liggitt
Copy link
Member

liggitt commented Mar 17, 2023

/approve

since this was trimmed down to more obviously safe race fixes

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, pacoxu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 17, 2023
@liggitt
Copy link
Member

liggitt commented Mar 17, 2023

/milestone v1.27

@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Mar 17, 2023
@k8s-ci-robot k8s-ci-robot merged commit aa0fea6 into kubernetes:master Mar 17, 2023
@bobbypage
Copy link
Member

Delayed lgtm, thank you for the fix.

/lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/flake Categorizes issue or PR as related to a flaky test. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Race in kubemark TestHollowNode/kubelet

8 participants