Skip to content

Conversation

andreaskaris
Copy link
Contributor

@andreaskaris andreaskaris commented Jul 16, 2025

Cherry-pick of:

(cherry picked from commit 3dce7d8)

Reported-at: https://issues.redhat.com/browse/OCPBUGS-59321

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

The same as for the change to main, I ran a smoke test for sanity checking:

I ran my smoke test on a VM with 14 CPUs where I first installed the entire stack with kubeadm.

Dependencies:

# rpm -qa | grep kubelet
kubelet-1.33.1-150500.1.1.x86_64
# rpm -qa | grep kubeadm
kubeadm-1.33.1-150500.1.1.x86_64
# rpm -qa | grep crun
crun-1.21-1.el9.x86_64
# rpm -qa | grep conmon
conmon-2.1.13-1.el9.x86_64

/etc/crio/crio.conf.d/99-runtimes.conf

[crio.runtime]
infra_ctr_cpuset = "0-3"




# The CRI-O will check the allowed_annotations under the runtime handler and apply high-performance hooks when one of
# high-performance annotations presents under it.
# We should provide the runtime_path because we need to inform that we want to re-use runc binary and we
# do not have high-performance binary under the $PATH that will point to it.
[crio.runtime.runtimes.high-performance]
inherit_default_runtime = true
allowed_annotations = ["cpu-load-balancing.crio.io", "cpu-quota.crio.io", "irq-load-balancing.crio.io", "cpu-c-states.crio.io", "cpu-freq-governor.crio.io"]
# tail -n 5 /var/lib/kubelet/config.yaml
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  full-pcpus-only: "true"
cpuManagerReconcilePeriod: 5s
reservedSystemCPUs: 0-3

I then stopped the crio service, started the compiled crio in foreground make binaries && bin/crio, and ran this smoke test:

smoke.sh

#!/bin/bash

set -x

affinity_file="/proc/irq/default_smp_affinity"
expected_reset_affinity="3fff"
expected_mask="3e0f"

echo $expected_reset_affinity > $affinity_file
cat $affinity_file

for i in {0..20}; do
	set +x
	echo "========"
	echo "Run ${i}"
	echo "========"
	set -x
	kubectl apply -f pod.yaml
	kubectl wait --for=condition=Ready pod/qos-demo --timeout=180s
	mask=$(cat ${affinity_file} | tr -d '\n')
        echo "Got mask: $mask, expected mask: $expected_mask"
        if [ "${mask}" != "${expected_mask}" ]; then
            exit 1
        fi
        kubectl delete pod qos-demo
        kubectl wait --for=delete pod/qos-demo --timeout=180s
	mask=$(cat ${affinity_file} | tr -d '\n')
	echo "After reset --- Got mask: $mask, expected mask: $expected_reset_affinity"
	if [ "${mask}" != "${expected_reset_affinity}" ]; then
	    exit 1
	fi
done

pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
  annotations:
    irq-load-balancing.crio.io: "disable"
spec:
  hostNetwork: true
  runtimeClassName: performance-performance
  containers:
  - name: qos-demo-ctr-1
    image: quay.io/akaris/nice-test
    command:
    - "/bin/sleep"
    - "infinity"
    resources:
      limits:
        memory: "100Mi"
        cpu: "1"
      requests:
        memory: "100Mi"
        cpu: "1"
  - name: qos-demo-ctr-2
    image: quay.io/akaris/nice-test
    command:
    - "/bin/sleep"
    - "infinity"
    resources:
      limits:
        memory: "100Mi"
        cpu: "1"
      requests:
        memory: "100Mi"
        cpu: "1"
  - name: qos-demo-ctr-3
    image: quay.io/akaris/nice-test
    command:
    - "/bin/sleep"
    - "infinity"
    resources:
      limits:
        memory: "100Mi"
        cpu: "1"
      requests:
        memory: "100Mi"
        cpu: "1"
  - name: qos-demo-ctr-4
    image: quay.io/akaris/nice-test
    command:
    - "/bin/sleep"
    - "infinity"
    resources:
      limits:
        memory: "100Mi"
        cpu: "1"
      requests:
        memory: "100Mi"
        cpu: "1"
  - name: qos-demo-ctr-5
    image: quay.io/akaris/nice-test
    command:
    - "/bin/sleep"
    - "infinity"
    resources:
      limits:
        memory: "100Mi"
        cpu: "1"
      requests:
        memory: "100Mi"
        cpu: "1"

Does this PR introduce a user-facing change?

None

Cherry-pick of:
- Merge pull request cri-o#9228 from andreaskaris/issue9227
The original 6 commits merged but were not squashed together,
therefore doing this here on the downstream cherry-pick.

(cherry picked from commit 3dce7d8)

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
Reported-at: https://issues.redhat.com/browse/OCPBUGS-59321
@andreaskaris andreaskaris requested a review from mrunalp as a code owner July 16, 2025 16:53
@openshift-ci-robot openshift-ci-robot added the jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. label Jul 16, 2025
@openshift-ci openshift-ci bot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Jul 16, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 16, 2025
@openshift-ci openshift-ci bot added the dco-signoff: yes Indicates the PR's author has DCO signed all their commits. label Jul 16, 2025
@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Jul 16, 2025
@openshift-ci-robot
Copy link

@andreaskaris: This pull request references Jira Issue OCPBUGS-59321, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected Jira Issue OCPBUGS-59321 to depend on a bug in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Cherry-pick of:

(cherry picked from commit 3dce7d8)

Reported-at: https://issues.redhat.com/browse/OCPBUGS-59321

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 16, 2025
@openshift-ci openshift-ci bot requested a review from klihub July 16, 2025 16:54
Copy link
Contributor

openshift-ci bot commented Jul 16, 2025

Hi @andreaskaris. Thanks for your PR.

I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot requested a review from littlejawa July 16, 2025 16:54
Copy link

codecov bot commented Jul 16, 2025

Codecov Report

Attention: Patch coverage is 78.26087% with 15 lines in your changes missing coverage. Please review.

Project coverage is 47.98%. Comparing base (5b19bdb) to head (7cddfd4).
Report is 5 commits behind head on release-1.33.

Additional details and impacted files
@@               Coverage Diff                @@
##           release-1.33    #9349      +/-   ##
================================================
+ Coverage         47.40%   47.98%   +0.57%     
================================================
  Files               163      163              
  Lines             24103    24114      +11     
================================================
+ Hits              11427    11571     +144     
+ Misses            11556    11402     -154     
- Partials           1120     1141      +21     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@openshift-ci openshift-ci bot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jul 16, 2025
@andreaskaris andreaskaris marked this pull request as draft July 16, 2025 17:16
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 16, 2025
@andreaskaris
Copy link
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Jul 16, 2025
@openshift-ci-robot
Copy link

@andreaskaris: This pull request references Jira Issue OCPBUGS-59321, which is valid. The bug has been moved to the POST state.

5 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note type set to "Release Note Not Required"
  • dependent bug Jira Issue OCPBUGS-59403 is in the state Closed (Done), which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • bug has dependents

Requesting review from QA contact:
/cc @lyman9966

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Jul 16, 2025
Copy link
Contributor

openshift-ci bot commented Jul 16, 2025

@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: lyman9966.

Note that only cri-o members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

@andreaskaris: This pull request references Jira Issue OCPBUGS-59321, which is valid. The bug has been moved to the POST state.

5 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note type set to "Release Note Not Required"
  • dependent bug Jira Issue OCPBUGS-59403 is in the state Closed (Done), which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • bug has dependents

Requesting review from QA contact:
/cc @lyman9966

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@andreaskaris andreaskaris marked this pull request as ready for review July 16, 2025 17:26
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 16, 2025
@andreaskaris
Copy link
Contributor Author

/cherry-pick release-1.32

@openshift-cherrypick-robot

@andreaskaris: only cri-o org members may request cherry picks. If you are already part of the org, make sure to change your membership to public. Otherwise you can still do the cherry-pick manually.

In response to this:

/cherry-pick release-1.32

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@andreaskaris andreaskaris changed the title OCPBUGS-59321: HighPerformanceHooks: Fix IRQ SMP affinity race conditions [1.33][4.20] OCPBUGS-59321: HighPerformanceHooks: Fix IRQ SMP affinity race conditions Jul 16, 2025
@sohankunkerkar
Copy link
Member

sohankunkerkar commented Jul 21, 2025

/ok-to-test

@sohankunkerkar sohankunkerkar removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 21, 2025
@sohankunkerkar sohankunkerkar added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jul 21, 2025
Copy link
Member

@sohankunkerkar sohankunkerkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 21, 2025
Copy link
Contributor

openshift-ci bot commented Jul 21, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreaskaris, sohankunkerkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 21, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit ac52d40 into cri-o:release-1.33 Jul 21, 2025
55 checks passed
@openshift-ci-robot
Copy link

@andreaskaris: Jira Issue OCPBUGS-59321: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-59321 has been moved to the MODIFIED state.

In response to this:

Cherry-pick of:

(cherry picked from commit 3dce7d8)

Reported-at: https://issues.redhat.com/browse/OCPBUGS-59321

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

The same as for the change to main, I ran a smoke test for sanity checking:

I ran my smoke test on a VM with 14 CPUs where I first installed the entire stack with kubeadm.

Dependencies:

# rpm -qa | grep kubelet
kubelet-1.33.1-150500.1.1.x86_64
# rpm -qa | grep kubeadm
kubeadm-1.33.1-150500.1.1.x86_64
# rpm -qa | grep crun
crun-1.21-1.el9.x86_64
# rpm -qa | grep conmon
conmon-2.1.13-1.el9.x86_64

/etc/crio/crio.conf.d/99-runtimes.conf

[crio.runtime]
infra_ctr_cpuset = "0-3"




# The CRI-O will check the allowed_annotations under the runtime handler and apply high-performance hooks when one of
# high-performance annotations presents under it.
# We should provide the runtime_path because we need to inform that we want to re-use runc binary and we
# do not have high-performance binary under the $PATH that will point to it.
[crio.runtime.runtimes.high-performance]
inherit_default_runtime = true
allowed_annotations = ["cpu-load-balancing.crio.io", "cpu-quota.crio.io", "irq-load-balancing.crio.io", "cpu-c-states.crio.io", "cpu-freq-governor.crio.io"]
# tail -n 5 /var/lib/kubelet/config.yaml
cpuManagerPolicy: static
cpuManagerPolicyOptions:
 full-pcpus-only: "true"
cpuManagerReconcilePeriod: 5s
reservedSystemCPUs: 0-3

I then stopped the crio service, started the compiled crio in foreground make binaries && bin/crio, and ran this smoke test:

smoke.sh

#!/bin/bash

set -x

affinity_file="/proc/irq/default_smp_affinity"
expected_reset_affinity="3fff"
expected_mask="3e0f"

echo $expected_reset_affinity > $affinity_file
cat $affinity_file

for i in {0..20}; do
  set +x
  echo "========"
  echo "Run ${i}"
  echo "========"
  set -x
  kubectl apply -f pod.yaml
  kubectl wait --for=condition=Ready pod/qos-demo --timeout=180s
  mask=$(cat ${affinity_file} | tr -d '\n')
       echo "Got mask: $mask, expected mask: $expected_mask"
       if [ "${mask}" != "${expected_mask}" ]; then
           exit 1
       fi
       kubectl delete pod qos-demo
       kubectl wait --for=delete pod/qos-demo --timeout=180s
  mask=$(cat ${affinity_file} | tr -d '\n')
  echo "After reset --- Got mask: $mask, expected mask: $expected_reset_affinity"
  if [ "${mask}" != "${expected_reset_affinity}" ]; then
      exit 1
  fi
done

pod.yaml

apiVersion: v1
kind: Pod
metadata:
 name: qos-demo
 annotations:
   irq-load-balancing.crio.io: "disable"
spec:
 hostNetwork: true
 runtimeClassName: performance-performance
 containers:
 - name: qos-demo-ctr-1
   image: quay.io/akaris/nice-test
   command:
   - "/bin/sleep"
   - "infinity"
   resources:
     limits:
       memory: "100Mi"
       cpu: "1"
     requests:
       memory: "100Mi"
       cpu: "1"
 - name: qos-demo-ctr-2
   image: quay.io/akaris/nice-test
   command:
   - "/bin/sleep"
   - "infinity"
   resources:
     limits:
       memory: "100Mi"
       cpu: "1"
     requests:
       memory: "100Mi"
       cpu: "1"
 - name: qos-demo-ctr-3
   image: quay.io/akaris/nice-test
   command:
   - "/bin/sleep"
   - "infinity"
   resources:
     limits:
       memory: "100Mi"
       cpu: "1"
     requests:
       memory: "100Mi"
       cpu: "1"
 - name: qos-demo-ctr-4
   image: quay.io/akaris/nice-test
   command:
   - "/bin/sleep"
   - "infinity"
   resources:
     limits:
       memory: "100Mi"
       cpu: "1"
     requests:
       memory: "100Mi"
       cpu: "1"
 - name: qos-demo-ctr-5
   image: quay.io/akaris/nice-test
   command:
   - "/bin/sleep"
   - "infinity"
   resources:
     limits:
       memory: "100Mi"
       cpu: "1"
     requests:
       memory: "100Mi"
       cpu: "1"

Does this PR introduce a user-facing change?

None

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants