
Conversation

andreaskaris
Contributor

@andreaskaris andreaskaris commented Jul 16, 2025

cherry-pick of #9349 and of #9405

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

As with the change to main, I ran a smoke test for sanity checking:

I ran the smoke test on a VM with 14 CPUs, on which I first installed the entire stack with kubeadm.

Dependencies:

# rpm -qa | grep kubelet
kubelet-1.33.1-150500.1.1.x86_64
# rpm -qa | grep kubeadm
kubeadm-1.33.1-150500.1.1.x86_64
# rpm -qa | grep crun
crun-1.21-1.el9.x86_64
# rpm -qa | grep conmon
conmon-2.1.13-1.el9.x86_64

/etc/crio/crio.conf.d/99-runtimes.conf

[crio.runtime]
infra_ctr_cpuset = "0-3"

# CRI-O checks the allowed_annotations listed under the runtime handler and applies the high-performance
# hooks when one of the high-performance annotations is present on the pod.
# With inherit_default_runtime = true the handler re-uses the default runtime binary, so no separate
# high-performance binary needs to exist under the $PATH.
[crio.runtime.runtimes.high-performance]
inherit_default_runtime = true
allowed_annotations = ["cpu-load-balancing.crio.io", "cpu-quota.crio.io", "irq-load-balancing.crio.io", "cpu-c-states.crio.io", "cpu-freq-governor.crio.io"]
# tail -n 5 /var/lib/kubelet/config.yaml
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  full-pcpus-only: "true"
cpuManagerReconcilePeriod: 5s
reservedSystemCPUs: 0-3

I then stopped the crio service, built and started the compiled crio in the foreground with make binaries && bin/crio, and ran this smoke test:

smoke.sh

#!/bin/bash

set -x

affinity_file="/proc/irq/default_smp_affinity"
expected_reset_affinity="3fff"
expected_mask="3e0f"

echo $expected_reset_affinity > $affinity_file
cat $affinity_file

for i in {0..20}; do
	set +x
	echo "========"
	echo "Run ${i}"
	echo "========"
	set -x
	kubectl apply -f pod.yaml
	kubectl wait --for=condition=Ready pod/qos-demo --timeout=180s
	mask=$(cat ${affinity_file} | tr -d '\n')
	echo "Got mask: $mask, expected mask: $expected_mask"
	if [ "${mask}" != "${expected_mask}" ]; then
		exit 1
	fi
	kubectl delete pod qos-demo
	kubectl wait --for=delete pod/qos-demo --timeout=180s
	mask=$(cat ${affinity_file} | tr -d '\n')
	echo "After reset --- Got mask: $mask, expected mask: $expected_reset_affinity"
	if [ "${mask}" != "${expected_reset_affinity}" ]; then
		exit 1
	fi
done

pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
  annotations:
    irq-load-balancing.crio.io: "disable"
spec:
  hostNetwork: true
  runtimeClassName: performance-performance
  containers:
  - name: qos-demo-ctr-1
    image: quay.io/akaris/nice-test
    command:
    - "/bin/sleep"
    - "infinity"
    resources:
      limits:
        memory: "100Mi"
        cpu: "1"
      requests:
        memory: "100Mi"
        cpu: "1"
  - name: qos-demo-ctr-2
    image: quay.io/akaris/nice-test
    command:
    - "/bin/sleep"
    - "infinity"
    resources:
      limits:
        memory: "100Mi"
        cpu: "1"
      requests:
        memory: "100Mi"
        cpu: "1"
  - name: qos-demo-ctr-3
    image: quay.io/akaris/nice-test
    command:
    - "/bin/sleep"
    - "infinity"
    resources:
      limits:
        memory: "100Mi"
        cpu: "1"
      requests:
        memory: "100Mi"
        cpu: "1"
  - name: qos-demo-ctr-4
    image: quay.io/akaris/nice-test
    command:
    - "/bin/sleep"
    - "infinity"
    resources:
      limits:
        memory: "100Mi"
        cpu: "1"
      requests:
        memory: "100Mi"
        cpu: "1"
  - name: qos-demo-ctr-5
    image: quay.io/akaris/nice-test
    command:
    - "/bin/sleep"
    - "infinity"
    resources:
      limits:
        memory: "100Mi"
        cpu: "1"
      requests:
        memory: "100Mi"
        cpu: "1"
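
For reference, a minimal sketch of where the two expected masks in smoke.sh come from, assuming the static CPU manager pins the five guaranteed containers to CPUs 4-8 on this 14-CPU VM (the exact CPU numbers are an assumption; only the resulting masks were checked by the test):

package main

import "fmt"

func main() {
	// 14 online CPUs -> all-ones mask over bits 0..13.
	allCPUs := uint64(1<<14 - 1) // 0x3fff, the expected reset affinity

	// With irq-load-balancing.crio.io: "disable", the pod's exclusive CPUs are
	// removed from the default IRQ SMP affinity mask. Assuming the five
	// guaranteed containers are pinned to CPUs 4..8:
	exclusive := uint64(0)
	for cpu := 4; cpu <= 8; cpu++ {
		exclusive |= 1 << cpu
	}

	fmt.Printf("reset affinity: %x\n", allCPUs)            // 3fff
	fmt.Printf("expected mask:  %x\n", allCPUs&^exclusive) // 3e0f
}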

Does this PR introduce a user-facing change?

None

@andreaskaris andreaskaris requested a review from mrunalp as a code owner July 16, 2025 19:00
@openshift-ci openshift-ci bot added the dco-signoff: yes Indicates the PR's author has DCO signed all their commits. label Jul 16, 2025
@openshift-ci openshift-ci bot requested review from hasan4791 and klihub July 16, 2025 19:00
@openshift-ci-robot openshift-ci-robot added the jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. label Jul 16, 2025
@openshift-ci openshift-ci bot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Jul 16, 2025
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jul 16, 2025
@openshift-ci-robot

@andreaskaris: This pull request references Jira Issue OCPBUGS-59415, which is invalid:

  • expected dependent Jira Issue OCPBUGS-59321 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is ASSIGNED instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Cherry-pick of:

(cherry picked from commit 3dce7d8)

Conflicts:
internal/runtimehandlerhooks/high_performance_hooks_linux.go
internal/runtimehandlerhooks/runtime_handler_hooks_linux.go
server/container_create.go
server/container_start.go
server/sandbox_run_linux.go
Conflict due to missing 3c7337f in container_create.go / container_create_linux.go Other conflicts flagged by git but code was exactly the same.

Reported-at: https://issues.redhat.com/browse/OCPBUGS-59321 (cherry picked from commit 7cddfd4)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Jul 16, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: andreaskaris
Once this PR has been reviewed and has the lgtm label, please assign giuseppe for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 16, 2025
Contributor

openshift-ci bot commented Jul 16, 2025

Hi @andreaskaris. Thanks for your PR.

I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jul 16, 2025
@andreaskaris andreaskaris changed the title OCPBUGS-59415: HighPerformanceHooks: Fix IRQ SMP affinity race conditions [1.32][4.19] OCPBUGS-59415: HighPerformanceHooks: Fix IRQ SMP affinity race conditions Jul 16, 2025
@andreaskaris
Contributor Author

The failing "prettier" verification is complaining about install.md; can we ignore "prettier"?


codecov bot commented Jul 16, 2025

Codecov Report

❌ Patch coverage is 70.00000% with 42 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.68%. Comparing base (2de10fd) to head (53ea4f4).

Additional details and impacted files
@@               Coverage Diff                @@
##           release-1.32    #9350      +/-   ##
================================================
+ Coverage         46.99%   47.68%   +0.69%     
================================================
  Files               155      155              
  Lines             22298    22360      +62     
================================================
+ Hits              10479    10663     +184     
+ Misses            10744    10601     -143     
- Partials           1075     1096      +21     

@andreaskaris
Contributor Author

/jira refresh

@openshift-ci-robot

@andreaskaris: This pull request references Jira Issue OCPBUGS-59415, which is invalid:

  • expected dependent Jira Issue OCPBUGS-59321 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is MODIFIED instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@bitoku
Contributor

bitoku commented Jul 28, 2025

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 28, 2025
@andreaskaris
Contributor Author

/hold I found another issue with race conditions, and other pieces must be fixed upstream before merging this.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 28, 2025

A friendly reminder that this PR had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 28, 2025
openshift-merge-bot bot and others added 2 commits October 1, 2025 12:39
Cherry-pick of:
- Merge pull request cri-o#9228 from andreaskaris/issue9227
The original 6 commits were merged but not squashed together, so this is
done here on the downstream cherry-pick.

(cherry picked from commit 3dce7d8)

Conflicts:
   i) internal/runtimehandlerhooks/high_performance_hooks_linux.go
      internal/runtimehandlerhooks/runtime_handler_hooks_linux.go
      server/container_create.go
      server/container_start.go
      server/sandbox_run_linux.go
  ii) internal/runtimehandlerhooks/high_performance_hooks_test.go
i) Conflict due to missing 3c7337f in container_create.go / container_create_linux.go
ii) Missing import of hostport
Other conflicts were flagged by git, but the code was exactly the same.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
Reported-at: https://issues.redhat.com/browse/OCPBUGS-59321
(cherry picked from commit 7cddfd4)
A prior patch addressing race conditions in this code section was
incomplete, as it used two different locks for the irqbalance and IRQ SMP
affinity files. This still allowed a race condition with respect to the
irqbalance configuration. This fix addresses the issue by using a single
lock and by making the entire change atomic.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
(cherry picked from commit 1283afc)

Conflicts:
	internal/runtimehandlerhooks/high_performance_hooks_linux.go
        Did not apply cleanly to setIRQLoadBalancing; accepted all of the
        new change.
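
A minimal sketch of the single-lock approach described in the commit message above; the package, function names, and file contents here are illustrative assumptions, not the actual CRI-O code:

package irqtuning

import (
	"os"
	"sync"
)

// irqMu serializes every update to the irqbalance configuration file and to
// /proc/irq/default_smp_affinity, so no caller can observe one file updated
// without the other (the window the earlier two-lock version left open).
var irqMu sync.Mutex

// updateIRQSettings writes both files while holding the single lock, making
// the whole change atomic with respect to concurrent hook invocations.
func updateIRQSettings(irqBalanceConfFile, affinityFile, bannedCPUMask, newAffinityMask string) error {
	irqMu.Lock()
	defer irqMu.Unlock()

	// Hypothetical content: the real irqbalance config carries more settings.
	if err := os.WriteFile(irqBalanceConfFile, []byte("IRQBALANCE_BANNED_CPUS="+bannedCPUMask+"\n"), 0o644); err != nil {
		return err
	}
	return os.WriteFile(affinityFile, []byte(newAffinityMask+"\n"), 0o644)
}
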
Add unit tests for irq smp affinity settings. In order to do so, add
service and command manager structures that can be mocked.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
(cherry picked from commit 06c8437)

Conflicts:
	internal/runtimehandlerhooks/high_performance_hooks_linux.go
        Applied to RestoreIrqBalanceConfig; accepted the incoming changes.
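
A rough sketch of the "command manager that can be mocked" idea from the commit message above; the interface and type names are hypothetical, not the actual test code:

package irqtuning

import "os/exec"

// CommandRunner abstracts command execution so unit tests can substitute a
// mock instead of running real binaries (e.g. restarting irqbalance) on the
// test host.
type CommandRunner interface {
	Run(name string, args ...string) ([]byte, error)
}

// execRunner is the production implementation backed by os/exec.
type execRunner struct{}

func (execRunner) Run(name string, args ...string) ([]byte, error) {
	return exec.Command(name, args...).CombinedOutput()
}

// fakeRunner records invocations and returns canned results for assertions.
type fakeRunner struct {
	calls  [][]string
	output []byte
	err    error
}

func (f *fakeRunner) Run(name string, args ...string) ([]byte, error) {
	f.calls = append(f.calls, append([]string{name}, args...))
	return f.output, f.err
}
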
Having the IRQ balancing logic inside the PreStop hook can cause ordering
issues (it is possible to hit the sequence: container add, replacement
container add, container stop). Moving the same logic into PostStop
guarantees correct ordering.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
(cherry picked from commit 03ec73d)

 Conflicts:
	internal/runtimehandlerhooks/high_performance_hooks_linux.go
        Applied to PostStop; accepted all incoming changes (cleanly).
Signed-off-by: Andreas Karis <ak.karis@gmail.com>
(cherry picked from commit 78c966c)
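
To illustrate the PreStop-to-PostStop move described in the commits above, a simplified hook shape; the method signatures and function names are assumptions for illustration, not CRI-O's actual hook API:

package irqtuning

import "context"

// highPerformanceHooks is a simplified stand-in for a runtime-handler hook set.
type highPerformanceHooks struct{}

// PreStop no longer touches IRQ affinity: a replacement container can be added
// before the stop of the old one completes, which is the
// "container add, replacement container add, container stop" interleaving.
func (h *highPerformanceHooks) PreStop(ctx context.Context, containerID string) error {
	return nil
}

// PostStop runs strictly after the container stop has completed, so restoring
// the IRQ SMP affinity here cannot be reordered with the replacement
// container's setup.
func (h *highPerformanceHooks) PostStop(ctx context.Context, containerID string) error {
	return restoreIRQSMPAffinity(containerID) // hypothetical restore helper
}

// restoreIRQSMPAffinity is a placeholder for the real restore logic.
func restoreIRQSMPAffinity(containerID string) error { return nil }
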
@openshift-ci-robot

@andreaskaris: This pull request references Jira Issue OCPBUGS-59415, which is invalid:

  • expected dependent Jira Issue OCPBUGS-59321 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is POST instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

cherry-pick of #9349 and of #9405


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 2, 2025