
Conversation


@andreaskaris andreaskaris commented Jul 16, 2025

cherry-pick of #9350

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

As with the change to main, I ran a smoke test as a sanity check:

I ran the smoke test on a VM with 14 CPUs, on which I first installed the entire stack with kubeadm.

Dependencies:

# rpm -qa | grep kubelet
kubelet-1.33.1-150500.1.1.x86_64
# rpm -qa | grep kubeadm
kubeadm-1.33.1-150500.1.1.x86_64
# rpm -qa | grep crun
crun-1.21-1.el9.x86_64
# rpm -qa | grep conmon
conmon-2.1.13-1.el9.x86_64

/etc/crio/crio.conf.d/99-runtimes.conf

[crio.runtime]
infra_ctr_cpuset = "0-3"

# CRI-O checks allowed_annotations under the runtime handler and applies the high-performance hooks when
# one of the high-performance annotations is present on the pod.
# runtime_path would normally have to point at the runc binary, since there is no high-performance
# binary under $PATH; here inherit_default_runtime = true is used instead to reuse the default runtime.
[crio.runtime.runtimes.high-performance]
inherit_default_runtime = true
allowed_annotations = ["cpu-load-balancing.crio.io", "cpu-quota.crio.io", "irq-load-balancing.crio.io", "cpu-c-states.crio.io", "cpu-freq-governor.crio.io"]
# tail -n 5 /var/lib/kubelet/config.yaml
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  full-pcpus-only: "true"
cpuManagerReconcilePeriod: 5s
reservedSystemCPUs: 0-3
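
For completeness: pod.yaml below requests runtimeClassName: performance-performance, which only resolves if a RuntimeClass maps that name onto the high-performance handler. A minimal sketch of such an object, assuming the handler name from 99-runtimes.conf above (this manifest is my illustration, not part of the PR):

# Hypothetical RuntimeClass; maps the pod-facing name to the CRI-O handler.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: performance-performance   # referenced by runtimeClassName in pod.yaml
handler: high-performance         # matches [crio.runtime.runtimes.high-performance]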

I then stopped the crio service, started the freshly compiled crio in the foreground (make binaries && bin/crio), and ran this smoke test:

smoke.sh

#!/bin/bash

set -x

affinity_file="/proc/irq/default_smp_affinity"
expected_reset_affinity="3fff"
expected_mask="3e0f"

echo $expected_reset_affinity > $affinity_file
cat $affinity_file

for i in {0..20}; do
    set +x
    echo "========"
    echo "Run ${i}"
    echo "========"
    set -x
    kubectl apply -f pod.yaml
    kubectl wait --for=condition=Ready pod/qos-demo --timeout=180s
    mask=$(cat ${affinity_file} | tr -d '\n')
    echo "Got mask: $mask, expected mask: $expected_mask"
    if [ "${mask}" != "${expected_mask}" ]; then
        exit 1
    fi
    kubectl delete pod qos-demo
    kubectl wait --for=delete pod/qos-demo --timeout=180s
    mask=$(cat ${affinity_file} | tr -d '\n')
    echo "After reset --- Got mask: $mask, expected mask: $expected_reset_affinity"
    if [ "${mask}" != "${expected_reset_affinity}" ]; then
        exit 1
    fi
done
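
For reference, the two masks decode as follows on the 14-CPU VM; the helper below is a hypothetical illustration (mask_to_cpus is my own name, not part of the smoke test):

mask_to_cpus() {
    # Expand a hex IRQ affinity mask into the list of CPU numbers it covers.
    local mask=$((16#$1)) cpu=0 cpus=()
    while (( mask )); do
        (( mask & 1 )) && cpus+=("$cpu")
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "${cpus[*]}"
}
mask_to_cpus 3fff   # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 -- all 14 CPUs receive IRQs (the reset state)
mask_to_cpus 3e0f   # 0 1 2 3 9 10 11 12 13 -- CPUs 4-8 excluded, one per guaranteed container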

pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
  annotations:
    irq-load-balancing.crio.io: "disable"
spec:
  hostNetwork: true
  runtimeClassName: performance-performance
  containers:
  - name: qos-demo-ctr-1
    image: quay.io/akaris/nice-test
    command:
    - "/bin/sleep"
    - "infinity"
    resources:
      limits:
        memory: "100Mi"
        cpu: "1"
      requests:
        memory: "100Mi"
        cpu: "1"
  - name: qos-demo-ctr-2
    image: quay.io/akaris/nice-test
    command:
    - "/bin/sleep"
    - "infinity"
    resources:
      limits:
        memory: "100Mi"
        cpu: "1"
      requests:
        memory: "100Mi"
        cpu: "1"
  - name: qos-demo-ctr-3
    image: quay.io/akaris/nice-test
    command:
    - "/bin/sleep"
    - "infinity"
    resources:
      limits:
        memory: "100Mi"
        cpu: "1"
      requests:
        memory: "100Mi"
        cpu: "1"
  - name: qos-demo-ctr-4
    image: quay.io/akaris/nice-test
    command:
    - "/bin/sleep"
    - "infinity"
    resources:
      limits:
        memory: "100Mi"
        cpu: "1"
      requests:
        memory: "100Mi"
        cpu: "1"
  - name: qos-demo-ctr-5
    image: quay.io/akaris/nice-test
    command:
    - "/bin/sleep"
    - "infinity"
    resources:
      limits:
        memory: "100Mi"
        cpu: "1"
      requests:
        memory: "100Mi"
        cpu: "1"

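To cross-check which exclusive CPU each guaranteed container was pinned to (and hence which CPUs should disappear from the affinity mask), something like the following should work, assuming the image ships grep:

kubectl exec qos-demo -c qos-demo-ctr-1 -- grep Cpus_allowed_list /proc/1/status
# expected output along the lines of: Cpus_allowed_list: 4 (one CPU in the 4-8 range)
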
Does this PR introduce a user-facing change?

None

@andreaskaris andreaskaris requested a review from mrunalp as a code owner July 16, 2025 19:36
@openshift-ci openshift-ci bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Jul 16, 2025

openshift-ci bot commented Jul 16, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: andreaskaris
Once this PR has been reviewed and has the lgtm label, please assign haircommander for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot requested review from hasan4791 and QiWang19 July 16, 2025 19:36

openshift-ci bot commented Jul 16, 2025

Hi @andreaskaris. Thanks for your PR.

I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 16, 2025
@andreaskaris andreaskaris changed the title [1.31][4.18] HighPerformanceHooks: Fix IRQ SMP affinity race conditions [1.31][4.18] OCPBUGS-59416: HighPerformanceHooks: Fix IRQ SMP affinity race conditions Jul 16, 2025
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jul 16, 2025
@openshift-ci-robot

@andreaskaris: This pull request references Jira Issue OCPBUGS-59416, which is invalid:

  • expected dependent Jira Issue OCPBUGS-59415 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is New instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Cherry-pick of:

(cherry picked from commit 3dce7d8)

Conflicts:
i) internal/runtimehandlerhooks/high_performance_hooks_linux.go
   internal/runtimehandlerhooks/runtime_handler_hooks_linux.go
   server/container_create.go
   server/container_start.go
   server/sandbox_run_linux.go
   Conflict due to missing 3c7337f in container_create.go / container_create_linux.go.
ii) internal/runtimehandlerhooks/high_performance_hooks_test.go
    Missing import of hostport.
Other conflicts were flagged by git, but the code was exactly the same.

Reported-at: https://issues.redhat.com/browse/OCPBUGS-59321 (cherry picked from commit 7cddfd4) (cherry picked from commit 5c39a2e)


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@andreaskaris andreaskaris marked this pull request as draft July 16, 2025 19:37
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 16, 2025

andreaskaris commented Jul 16, 2025

To do: have to fix a conflict in the unit test due to missing:
4409a15#diff-bc25e15e13072e3a8ff7fc6a28ab46835c3051e5dca571380e110210c1edabae


3m 12s
Run golangci/golangci-lint-action@aaa42aa0628b4ae2578232a66b541047968fac86
prepare environment
run golangci-lint
  Running [/home/runner/golangci-lint-1.60.3-linux-amd64/golangci-lint run] in [/home/runner/work/cri-o/cri-o] ...
  internal/runtimehandlerhooks/default_cpu_load_balance_hooks_linux.go:1: : # github.com/cri-o/cri-o/internal/runtimehandlerhooks [github.com/cri-o/cri-o/internal/runtimehandlerhooks.test]
  Error: internal/runtimehandlerhooks/high_performance_hooks_test.go:803:20: undefined: sandbox.NewBuilder (typecheck)
  package runtimehandlerhooks
  
  Error: issues found
  Ran golangci-lint in 191263ms

@andreaskaris andreaskaris force-pushed the OCPBUGS-59416 branch 3 times, most recently from 94ea23f to 6c297b9 Compare July 18, 2025 14:11
@andreaskaris andreaskaris marked this pull request as ready for review July 21, 2025 08:51
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 21, 2025
@openshift-ci openshift-ci bot requested a review from sohankunkerkar July 21, 2025 08:51
@openshift-ci-robot

@andreaskaris: This pull request references Jira Issue OCPBUGS-59416, which is invalid:

  • expected dependent Jira Issue OCPBUGS-59415 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is ASSIGNED instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

cherry-pick of #9350


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


bitoku commented Jul 28, 2025

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 28, 2025
@andreaskaris
Contributor Author

/hold
I found another issue with race conditions; other pieces must be fixed upstream before this can merge.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 28, 2025
@andreaskaris andreaskaris force-pushed the OCPBUGS-59416 branch 2 times, most recently from 6697640 to f8bd50a Compare August 7, 2025 17:06

github-actions bot commented Sep 7, 2025

A friendly reminder that this PR had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 7, 2025
openshift-merge-bot bot and others added 5 commits October 1, 2025 13:19
Cherry-pick of:
- Merge pull request cri-o#9228 from andreaskaris/issue9227
The original 6 commits were merged but not squashed together, so they
are squashed here on the downstream cherry-pick.

(cherry picked from commit 3dce7d8)
(cherry picked from commit 7cddfd4)
(cherry picked from commit 5c39a2e)

Conflicts:
  internal/runtimehandlerhooks/high_performance_hooks_test.go
    Needed to convert sandbox.NewBuilder to sandbox.New due
    to missing 4409a15.
  server/sandbox_stop_freebsd.go
    Needed to make manual change due to downstream call to
    GetRuntimeHandlerHooks in FreeBSD implementation.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
Reported-at: https://issues.redhat.com/browse/OCPBUGS-59416
A prior patch addressing race conditions in this code section was
incomplete, as it used two different locks for the irqbalance and IRQ
SMP affinity files. This still allowed a race condition with respect to
the irqbalance configuration. This fix addresses the issue by using a
single lock and by making the entire change atomic.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
(cherry picked from commit 1283afc)

Conflicts:
	internal/runtimehandlerhooks/high_performance_hooks_linux.go
        Did not apply cleanly to setIRQLoadBalancing; accepted all of
        the new change.
(cherry picked from commit ff418a1)
Add unit tests for IRQ SMP affinity settings. To do so, add service
and command manager structures that can be mocked.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
(cherry picked from commit 06c8437)

Conflicts:
	internal/runtimehandlerhooks/high_performance_hooks_linux.go
        applying to RestoreIrqBalanceConfig, accepting incoming changes
(cherry picked from commit 20da163)
Having the IRQ balancing logic inside the PreStop hook can cause
ordering issues (it is possible to hit the sequence: container add,
replacement container add, container stop). Moving the same logic into
PostStop guarantees correct ordering.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
(cherry picked from commit 03ec73d)

 Conflicts:
	internal/runtimehandlerhooks/high_performance_hooks_linux.go
        applying to PostStop, accepted all incoming changes (cleanly)

(cherry picked from commit 211098d)
Signed-off-by: Andreas Karis <ak.karis@gmail.com>
(cherry picked from commit 78c966c)
(cherry picked from commit 53ea4f4)
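Conceptually, the single-lock scheme from the second commit message above might look like this minimal Go sketch; the function, variable names, and file paths are illustrative assumptions, not CRI-O's actual code:

package irqexample

import (
	"os"
	"sync"
)

// A single mutex guards both the irqbalance config and the IRQ SMP affinity
// mask, so no other writer can interleave between the two file updates.
var irqLock sync.Mutex

// updateIRQState performs both writes in one critical section, making the
// combined change atomic with respect to other holders of irqLock.
func updateIRQState(bannedCPUs, affinityMask string) error {
	irqLock.Lock()
	defer irqLock.Unlock()
	if err := os.WriteFile("/etc/sysconfig/irqbalance", []byte(bannedCPUs), 0o644); err != nil {
		return err
	}
	return os.WriteFile("/proc/irq/default_smp_affinity", []byte(affinityMask), 0o644)
}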

openshift-ci bot commented Oct 1, 2025

@andreaskaris: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/e2e-gcp-ovn
Commit: f5dceec
Required: true
Rerun command: /test e2e-gcp-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
