Memory leak with systemd watchdog #9360

@rkojedzinszky

What happened?

After upgrading CRI-O from 1.31 to later versions, I observed a steady memory leak. With watchdog support disabled in the systemd unit file, no leak is observable.

What did you expect to happen?

Expected no memory leaks.

How can we reproduce it (as minimally and precisely as possible)?

Start CRI-O with the systemd watchdog enabled, then run the following code, which simulates the watchdog's health check, and observe CRI-O's RSS usage while it runs:

package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	cri "k8s.io/cri-client/pkg"
)

func checkCRIHealth() error {
	// Validate that a CRI connection is possible using the socket path.
	// Note: a new client is created on every call and is never closed.
	rrs, err := cri.NewRemoteRuntimeService("unix:///var/run/crio/crio.sock", 5*time.Second, nil, nil)
	if err != nil {
		return fmt.Errorf("create remote runtime service: %w", err)
	}

	response, err := rrs.Status(context.TODO(), false)
	if err != nil {
		return fmt.Errorf("get runtime status: %w", err)
	}

	// Verify that everything is okay
	if response.GetStatus() == nil {
		return errors.New("runtime status is nil")
	}

	if response.GetStatus().GetConditions() == nil {
		return errors.New("runtime conditions are nil")
	}

	for _, c := range response.GetStatus().GetConditions() {
		if c.GetType() == "NetworkReady" {
			continue
		}

		if !c.GetStatus() {
			return fmt.Errorf(
				"runtime status %q is invalid: %s (reason: %s)",
				c.GetType(), c.GetMessage(), c.GetReason(),
			)
		}
	}

	return nil
}

func main() {
	t := time.NewTicker(10 * time.Millisecond)
	defer t.Stop()
	for {
		start := time.Now()
		if err := checkCRIHealth(); err != nil {
			fmt.Printf("CRI health check failed: %v\n", err)
		}
		fmt.Printf("CRI health check took %v\n", time.Since(start))
		<-t.C
	}
}
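
To make the "observe crio rss usage" step repeatable, here is a minimal sketch of an RSS sampler. It is not part of CRI-O; the program, the `parseVmRSS` helper, and the `-pid`, `-n`, and `-interval` flags are all hypothetical names chosen for this example. It assumes a Linux host and reads the `VmRSS` line from `/proc/<pid>/status`.

```go
package main

import (
	"bufio"
	"errors"
	"flag"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// parseVmRSS extracts the VmRSS value (in kB) from the contents of a
// /proc/<pid>/status file. Hypothetical helper for this sketch.
func parseVmRSS(status string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected line form: "VmRSS:   123456 kB"
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, errors.New("VmRSS not found")
}

func main() {
	// All flags are assumptions made for this example.
	pid := flag.String("pid", "self", "PID of the process to sample")
	n := flag.Int("n", 1, "number of samples to take")
	interval := flag.Duration("interval", 5*time.Second, "delay between samples")
	flag.Parse()

	path := "/proc/" + *pid + "/status"
	for i := 0; i < *n; i++ {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		kb, err := parseVmRSS(string(data))
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%s VmRSS=%d kB\n", time.Now().Format(time.RFC3339), kb)
		// Sleep only between samples, not after the last one.
		if i+1 < *n {
			time.Sleep(*interval)
		}
	}
}
```

For example, something like `go run rsswatch.go -pid "$(pidof crio)" -n 720 -interval 5s` would log CRI-O's RSS for an hour while the reproduction above is running; the same run with the watchdog disabled gives a baseline to compare against.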

Anything else we need to know?

No response

CRI-O and Kubernetes version

# crio --version
crio version 1.33.1
   GitCommit:      16e977f7294c8fd3a4268319627afc017f17e2b3
   GitCommitDate:  2025-06-17T13:06:26Z
   GitTreeState:   clean
   BuildDate:      2025-06-17T13:33:02Z
   GoVersion:      go1.24.3
   Compiler:       gc
   Platform:       linux/amd64
   Linkmode:       dynamic
   BuildTags:
     containers_image_ostree_stub
     apparmor
     exclude_graphdriver_btrfs
     btrfs_noversion
     seccomp
     selinux
   LDFlags:          -s -w -X github.com/cri-o/cri-o/internal/version.buildDate=2025-06-17T13:33:02Z 
   SeccompEnabled:   true
   AppArmorEnabled:  true
# kubectl version --output=json
{
  "clientVersion": {
    "major": "1",
    "minor": "33",
    "gitVersion": "v1.33.1",
    "gitCommit": "8adc0f041b8e7ad1d30e29cc59c6ae7a15e19828",
    "gitTreeState": "clean",
    "buildDate": "2025-05-15T08:27:33Z",
    "goVersion": "go1.24.2",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "kustomizeVersion": "v5.6.0",
  "serverVersion": {
    "major": "1",
    "minor": "33",
    "emulationMajor": "1",
    "emulationMinor": "33",
    "minCompatibilityMajor": "1",
    "minCompatibilityMinor": "32",
    "gitVersion": "v1.33.1",
    "gitCommit": "8adc0f041b8e7ad1d30e29cc59c6ae7a15e19828",
    "gitTreeState": "clean",
    "buildDate": "2025-05-15T08:19:08Z",
    "goVersion": "go1.24.2",
    "compiler": "gc",
    "platform": "linux/amd64"
  }
}

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

Additional environment details (AWS, VirtualBox, physical, etc.)

kvm

Labels

kind/bug, lifecycle/stale
