Skip to content

image load: make removeExistingImage best-effort to fix silent failure when image is in use#22993

Open
monkey-mas wants to merge 1 commit into
kubernetes:masterfrom
monkey-mas:fix/image-load-best-effort-removal
Open

image load: make removeExistingImage best-effort to fix silent failure when image is in use#22993
monkey-mas wants to merge 1 commit into
kubernetes:masterfrom
monkey-mas:fix/image-load-best-effort-removal

Conversation

@monkey-mas
Copy link
Copy Markdown

Summary

minikube image load <image:tag> silently fails to update an image when a container using that image is already running inside minikube. This is a common scenario during development: rebuild a Docker image with the same tag (e.g.,
myapp:0.1.0-SNAPSHOT), then reload it into minikube while the previous deployment is still running.

The command exits successfully with no error message, but the image inside minikube is not updated.

Related to #11276 (originally fixed in #11366).

Root cause

transferAndLoadImage calls removeExistingImage before loading the new image (ref: v1.38.1 source). removeExistingImage runs the runtime's remove command (e.g., docker rmi <image:tag>) inside minikube, which fails when a running container references the image:

conflict: unable to remove repository reference "myapp:0.1.0" (must force)
- container abc123 is using its referenced image def456

The error-handling in removeExistingImage uses ad-hoc string matching against a hard-coded list of "acceptable" error messages (ref: v1.38.1 source):

Any error that does not match is treated as fatal. This includes Docker's "unable to remove repository reference" error shown above, as well as any new error messages that container runtimes may introduce in the future. When the error is treated as fatal, LoadImage is never called. Additionally, DoLoadImages swallows the error (returns nil), so the user sees no indication of failure.

Note: with older versions of crictl rmi (cri-tools ≤ v1.31), the error message for any removal failure was a catch-all "unable to remove the image(s)" (ref: v1.29.0 source), which happened to match the hard-coded pattern. Since cri-tools v1.32.0 (ref: cri-tools PR 1639), image removal was parallelized and the catch-all was replaced with per-image error messages, meaning the pattern match no longer works for newer CRI runtimes either. This further demonstrates that ad-hoc string matching is not a sustainable approach.

More fundamentally, removal of the old image cannot be a precondition for loading the new one. A running container holds a reference to the image it was started from, and that reference is not released until the container is stopped - not even after a new image is loaded with the same tag. This means a "remove-then-load" sequence will always fail when a container is running with the target image. Since container runtimes (docker load, ctr images import) can overwrite an existing tag without prior removal, the removal step should be best-effort rather than a gate.

Fix

Make removeExistingImage best-effort: log a warning on failure and always proceed to LoadImage. Remove the brittle error-message pattern matching entirely.

It looks like this approach was originally suggested by @medyagh during the review of #11366 (comment):

or alternatively we could do

klog.Warniningf("failed to remove image (which might be okay if it doesn't exists")

The initial commit of that PR (32a91d6) implemented this as a warning, but a follow-up commit within the same PR (16dbac1) changed it to a fatal error to address performance concerns around ListImages. This fix reverts to the non-fatal approach without reintroducing the ListImages call - RemoveImage is called directly and its failure is simply logged. See the full commit history of #11366 for context.

Any old image that cannot be removed becomes dangling and can be cleaned up separately (e.g., docker image prune). This is already the case today: the "unable to remove the image" pattern added in #19312 allows loading to proceed even when removal fails, which also results in dangling images.

Note: DoLoadImages currently swallows errors from LoadCachedImages (returns nil regardless of failure). This is a separate issue and is not addressed in this PR.

Changes

  • removeExistingImage: a) change return type from error to nothing, b) log warning on failure instead of returning a fatal error, and c) remove ad-hoc error-message pattern matching.
  • transferAndLoadImage: call removeExistingImage without error check.
  • Add unit tests for removeExistingImage: verify RemoveImage is called (or skipped for tar files) and that no error prevents loading.

Manual integration test

Setup: minikube running with Docker runtime.

Dockerfiles:

# Dockerfile.v1
FROM busybox:latest
RUN echo "version1" > /version.txt
# Dockerfile.v2
FROM busybox:latest
RUN echo "version2" > /version.txt
RUN echo "sample content" > /sample.txt

Steps:

# 1. Build and load v1
docker buildx build -f Dockerfile.v1 -t repro-test:0.1.0 .
minikube image load repro-test:0.1.0

# 2. Verify v1 is loaded
minikube ssh -- docker run --rm repro-test:0.1.0 cat /version.txt
# Expected: version1

# 3. Start a container using the image (simulates a running deployment)
minikube ssh -- docker run -d --name repro-running repro-test:0.1.0 sleep 3600

# 4. Build v2 with the same tag
docker buildx build -f Dockerfile.v2 -t repro-test:0.1.0 .

# 5. Load v2 with --overwrite
minikube image load --overwrite repro-test:0.1.0

# 6. Verify v2 is loaded
minikube ssh -- docker run --rm repro-test:0.1.0 cat /version.txt
# Before fix: "version1" (silent failure)
# After fix:  "version2"

minikube ssh -- docker run --rm repro-test:0.1.0 cat /sample.txt
# Before fix: "cat: can't open '/sample.txt': No such file or directory"
# After fix:  "sample content"

# 7. Cleanup
minikube ssh -- docker rm -f repro-running
minikube ssh -- docker rmi repro-test:0.1.0
docker rmi repro-test:0.1.0

…e when image is in use

minikube image load <image:tag> silently fails to update an image when a
container using that image is already running inside minikube.

removeExistingImage runs the runtime's remove command (e.g., docker rmi)
before loading the new image. When a running container references the image,
docker rmi fails with "unable to remove repository reference (must force)".
The error-handling treats this as fatal because the message does not match
the hard-coded patterns ("no such image", "unable to remove the image"),
which prevents LoadImage from ever being called. Additionally, DoLoadImages
swallows the error, so the user sees no indication of failure.

More fundamentally, removal of the old image cannot be a precondition for
loading the new one. Container runtimes (docker load, ctr images import)
can overwrite an existing tag without prior removal.

Fix: make removeExistingImage best-effort by logging a warning on failure
and always proceeding to LoadImage. Remove the brittle error-message
pattern matching entirely.
@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla Bot commented May 17, 2026

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: monkey-mas / name: Masaru Nomura (84c9c9e)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label May 17, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Welcome @monkey-mas!

It looks like this is your first PR to kubernetes/minikube 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/minikube has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 17, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Hi @monkey-mas. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: monkey-mas
Once this PR has been reviewed and has the lgtm label, please assign nirs for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@minikube-bot
Copy link
Copy Markdown
Collaborator

Can one of the admins verify this patch?

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants