image load: make removeExistingImage best-effort to fix silent failure when image is in use#22993
Conversation
…e when image is in use
minikube image load <image:tag> silently fails to update an image when a
container using that image is already running inside minikube.
removeExistingImage runs the runtime's remove command (e.g., docker rmi)
before loading the new image. When a running container references the image,
docker rmi fails with "unable to remove repository reference (must force)".
The error-handling treats this as fatal because the message does not match
the hard-coded patterns ("no such image", "unable to remove the image"),
which prevents LoadImage from ever being called. Additionally, DoLoadImages
swallows the error, so the user sees no indication of failure.
More fundamentally, removal of the old image cannot be a precondition for
loading the new one. Container runtimes (docker load, ctr images import)
can overwrite an existing tag without prior removal.
Fix: make removeExistingImage best-effort by logging a warning on failure
and always proceeding to LoadImage. Remove the brittle error-message
pattern matching entirely.
|
|
|
Welcome @monkey-mas! |
|
Hi @monkey-mas. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: monkey-mas The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Can one of the admins verify this patch? |
Summary
minikube image load <image:tag>silently fails to update an image when a container using that image is already running inside minikube. This is a common scenario during development: rebuild a Docker image with the same tag (e.g.,myapp:0.1.0-SNAPSHOT), then reload it into minikube while the previous deployment is still running.The command exits successfully with no error message, but the image inside minikube is not updated.
Related to #11276 (originally fixed in #11366).
Root cause
transferAndLoadImagecallsremoveExistingImagebefore loading the new image (ref: v1.38.1 source).removeExistingImageruns the runtime's remove command (e.g.,docker rmi <image:tag>) inside minikube, which fails when a running container references the image:The error-handling in
removeExistingImageuses ad-hoc string matching against a hard-coded list of "acceptable" error messages (ref: v1.38.1 source):"no such image""unable to remove the image"(added in fix: fix empty tarball when generating image save #19312)Any error that does not match is treated as fatal. This includes Docker's
"unable to remove repository reference"error shown above, as well as any new error messages that container runtimes may introduce in the future. When the error is treated as fatal,LoadImageis never called. Additionally,DoLoadImagesswallows the error (returns nil), so the user sees no indication of failure.Note: with older versions of
crictl rmi(cri-tools ≤ v1.31), the error message for any removal failure was a catch-all"unable to remove the image(s)"(ref: v1.29.0 source), which happened to match the hard-coded pattern. Since cri-tools v1.32.0 (ref: cri-tools PR 1639), image removal was parallelized and the catch-all was replaced with per-image error messages, meaning the pattern match no longer works for newer CRI runtimes either. This further demonstrates that ad-hoc string matching is not a sustainable approach.More fundamentally, removal of the old image cannot be a precondition for loading the new one. A running container holds a reference to the image it was started from, and that reference is not released until the container is stopped - not even after a new image is loaded with the same tag. This means a "remove-then-load" sequence will always fail when a container is running with the target image. Since container runtimes (
docker load,ctr images import) can overwrite an existing tag without prior removal, the removal step should be best-effort rather than a gate.Fix
Make
removeExistingImagebest-effort: log a warning on failure and always proceed toLoadImage. Remove the brittle error-message pattern matching entirely.It looks like this approach was originally suggested by @medyagh during the review of #11366 (comment):
The initial commit of that PR (
32a91d6) implemented this as a warning, but a follow-up commit within the same PR (16dbac1) changed it to a fatal error to address performance concerns aroundListImages. This fix reverts to the non-fatal approach without reintroducing theListImagescall -RemoveImageis called directly and its failure is simply logged. See the full commit history of #11366 for context.Any old image that cannot be removed becomes dangling and can be cleaned up separately (e.g.,
docker image prune). This is already the case today: the"unable to remove the image"pattern added in #19312 allows loading to proceed even when removal fails, which also results in dangling images.Note:
DoLoadImagescurrently swallows errors fromLoadCachedImages(returnsnilregardless of failure). This is a separate issue and is not addressed in this PR.Changes
removeExistingImage: a) change return type fromerrorto nothing, b) log warning on failure instead of returning a fatal error, and c) remove ad-hoc error-message pattern matching.transferAndLoadImage: callremoveExistingImagewithout error check.removeExistingImage: verifyRemoveImageis called (or skipped for tar files) and that no error prevents loading.Manual integration test
Setup: minikube running with Docker runtime.
Dockerfiles:
Steps: