Skip to content

Conversation

@nirs
Copy link
Contributor

@nirs nirs commented Mar 9, 2025

We use HardStop which seems to do a forced shutdown instead of graceful shutdown, and to make things worse, always fails with:

I0309 15:00:42.452986   13723 stop.go:66] stop err: Post "http://_/vm/state": EOF

Which leads to unneeded retry and pointless backup attempt timing out after 135 seconds because vkfit was stopped.

With this change we do a graceful shutdown, and the time to stop the cluster decreased from 135 seconds to 3 seconds (45 times faster).

Example stop log:

I0309 15:34:33.104429   14440 main.go:141] libmachine: Stopping "minikube"...
I0309 15:34:33.105225   14440 main.go:141] libmachine: get state: {State:VirtualMachineStateRunning}
I0309 15:34:33.105799   14440 main.go:141] libmachine: set state: {State:Stop}
I0309 15:34:33.106099   14440 main.go:141] libmachine: get state: {State:VirtualMachineStateRunning}
I0309 15:34:36.109380   14440 main.go:141] libmachine: Machine "minikube" was stopped.

Fixes #20503

We use `HardStop` which seems to do a forced shutdown instead of
graceful shutdown, and to make things worse, always fails with:

    I0309 15:00:42.452986   13723 stop.go:66] stop err: Post "http://_/vm/state": EOF

Which leads to unneeded retry and pointless backup attempt timing out
after 135 seconds because vkfit was stopped.

With this change we do a graceful shutdown, and the time to stop the
cluster decreased from 135 seconds to 3 seconds (45 times faster).

Example stop log:

    I0309 15:34:33.104429   14440 main.go:141] libmachine: Stopping "minikube"...
    I0309 15:34:33.105225   14440 main.go:141] libmachine: get state: {State:VirtualMachineStateRunning}
    I0309 15:34:33.105799   14440 main.go:141] libmachine: set state: {State:Stop}
    I0309 15:34:33.106099   14440 main.go:141] libmachine: get state: {State:VirtualMachineStateRunning}
    I0309 15:34:36.109380   14440 main.go:141] libmachine: Machine "minikube" was stopped.
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 9, 2025
@k8s-ci-robot k8s-ci-robot requested review from medyagh and prezha March 9, 2025 13:43
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 9, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @nirs. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 9, 2025
@minikube-bot
Copy link
Collaborator

Can one of the admins verify this patch?

Copy link

@cfergeau cfergeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Unrelated, but there's a call to d.SetVFKitState("Stop"); in Remove which could be changed to HardStop


func (d *Driver) Stop() error {
if err := d.SetVFKitState("HardStop"); err != nil {
if err := d.SetVFKitState("Stop"); err != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth to try HardStop if Stop fails?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible, but usually if Stop fails vfkit is likely not running, so HadrStop will also fail, and it typically fails.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should do a HardStop if this fails, we have similar code
we have a package called "retry" I think after a two times of retry with normal stop it should do a hard stop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code does not do retries in the driver, assuming that the caller doing the retries. Lets fix the issues first in the driver, and then we can evaluate if stop should retry in the driver, and apply it to more drivers.

@nirs
Copy link
Contributor Author

nirs commented Mar 10, 2025

Looks good to me. Unrelated, but there's a call to d.SetVFKitState("Stop"); in Remove which could be changed to HardStop

Yes, this is fixed in #20506


func (d *Driver) Stop() error {
if err := d.SetVFKitState("HardStop"); err != nil {
if err := d.SetVFKitState("Stop"); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should do a HardStop if this fails, we have similar code
we have a package called "retry" I think after a two times of retry with normal stop it should do a hard stop

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 10, 2025
@medyagh
Copy link
Member

medyagh commented Mar 10, 2025

/lgtm

@medyagh medyagh merged commit 3957c12 into kubernetes:master Mar 10, 2025
5 of 6 checks passed
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 10, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: medyagh, nirs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nirs nirs deleted the vfkit-stop-fix branch May 23, 2025 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

minikube stop very slow with vfkit driver

5 participants