-
Notifications
You must be signed in to change notification settings - Fork 5.1k
vfkit: Graceful shutdown on Stop() #20504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We use `HardStop` which seems to do a forced shutdown instead of
graceful shutdown, and to make things worse, always fails with:
I0309 15:00:42.452986 13723 stop.go:66] stop err: Post "http://_/vm/state": EOF
Which leads to unneeded retry and pointless backup attempt timing out
after 135 seconds because vkfit was stopped.
With this change we do a graceful shutdown, and the time to stop the
cluster decreased from 135 seconds to 3 seconds (45 times faster).
Example stop log:
I0309 15:34:33.104429 14440 main.go:141] libmachine: Stopping "minikube"...
I0309 15:34:33.105225 14440 main.go:141] libmachine: get state: {State:VirtualMachineStateRunning}
I0309 15:34:33.105799 14440 main.go:141] libmachine: set state: {State:Stop}
I0309 15:34:33.106099 14440 main.go:141] libmachine: get state: {State:VirtualMachineStateRunning}
I0309 15:34:36.109380 14440 main.go:141] libmachine: Machine "minikube" was stopped.
|
Hi @nirs. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Can one of the admins verify this patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Unrelated, but there's a call to d.SetVFKitState("Stop"); in Remove which could be changed to HardStop
|
|
||
| func (d *Driver) Stop() error { | ||
| if err := d.SetVFKitState("HardStop"); err != nil { | ||
| if err := d.SetVFKitState("Stop"); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth to try HardStop if Stop fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible, but usually if Stop fails vfkit is likely not running, so HadrStop will also fail, and it typically fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we should do a HardStop if this fails, we have similar code
we have a package called "retry" I think after a two times of retry with normal stop it should do a hard stop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is an example of using it https://github.com/medyagh/minikube/blob/d510d46696d6e3588921d51d8daf41297e7fdc8d/pkg/minikube/machine/stop.go#L72
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current code does not do retries in the driver, assuming that the caller doing the retries. Lets fix the issues first in the driver, and then we can evaluate if stop should retry in the driver, and apply it to more drivers.
Yes, this is fixed in #20506 |
|
|
||
| func (d *Driver) Stop() error { | ||
| if err := d.SetVFKitState("HardStop"); err != nil { | ||
| if err := d.SetVFKitState("Stop"); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we should do a HardStop if this fails, we have similar code
we have a package called "retry" I think after a two times of retry with normal stop it should do a hard stop
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: medyagh, nirs The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
We use
HardStopwhich seems to do a forced shutdown instead of graceful shutdown, and to make things worse, always fails with:Which leads to unneeded retry and pointless backup attempt timing out after 135 seconds because vkfit was stopped.
With this change we do a graceful shutdown, and the time to stop the cluster decreased from 135 seconds to 3 seconds (45 times faster).
Example stop log:
Fixes #20503