Updated default timeout seconds for probes #2265
Hi @HarshAgarwal11, would you mind fixing the CI error and running the unit tests locally? You can read this doc for more details: https://github.com/ray-project/kuberay/blob/master/ray-operator/DEVELOPMENT.md#running-the-tests
kevin85421 left a comment:
Fix CI errors
@kevin85421 CI errors have been fixed.
  // Ray FT default readiness probe values
  DefaultReadinessProbeInitialDelaySeconds = 10
- DefaultReadinessProbeTimeoutSeconds = 1
+ DefaultReadinessProbeTimeoutSeconds = 2
I ran into issues with the probe timeout in v1.1 as well.
I think this probe timeout should actually be 4 or 5 seconds for the head pod, because the head pod's probe runs both the agent health check and the GCS health check:
wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep success
&& wget -T 2 -q -O- http://localhost:8265/api/gcs_healthz | grep success
Together, these can take up to 4 seconds. Thoughts, @kevin85421 @HarshAgarwal11?
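The arithmetic behind the 4-second figure can be sketched in Go under a simplified model: each chained `wget -T 2 ... | grep success` step is assumed to take at most its `-T` value, and any wget retries are ignored. (In GNU wget, `-T` actually sets the DNS, connect, and read timeouts rather than a hard total deadline, so this is an approximation.) The names `worstCaseSeconds` and `probeTimeoutSufficient` are hypothetical and not part of KubeRay.

```go
package main

import "fmt"

// Simplified model (assumption): each chained `wget -T 2 ... | grep success`
// step is treated as taking at most its -T value; wget retries are ignored.
const wgetTimeoutSeconds = 2

// numHeadChecks is the number of chained checks in the head-pod probe quoted
// above: the raylet (agent) health check and the GCS health check.
const numHeadChecks = 2

// worstCaseSeconds is the upper bound on the chained command's runtime under
// the simplified model: the per-check timeouts simply add up.
func worstCaseSeconds(perCheckTimeout, numChecks int) int {
	return perCheckTimeout * numChecks
}

// probeTimeoutSufficient reports whether a probe timeoutSeconds value leaves
// enough time for every chained check to reach its own wget timeout.
func probeTimeoutSufficient(timeoutSeconds, perCheckTimeout, numChecks int) bool {
	return timeoutSeconds >= worstCaseSeconds(perCheckTimeout, numChecks)
}

func main() {
	bound := worstCaseSeconds(wgetTimeoutSeconds, numHeadChecks)
	fmt.Printf("worst case for head probe: %ds\n", bound)
	// Compare the old default (1), this PR's default (2), and the suggested 5.
	for _, t := range []int{1, 2, 5} {
		fmt.Printf("timeoutSeconds=%d sufficient=%v\n", t,
			probeTimeoutSufficient(t, wgetTimeoutSeconds, numHeadChecks))
	}
}
```

Under this model, 1 s and 2 s both fall below the 4 s bound for the head pod, while 5 s clears it, which is consistent with the 4 to 5 second suggestion.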
Yes, it should be 4 to 5 seconds. Because the two checks are chained, their timeouts can add up. With 2 seconds I was still getting some timeouts, though less frequently than before. After changing it to 5 seconds, I didn't see any timeouts, although there were still some failures.
Opened #2353
Opened a separate issue to track exec probe issues: #2355
Why are these changes needed?
I updated the default probe timeout (TimeoutSeconds) for Ray clusters to 2 seconds, because the wget commands in the probes use a 2-second timeout (-T 2). The wget timeout is now also taken from the default readiness probe timeout (DefaultReadinessProbeTimeoutSeconds), so the two values stay in sync.
Previously, the default TimeoutSeconds for both probes was 1 second, so a probe could fail before its wget command had finished.
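Deriving the wget timeout from the readiness probe default could look roughly like the Go sketch below. Only `DefaultReadinessProbeTimeoutSeconds` comes from the diff above; `healthCheckCmd` and `headProbeCmd` are hypothetical helpers for illustration, not KubeRay's actual implementation. The point is that the `-T` value is formatted from the constant instead of being hard-coded.

```go
package main

import (
	"fmt"
	"strings"
)

// DefaultReadinessProbeTimeoutSeconds mirrors the value proposed in this PR.
const DefaultReadinessProbeTimeoutSeconds = 2

// healthCheckCmd (hypothetical) builds one `wget ... | grep success` step whose
// -T timeout is taken from the readiness probe default, so they cannot drift apart.
func healthCheckCmd(url string) string {
	return fmt.Sprintf("wget -T %d -q -O- %s | grep success",
		DefaultReadinessProbeTimeoutSeconds, url)
}

// headProbeCmd (hypothetical) chains the raylet and GCS checks with &&,
// matching the head-pod command quoted in the review discussion.
func headProbeCmd() string {
	return strings.Join([]string{
		healthCheckCmd("http://localhost:52365/api/local_raylet_healthz"),
		healthCheckCmd("http://localhost:8265/api/gcs_healthz"),
	}, " && ")
}

func main() {
	fmt.Println(headProbeCmd())
}
```

Note that with a single shared constant, the probe's own TimeoutSeconds equals one wget timeout, while the chained head-pod command can take roughly twice that, which is the concern raised in the review.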
Related issue number
#2264
Checks