Description
What version of Knative?
0.18
Expected Behavior
No intermediate Ready == False should be reported during a service update reconciliation if the reconciliation eventually finishes with Ready == True.
Actual Behavior
Since 0.18.0, the client CI has a frequent flake (roughly 80% of runs, by my estimate) when updating a service after a series of operations on this service:
┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🦆 kn service update svc3a -a alpha=direwolf -a brave- --namespace kne2etests23
┃ Updating Service 'svc3a' in namespace 'kne2etests23':
┃
┃ 0.059s The Configuration is still working to reflect the latest desired specification.
┃ 0.480s Ingress reconciliation failed
┃
🔥 Error: ReconcileIngressFailed: Ingress reconciliation failed
🔥 Run 'kn --help' for usage
🔥
If we look at the service right after this error appeared, its conditions look like this:
Conditions:
  Last Transition Time: 2020-10-07T14:50:48Z
  Status: Unknown
  Type: ConfigurationsReady

  Last Transition Time: 2020-10-07T14:50:48Z
  Status: Unknown
  Type: Ready

  Last Transition Time: 2020-10-07T14:50:48Z
  Status: True
  Type: RoutesReady
So the service had already moved past the False status for Ready (it is Unknown again here), but kn treated that intermediate False state as an error.
You can see the status of the cluster at that time here
This issue with intermittent Ready == False states has already been discussed in #6784, but without a solution.
The current safeguard that we have implemented in the client (an "error window" in which the client waits for another state change after seeing an error) does not seem to work here. We are investigating this in parallel on the client side (in knative/client#1052).
The question here, though, is: why is this "Ingress reconciliation failed" event emitted at all, and what has changed in Serving that makes it happen so often now?
Steps to Reproduce the Problem
See the steps in https://prow.knative.dev/view/gs/knative-prow/logs/ci-knative-client-auto-release/1313848987291226115#1:build-log.txt%3A3758 that lead to this error