Skip to content

Intermediate "Ingress Reconciliation failed" event received during reconciliation #9727

@rhuss

Description

@rhuss

What version of Knative?

0.18

Expected Behavior

No intermediate Ready == False should be received during a service update reconciliation if the reconciliation eventually finishes up with Ready == True

Actual Behavior

Since 0.18.0 the client CI has a frequent flake (maybe 80% I would say) when updating a service after a series on operations on this service:

       ┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
        🦆 kn service update svc3a -a alpha=direwolf -a brave- --namespace kne2etests23
        ┃ Updating Service 'svc3a' in namespace 'kne2etests23':
        ┃ 
        ┃   0.059s The Configuration is still working to reflect the latest desired specification.
        ┃   0.480s Ingress reconciliation failed
        ┃ 
        🔥 Error: ReconcileIngressFailed: Ingress reconciliation failed
        🔥 Run 'kn --help' for usage
        🔥 

If we look at the service right after this error appeared it looks like:

      Conditions:
            Last Transition Time:        2020-10-07T14:50:48Z
            Status:                      Unknown
            Type:                        ConfigurationsReady
            Last Transition Time:        2020-10-07T14:50:48Z
            Status:                      Unknown
            Type:                        Ready
            Last Transition Time:        2020-10-07T14:50:48Z
            Status:                      True
            Type:                        RoutesReady

so it already came over the false status for Ready (but kn considered this intermediate state as an error).
You can see the status of the cluster at that time here

This issue with intermittent false ready states has been already discussed in #6784 but without a solution.

The current safeguard that we have implemented in the client (i.e. using an "error window" in which it has waited for another state change in case of an error) seems not to work here. We are investigating this in parallel on the client-side (on knative/client#1052

The question here though is: Why is this "ingress reconcile failed" event thrown at all and what has changed in serving that this happens now that often ?

Steps to Reproduce the Problem

See the steps in https://prow.knative.dev/view/gs/knative-prow/logs/ci-knative-client-auto-release/1313848987291226115#1:build-log.txt%3A3758 that lead to this error

Metadata

Metadata

Assignees

Labels

area/APIAPI objects and controllerskind/bugCategorizes issue or PR as related to a bug.triage/acceptedIssues which should be fixed (post-triage)

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions