Description
In a 5k-node CI job, we have a test failure that seems to be related to the load balancer controller in CCM failing to handle an unexpected GCP API error. (Thanks @danwinship for digging into this here: https://kubernetes.slack.com/archives/CN0K3TE2C/p1723560482393589?thread_ts=1723493683.959229&cid=CN0K3TE2C)
->
https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-correctness/1823042209046335488/artifacts/master-and-node-logs.link.txt
->
https://gcsweb.k8s.io/gcs/k8s-infra-scalability-tests-logs/ci-kubernetes-e2e-gce-scale-correctness/1823042209046335488/
->
https://storage.googleapis.com/k8s-infra-scalability-tests-logs/ci-kubernetes-e2e-gce-scale-correctness/1823042209046335488/gce-scale-cluster-master/cloud-controller-manager.log
Per @danwinship:
The CCM log shows a 502 error from a cloud API at 17:42:42.656992, and then shows
E0812 18:42:37.300236 11 gce_loadbalancer.go:206] Failed to EnsureLoadBalancer(gce-scale-cluster, loadbalancers-9672, affinity-lb-esipp-transition, a1b2bc4622d1041aeabe57d2c40cd9bd, us-east1), err: failed to create forwarding rule for load balancer (a1b2bc4622d1041aeabe57d2c40cd9bd(loadbalancers-9672/affinity-lb-esipp-transition)): context deadline exceeded
an hour later (not clear if that's triggered by the e2e test doing cleanup or a separate identical timeout)
So this looks like cloud-provider-gcp failing to handle an unexpected Google Cloud API error.
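For illustration only, here is a minimal Go sketch of one common way to handle a transient API failure like the 502 above: classify the error, retry with a short exponential backoff, and fail fast on anything else. This is not the cloud-provider-gcp code, and the logs do not establish that a missing retry is the actual root cause. The names `apiError`, `isTransient`, and `retryTransient` are hypothetical and exist only for this example.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// apiError is a hypothetical stand-in for an error returned by a cloud API
// client. It carries only the HTTP status code.
type apiError struct {
	Code int
}

func (e *apiError) Error() string {
	return fmt.Sprintf("cloud API error: HTTP %d", e.Code)
}

// isTransient reports whether err wraps an apiError whose status code
// usually indicates a temporary server-side problem (502, 503, 504).
func isTransient(err error) bool {
	var apiErr *apiError
	if !errors.As(err, &apiErr) {
		return false
	}
	switch apiErr.Code {
	case http.StatusBadGateway, http.StatusServiceUnavailable, http.StatusGatewayTimeout:
		return true
	}
	return false
}

// retryTransient calls op up to maxAttempts times, doubling the delay after
// each transient failure. Success and non-transient errors are returned
// immediately, so the caller is never left waiting on a hopeless operation.
func retryTransient(maxAttempts int, delay time.Duration, op func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil || !isTransient(err) {
			return err
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	// Wrap with %w so callers can still inspect the last underlying error.
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}
```

A caller would wrap the API operation in a closure, for example `retryTransient(5, time.Second, func() error { ... })`, and surface the returned error promptly instead of waiting for a long context deadline.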
/sig scalability
/sig cloud-provider