Skip to content

CCM LoadBalancer flake in 5,000 node job #753

@BenTheElder

Description

@BenTheElder

In a 5k node CI job we have a test failure that seems to be related to loadbalancer controller in CCM failing to handle an unexpected GCP api error (Thanks @danwinship for digging into this here: https://kubernetes.slack.com/archives/CN0K3TE2C/p1723560482393589?thread_ts=1723493683.959229&cid=CN0K3TE2C)

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-correctness/1823042209046335488

->
https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-correctness/1823042209046335488/artifacts/master-and-node-logs.link.txt
->
https://gcsweb.k8s.io/gcs/k8s-infra-scalability-tests-logs/ci-kubernetes-e2e-gce-scale-correctness/1823042209046335488/
->
https://storage.googleapis.com/k8s-infra-scalability-tests-logs/ci-kubernetes-e2e-gce-scale-correctness/1823042209046335488/gce-scale-cluster-master/cloud-controller-manager.log

Per @danwinship :

The CCM log shows a 502 error from a cloud API at 17:42:42.656992, and then shows
E0812 18:42:37.300236 11 gce_loadbalancer.go:206] Failed to EnsureLoadBalancer(gce-scale-cluster, loadbalancers-9672, affinity-lb-esipp-transition, a1b2bc4622d1041aeabe57d2c40cd9bd, us-east1), err: failed to create forwarding rule for load balancer (a1b2bc4622d1041aeabe57d2c40cd9bd(loadbalancers-9672/affinity-lb-esipp-transition)): context deadline exceeded
an hour later (not clear if that's triggered by the e2e test doing cleanup or a separate identical timeout)
So this looks like cloud-provider-gcp failing to handle an unexpected google cloud api error

/sig scalability
/sig cloud-provider

Metadata

Metadata

Assignees

No one assigned

    Labels

    lifecycle/rottenDenotes an issue or PR that has aged beyond stale and will be auto-closed.needs-triageIndicates an issue or PR lacks a `triage/foo` label and requires one.sig/cloud-providerCategorizes an issue or PR as relevant to SIG Cloud Provider.sig/scalabilityCategorizes an issue or PR as relevant to SIG Scalability.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions