fix(destination): skip servers with invalid selectors#14195
Conversation
Signed-off-by: Alex Leong <alex@buoyant.io>
| err = SetToServerProtocol(pp.k8sAPI, &address, pp.log) | ||
| if err != nil { | ||
| pp.log.Errorf("failed to set address OpaqueProtocol: %s", err) | ||
| continue |
There was a problem hiding this comment.
What motivates removing this continue? Isn't it still unacceptable to proceed if an error is returned?
There was a problem hiding this comment.
Specifically, the cases are:
if address.Pod == nil {
return fmt.Errorf("endpoint not backed by Pod: %s:%d", address.IP, address.Port)
}and
servers, err := k8sAPI.Srv().Lister().Servers("").List(labels.Everything())
if err != nil {
return fmt.Errorf("failed to list Servers: %w", err)
}
What are the implications of adding the address in these cases?
There was a problem hiding this comment.
The SetToServerProtocol function sets the opaque protocol field on the address based on any Server resources which select this pod.
The first case should be impossible since we specifically create a pod backed address above.
For the second case, what is the desirable degradation if we fail to fetch Server resources? Should we skip all addresses because we cannot determine the correct opaque transport value? Or proceed with the default opaque transport value?
My feeling is that we should proceed, especially since we are likely using forceOpaqueTransport where the opaque transport value is ignored anyway.
Fixes #14176
When a Server resource has an invalid podSelector, it triggers an error in the destination controller's
SetToServerProtocoland aborts processing any further Servers and aborts setting addresses on destination controller responses, leading to failures across the mesh.One invalid Server should not bring down the mesh. Instead, when encountering an invalid Server, we log an error message and then continue processing any other Servers and continue to set addresses as usual.
As a more comprehensive solution, we should also update the Server resource validation so that invalid Servers cannot be created and add a Status subresource to Server to reflect the current validity of the Server. These followups are tracked in #14194