Skip to content

fix(destination): skip servers with invalid selectors#14195

Merged
adleong merged 1 commit into
mainfrom
alex/selector-i-hardly-know
Jul 10, 2025
Merged

fix(destination): skip servers with invalid selectors#14195
adleong merged 1 commit into
mainfrom
alex/selector-i-hardly-know

Conversation

@adleong

@adleong adleong commented Jun 30, 2025

Copy link
Copy Markdown
Member

Fixes #14176

When a Server resource has an invalid podSelector, it triggers an error in the destination controller's SetToServerProtocol and aborts processing any further Servers and aborts setting addresses on destination controller responses, leading to failures across the mesh.

One invalid Server should not bring down the mesh. Instead, when encountering an invalid Server, we log an error message and then continue processing any other Servers and continue to set addresses as usual.

As a more comprehensive solution, we should also update the Server resource validation so that invalid Servers cannot be created and add a Status subresource to Server to reflect the current validity of the Server. These followups are tracked in #14194

Signed-off-by: Alex Leong <alex@buoyant.io>
@adleong adleong requested a review from a team as a code owner June 30, 2025 22:22
err = SetToServerProtocol(pp.k8sAPI, &address, pp.log)
if err != nil {
pp.log.Errorf("failed to set address OpaqueProtocol: %s", err)
continue

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What motivates removing this continue? Isn't it still unacceptable to proceed if an error is returned?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, the cases are:

	if address.Pod == nil {
		return fmt.Errorf("endpoint not backed by Pod: %s:%d", address.IP, address.Port)
	}

and

	servers, err := k8sAPI.Srv().Lister().Servers("").List(labels.Everything())
	if err != nil {
		return fmt.Errorf("failed to list Servers: %w", err)
	}

What are the implications of adding the address in these cases?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SetToServerProtocol function sets the opaque protocol field on the address based on any Server resources which select this pod.

The first case should be impossible since we specifically create a pod backed address above.

For the second case, what is the desirable degradation if we fail to fetch Server resources? Should we skip all addresses because we cannot determine the correct opaque transport value? Or proceed with the default opaque transport value?

My feeling is that we should proceed, especially since we are likely using forceOpaqueTransport where the opaque transport value is ignored anyway.

@adleong adleong merged commit f41fcac into main Jul 10, 2025
40 checks passed
@adleong adleong deleted the alex/selector-i-hardly-know branch July 10, 2025 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

invalid label in Server cause errors in Destination container

2 participants