Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds: v1.66.0 regression in xds.BootstrapContentsForTesting #7592

Open
howardjohn opened this issue Sep 5, 2024 · 13 comments
Open

xds: v1.66.0 regression in xds.BootstrapContentsForTesting #7592

howardjohn opened this issue Sep 5, 2024 · 13 comments
Assignees
Labels
Area: xDS Includes everything xDS related, including LB policies used with xDS. Type: Bug

Comments

@howardjohn
Copy link
Contributor

howardjohn commented Sep 5, 2024

What version of gRPC are you using?

v1.66.0

What version of Go are you using (go version)?

1.23.0

What operating system (Linux, Windows, …) and version?

Linux

What did you do?

We have tests in Istio setting a unique bootstrap for multiple gRPC servers: https://github.com/istio/istio/blob/2d853376723eb78b6b16b195b05cdc3b542efeaa/pkg/test/echo/server/endpoint/grpc.go#L81.

After upgrading to 1.66, we found these are ignored, and the first bootstrap is used for each server.

What did you expect to see?

Same behavior as 1.65; each server uses its own bootstrap

What did you see instead?

@arvindbr8 arvindbr8 self-assigned this Sep 5, 2024
@easwars
Copy link
Contributor

easwars commented Sep 5, 2024

Does your test spin up multiple servers as part of a single test?

We recently introduced some changes wherein we added support for multiple xds clients per binary (instead of the global singleton that we used to have earlier). The way this works is that we keep a map of xds clients indexed by a name. The value of the name is the grpc channel's dial target when the xds client is created from a grpc client. The value of the name is a fixed string for xds clients created from grpc servers. So, grpc channels with different target URIs will get different xds clients, but all grpc servers will get the same xds client. These changes were made as part of https://github.com/grpc/proposal/blob/master/A71-xds-fallback.md.

As part of the above changes, we also consolidated some code used to create xds clients for testing. Looks like this is what broke your tests.

@howardjohn
Copy link
Contributor Author

Yes, we spin up multiple servers in one test

@arvindbr8 arvindbr8 assigned easwars and unassigned arvindbr8 Sep 5, 2024
@easwars
Copy link
Contributor

easwars commented Sep 5, 2024

Let me discuss with other language implementors to see how they support this post A71.

@easwars
Copy link
Contributor

easwars commented Sep 5, 2024

@howardjohn : While I think about other ways to support this, why would you want to spin up multiple servers within the same test that all require different bootstrap configurations? FWIW: this will not be possible in production, since all servers will end up with the same bootstrap configuration.

@howardjohn
Copy link
Contributor Author

We want to test client --> {set of servers}. https://github.com/istio/istio/blob/86057475a6e34dcdd2de9dfe8ccd8f8b483c7edc/pilot/pkg/networking/grpcgen/grpcecho_test.go#L166

Also its not the same test necessarily, its the same binary. Multiple go tests in the same package are in one binary. So we have a test for a server with mTLS, and then later another without for instance. 1 server per test but same issue. But we do have a test with multiple servers in a single test as noted above

@easwars
Copy link
Contributor

easwars commented Sep 5, 2024

But we do have a test with multiple servers in a single test as noted above

Why would these servers require different boostrap configurations though?

@howardjohn
Copy link
Contributor Author

Istio identifies distinct workloads by the node Id which is a part of the bootstrap. Each one gets different config based on the node id

@easwars
Copy link
Contributor

easwars commented Sep 12, 2024

This is going to take a little bit of time to fix, and the fix that we have in mind is big enough to not be appropriate for a patch release.

Here is the proposal:

  • Create a new type xdsclient.Pool that represents a pool of xds clients:
    • This will internally contain a map of xds clients keyed by a name value, as described in A71.
    • The xdsclient.Pool will be initialized with a bootstrap configuration that will be used by all xds clients in the pool.
  • Have a global variable named xdsclient.DefaultPool that is set to a new pool at init time
    • This will read the bootstrap env vars to get its bootstrap configuration
  • The existing dial option and server option (to specify bootstrap configuration to be used by the xds client) will be changed to create a new xdsclient.Pool
    • the resolver will check if the dial option is set, and if so, it will use that pool instead of the default pool
    • the server will check if the server option is set, and if so, it will use that pool instead of the default pool

The above approach will ensure that when tests create multiple servers or channels with different bootstrap configs, they each will get their own pools.

The xdsclient.Pool will provide the following methods:

  • NewClient(name)
  • GetClientForTesting(name)
    • This is required by some tests which need to get a ref to the xds client currently in use, to trigger a resource not found error
  • SetFallbackBootstrapConfig(bootstrapContents)
    • This is required by the c2p resolver to specify a bootstrap configuration to use a fallback when the bootstrap env vars are not specified.

We will also have a way to create the pool

  • xdsclient.NewPool(bootstrapContents)

We could also consolidate NewClient(name) and GetClientForTesting(name) with a single method GetOrCreateClient(name).

@easwars
Copy link
Contributor

easwars commented Sep 12, 2024

Contributions are welcome.

@easwars
Copy link
Contributor

easwars commented Sep 12, 2024

Also, the branch for v1.67.0 release has already been cut. So, we should try to get this in by v1.68.0 release.

@howardjohn
Copy link
Contributor Author

Can the change be reverted or fixed in a patch release?

Because of how go module resolution works, we cannot depend on a module that depends on grpc 1.66+ and stay on 1.65. This means we need to chose between picking up critical dependency updates and supporting proxyless gRPC in Istio. Between those two, we would unfortunately be likely pressured to drop support for proxyless gRPC given it is an experimental feature.

@easwars easwars added the Area: xDS Includes everything xDS related, including LB policies used with xDS. label Oct 2, 2024
@easwars
Copy link
Contributor

easwars commented Oct 8, 2024

We are really sorry for breaking you, but we are in the midst of a major refactor in the xDS client codebase to support xDS fallback. I have those changes ready and they are in the process of being reviewed. The changes I outlined here would be too messy to start before the above mentioned refactor is complete.

I'd like to see if there are other options to unblock you in the meantime.

  • Is skipping the failing test for a while an option at all?
  • Is changing the test an option, so that it doesn't spin up multiple servers?
  • I will also check if there is a way for us to provide a short-term workaround.

@howardjohn
Copy link
Contributor Author

Currently we have disabled testing of proxyless gRPC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: xDS Includes everything xDS related, including LB policies used with xDS. Type: Bug
Projects
None yet
Development

No branches or pull requests

4 participants