xdsclient: reduce the number of ADS requests #7473

easwars · 2024-08-01T17:52:14Z

The xDS client watch API looks like this:

grpc-go/xds/internal/xdsclient/client.go

Line 47 in 6fa393c

    
           WatchResource(rType xdsresource.Type, resourceName string, watcher xdsresource.ResourceWatcher) (cancel func())

This means that if a user of the API wants to register watches on N resources, it has to invoke the API N times. This is a common scenario in many use-cases (examples include xDS resolver, CDS LB policy, xDS enabled gRPC server etc).

The watch API implementation inside the xDS client delegates the watch to the appropriate authority here:

grpc-go/xds/internal/xdsclient/authority.go

Line 458 in 6fa393c

    
           func (a *authority) watchResource(rType xdsresource.Type, resourceName string, watcher xdsresource.ResourceWatcher) func() {

This results in the authority asking the underlying transport to send the request out:

grpc-go/xds/internal/xdsclient/authority.go

Line 593 in 6fa393c

a.transport.SendRequest(rType.TypeURL(), resourcesToRequest)

The implement in the Transport queues this request here:

grpc-go/xds/internal/xdsclient/transport/transport.go

Line 258 in 6fa393c

func (t *Transport) SendRequest(url string, resources []string) {

This is eventually processed by the send goroutine and the request is sent out on the wire:

grpc-go/xds/internal/xdsclient/transport/transport.go

Line 362 in 6fa393c

func (t *Transport) send(ctx context.Context) {

So, if we have a case where an entity watching an LDS resource receives a response that contains N RDS resources, it will end up making N watch API calls, one for each resource, and this will eventually result in N requests on the ADS stream. Each of these requests will contain one more resource compared to the previous one. The situation can be worsened in the case where the LDS resource pointed to K RDS resources and was changed to point to N new RDS resources. So, this would end up in K+N requests on the ADS stream.

While this behavior is suboptimal, it needs to be noted that:

there is no correctness issue here
this is still not violating the xDS transport protocol

There are a few things we can try to make this better though:

Support a transaction like semantics in the xDS client API. With this approach, a user would do the following:
- Start transaction with a new API call.
- Register a bunch of watches, and/or unregister a bunch of watches (with the existing APIs).
- Stop the transaction with a new API call.
- The xDS client will wait for the transaction to complete before sending the resources on the wire.
- Cons of this approach:
  - This puts the onus on the caller to not perform any expensive operations including I/O during the time the transaction is open. It also puts the onus on the caller to close the transaction in time. Optionally, the xDS client could treat the transaction as advisory and if it is not closed within a pre-defined amount of time, could decide to send outstanding resource requests on the wire.
  - Increases the API surface of the xDS client.
  - The xDS client is shared between multiple entities within a single gRPC channel. It is also shared across multiple gRPC servers. One misbehaving entity can affect others in this approach.
Avoid any API changes, and limit the changes to the authority and transport packages. With this approach:
- Instead of the authority sending the list of resources to the transport to send out, it will simply send a message to the transport saying something needs to be sent out.
  - The authority already maintains a list of resources requested by watchers
  - The transport already maintains a list of resources sent out on the ADS stream, i.e requested from the server
- So, when the transport sees the message from the authority, it will query the authority to get the most recent list of resources to send out. If the most recent list matches what the transport has already sent out, the operation will be a no-op.

The text was updated successfully, but these errors were encountered:

dfawley · 2024-09-10T16:52:05Z

This should get fixed by @easwars's upcoming refactoring.

easwars · 2024-11-06T19:53:25Z

Fixed by #7701.

easwars added the Type: Feature New features or improvements in behavior label Aug 1, 2024

dfawley assigned easwars Aug 6, 2024

dfawley added the P1 label Aug 6, 2024

easwars removed their assignment Aug 30, 2024

arjan-bal added the Area: xDS Includes everything xDS related, including LB policies used with xDS. label Sep 4, 2024

dfawley assigned easwars Sep 10, 2024

dfawley removed the P1 label Sep 10, 2024

easwars closed this as completed Nov 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

xdsclient: reduce the number of ADS requests #7473

xdsclient: reduce the number of ADS requests #7473

easwars commented Aug 1, 2024

dfawley commented Sep 10, 2024

easwars commented Nov 6, 2024

xdsclient: reduce the number of ADS requests #7473

xdsclient: reduce the number of ADS requests #7473

Comments

easwars commented Aug 1, 2024

dfawley commented Sep 10, 2024

easwars commented Nov 6, 2024