-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
endpointsharding: Skip nil child pickers while updating ClientConn state #7584
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #7584 +/- ##
==========================================
- Coverage 81.91% 81.85% -0.07%
==========================================
Files 361 361
Lines 27825 27832 +7
==========================================
- Hits 22794 22782 -12
- Misses 3841 3853 +12
- Partials 1190 1197 +7
|
ServiceConfig: sc, | ||
}) | ||
|
||
cc, err := grpc.Dial(mr.Scheme()+":///", grpc.WithResolvers(mr), grpc.WithTransportCredentials(insecure.NewCredentials())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we use grpc.NewClient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to NewClient
.
defer cancel() | ||
client := testgrpc.NewTestServiceClient(cc) | ||
|
||
// Even though the child LB policy hasn't given an picker updates, it is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Even though the child LB policy hasn't given an picker updates, it is assumed that its in connecting state
We should verify that there is no state change i.e. balancer was created with connecting
state and balancer is still in connecting
state. May be check State.ConnectivityState
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't way to get the endpoint sharding balancer's state with the test harness. I've added a check for the channel state instead. The channel state should remain CONNECTING until the endpoint sharding balancer changes it.
Is this just covering up for a child LB policy that isn't following the rules of the API? Should the child policy be required to set its state? Note that we will want some children to start IDLE, specifically we'll probably want to create a wrapper LB policy that creates a single child in IDLE and only starts it when it gets an RPC or an |
What is the panic, exactly? There's no explanation in the PR description and no linked issue here. |
Attached logs to the PR description. The problem is that the child pickers in the child state are not initialized here: grpc-go/balancer/endpointsharding/endpointsharding.go Lines 118 to 122 in ca4865d
When pick is called on endpointsharding's picker, it tries to call grpc-go/balancer/endpointsharding/endpointsharding.go Lines 254 to 258 in ca4865d
|
Are LB policies required to produce a picker when |
But..there are no child pickers yet. It seems like the real problem is that the endpointsharding balancer shouldn't perform any
No, there is no such requirement today.
I don't think that's quite correct. Pickfirst should START in state CONNECTING. It should do so by producing a picker that queues RPCs and sets the state to CONNECTING.
This seems OK as long as a picker has been sent that says CONNECTING. You shouldn't suppress the first update based on the assumption that you started CONNECTING. |
I think this change is actually OK, but I'm concerned. I would be OK with the assumptions here if it's documented in the balancer API and if we audit all parent LB policies that we own to make sure they conform to this behavior. This is not currently documented, however, so it's questionable to propagate this assumption throughout the tree. Also I don't see anything in the channel that does this, so I think it's the gracefulswitch balancer that makes the channel behave this way. But I'm not exactly clear on where we would set the channel state to CONNECTING while we're still waiting for the initial name resolver update. Maybe we aren't, and that's a bug? Or maybe we're initializing more than I was expecting even while waiting for the initial update? |
I filed #7686 to make sure we have a test case for the behavior in the last paragraph I wrote above. |
368ae08
to
e6d5ac6
Compare
@dfawley I've changed the approach to ignore |
es.cc.UpdateState(balancer.State{ | ||
ConnectivityState: connectivity.Connecting, | ||
Picker: base.NewErrPicker(balancer.ErrNoSubConnAvailable), | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If no children have produced a picker, why are we doing so here?
I think the safest behavior here is to either do what you said before (and go through the exercise to make "initial state == connecting" an API guarantee), or to only have this endpointsharding LB policy produce pickers in sync with its children: i.e. whenever any child updates its picker, this policy updates its combined picker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't send a picker here, the unit test added in the PR indicates that the channel remains in IDLE mode.
Closing this PR, will open it once #7686 is fixed. |
The child pickers are initialized to
nil
when child LB policies are createdgrpc-go/balancer/endpointsharding/endpointsharding.go
Lines 118 to 122 in ca4865d
When pick is called on endpointsharding's picker, it tries to call
Pick()
on a nil picker here and panicsgrpc-go/balancer/endpointsharding/endpointsharding.go
Lines 254 to 258 in ca4865d
This change ignores nil child pickers during aggregation. If all pickers are nil, the channel state is set to CONNECTING with error
balancer.ErrNoSubConnAvailable
.Test logs demonstrating the panic: panic_logs.txt
RELEASE NOTES:
nil
child pickers while updating ClientConn state. This avoid panics when an RPC is made before the child policy generates a picker.