add readIndex check in readyz #16792

Draft · wants to merge 1 commit into base: main

Conversation

@siyuanfoundation (Contributor) marked this pull request as draft on October 17, 2023, 23:02.
}
}

func (s *EtcdServer) GetReadIndex(ctx context.Context) (uint64, error) {
@chaochn47 (Member), Oct 18, 2023:

Can we reuse LinearizableReadNotify to achieve the same semantics as readIndex?
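For illustration only, reusing it in the check would look roughly like this (a sketch; it assumes ServerHealth exposes LinearizableReadNotify, which is not shown in this diff):

```go
// Sketch: readyz check built on LinearizableReadNotify instead of a new
// GetReadIndex method. Assumes ServerHealth exposes LinearizableReadNotify.
func readIndexCheck(srv ServerHealth) func(ctx context.Context) error {
	return func(ctx context.Context) error {
		// LinearizableReadNotify returns once a linearizable read would be
		// safe; the check only needs the error.
		return srv.LinearizableReadNotify(ctx)
	}
}
```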

@siyuanfoundation (Contributor, Author):

I think this is clearer, even though there is some duplicated code.

(Member):

@siyuanfoundation Is there any counterargument to the comment above? Could you please clarify?

@siyuanfoundation (Contributor, Author):

GetReadIndex and LinearizableReadNotify send messages to different channels and wait on different notifiers. The former needs an index returned from the notifier, while the latter only needs the error.
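Roughly, the two waiters need different shapes of notifier (a sketch; the field names here are illustrative, not the exact ones in this PR):

```go
// What a LinearizableReadNotify waiter needs: just a signal and an error.
type readNotifier struct {
	c   chan struct{}
	err error
}

// What a GetReadIndex waiter needs: the confirmed index must travel back too.
type readIndexNotifier struct {
	c         chan struct{}
	err       error
	readIndex uint64
}

// The GetReadIndex caller blocks on the channel and then reads both fields.
func waitReadIndex(ctx context.Context, n *readIndexNotifier) (uint64, error) {
	select {
	case <-n.c:
		return n.readIndex, n.err
	case <-ctx.Done():
		return 0, ctx.Err()
	}
}
```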

@chaochn47 (Member), Nov 1, 2023:

Oh, I misread; the new readIndexLoop won't wait for the applied index to catch up.

Right after the read index is received in linearizableReadLoop, readIndexNotifier would notify the ReadIndex call that listens on the channel. I think that simplifies the implementation, wdyt?

This way we can drop the new background goroutine readIndexLoop, avoid locking readMu twice, and avoid adding an index field to the notifier struct.

@serathius (Member):

Please merge the e2e tests before this.

@chaochn47 (Member), Oct 19, 2023:

@siyuanfoundation The DCO check failed. I understand the corrupt alarm e2e test was authored by @serathius, but you have to sign off this commit to pass the check:

git rebase HEAD~2 --signoff

@chaochn47 (Member):

> Please merge the e2e tests before this.

@serathius Did you mean separating the e2e tests into another PR for validating the existing livez / readyz check behaviors?

The ReadIndex check changes and the related test case could stay in the current PR, correct?

/cc @siyuanfoundation

@chaochn47 (Member):

@siyuanfoundation The existing livez / readyz e2e test cases have been added. Could you please rebase on top of main and resolve the comments?

Signed-off-by: Siyuan Zhang <sizhang@google.com>
{
	name:           "Alive ok",
	healthCheckURL: "/livez",
	readIndexError: fmt.Errorf("failed to get read index"),
(Member):

remove this error?

@siyuanfoundation (Contributor, Author):

This error is here to test that livez is ok even when the server cannot get the read index.
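For contrast, the complementary readyz case is the one expected to fail on the same injected error. Roughly like the sketch below, where the expectation field name is a guess rather than the actual field in this test table:

```go
// Sketch only: expectStatusCode is a guessed field name, not necessarily
// what the test table in this PR uses.
{
	name:             "Not ready when read index fails",
	healthCheckURL:   "/readyz",
	readIndexError:   fmt.Errorf("failed to get read index"),
	expectStatusCode: http.StatusServiceUnavailable,
},
```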


@siyuanfoundation (Contributor, Author):

@chaochn47
The problem with sharing the same loop is that a slow apply loop would block readIndex requests whenever there is a pending read, which makes it not much better than just using linearizableRead. So readIndex and linearizableRead have to be in separate loops.
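For context, this is roughly where a shared loop would get stuck (paraphrased from my reading of the upstream linearizableReadLoop, not the exact code):

```go
// Paraphrased tail of linearizableReadLoop (sketch, not verbatim):
confirmedIndex, err := s.requestCurrentIndex(leaderChangedNotifier, requestID)
if err != nil {
	nr.notify(err)
	continue
}
// While the loop sits in this wait for a pending linearizable read, it
// cannot start serving the next readIndex request, so a slow apply loop
// stalls the health check as well.
if appliedIndex := s.getAppliedIndex(); appliedIndex < confirmedIndex {
	select {
	case <-s.applyWait.Wait(confirmedIndex):
	case <-s.stopping:
		return
	}
}
nr.notify(nil) // readers are only released here
```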

}

nextnr := newNotifier()
s.readMu.Lock()
(Member):

How about adding a separate lock dedicated to the read index, so that adding this loop has no impact on linearizableReadLoop?

nc := s.readIndexNotifier
s.readMu.RUnlock()

// signal linearizable loop for current notify if it hasn't been already
(Member):

Suggested change:
- // signal linearizable loop for current notify if it hasn't been already
+ // signal readIndexLoop for current notify if it hasn't been already

Comment on lines +88 to +89
// pass some values in the notifier
uint64Val uint64
(Member):

Suggested change:
- // pass some values in the notifier
- uint64Val uint64
+ readIndex uint64

It's an unexported, package-local struct; I think naming the field explicitly is better than keeping it generic.

(Member):

notifier was meant to be generic. Not sure if we should add fields here.

@chaochn47 (Member):

> @chaochn47 The problem with sharing the same loop is a slow apply loop would block readIndex requests when there is a pending read, which makes it not much better than just using linearizableRead.

Makes sense.

> So readIndex and linearizableRead have to be in separate loops.

It may not be necessary if requestCurrentIndex can be used for the read index check, although the current server-side 7s timeout is too long for the check. That function would likely need to be refactored if we go down this route.

Comment on lines +246 to +247
`[+]serializable_read ok`,
`[+]data_corruption ok`,
(Member):

We don't need to validate other checks, just read_index.

@@ -799,8 +794,9 @@ func (s *EtcdServer) linearizableReadLoop() {
nr := s.readNotifier
s.readNotifier = nextnr
s.readMu.Unlock()

confirmedIndex, err := s.requestCurrentIndex(leaderChangedNotifier, requestId)
ctx, cancel := context.WithTimeout(context.Background(), s.Cfg.ReqTimeout())
(Member):

Why add timeout here?


func readIndexCheck(srv ServerHealth) func(ctx context.Context) error {
	return func(ctx context.Context) error {
		_, err := srv.GetReadIndex(ctx)
(Member):

Changing a critical loop like linearizableReadLoop seems like a bad idea for a backport. What's wrong with just calling requestCurrentIndex directly?

(Member):

We just need to have a public function:

func (s *EtcdServer) RequestCurrentIndex() (uint64, error) {
	requestId := s.reqIDGen.Next()
	leaderChangedNotifier := s.leaderChanged.Receive()
	return s.requestCurrentIndex(leaderChangedNotifier, requestId)
}

What do you think?

@siyuanfoundation (Contributor, Author):

I think it would be problematic to use requestCurrentIndex, because each health check would start a new instance of requestCurrentIndex. In that function, the index responses are read from the single readStateC channel, so if there are multiple requestCurrentIndex instances running, some calls may never see their response.

Even though the requestId is unique, requestCurrentIndex ignores responses whose id does not match. So if a response is delivered to a different requestCurrentIndex instance, the rightful owner would never see it.
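A standalone toy example of the failure mode (nothing here is etcd code; the channel and ids just mimic readStateC and requestId):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	readStateC := make(chan uint64, 1) // stand-in for the single readStateC channel
	var wg sync.WaitGroup

	// Each "requestCurrentIndex"-style reader only accepts its own id and
	// silently drops anything else it happens to receive.
	reader := func(myID uint64) {
		defer wg.Done()
		for {
			select {
			case id := <-readStateC:
				if id != myID {
					continue // wrong id: dropped, the rightful owner never sees it
				}
				fmt.Printf("reader %d got its response\n", myID)
				return
			case <-time.After(500 * time.Millisecond):
				fmt.Printf("reader %d timed out waiting for its response\n", myID)
				return
			}
		}
	}

	wg.Add(2)
	go reader(1)
	go reader(2)

	// Only a response for reader 1 is ever produced. If reader 2 happens to
	// receive it first, it is discarded and reader 1 times out.
	readStateC <- 1
	wg.Wait()
}
```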

(Member):

Oh, I see. requestCurrentIndex was designed to be called only from a single goroutine, like linearizableReadLoop. Nice catch!

Let me think about this some more. We need to figure out how to minimize the change to linearizableReadLoop for a safer backport.
cc @ahrtr for ideas.

@k8s-ci-robot:

PR needs rebase.


@k8s-ci-robot:

@siyuanfoundation: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-etcd-verify | e998d9a | link | true | /test pull-etcd-verify |
| pull-etcd-unit-test-amd64 | e998d9a | link | true | /test pull-etcd-unit-test-amd64 |
| pull-etcd-unit-test-arm64 | e998d9a | link | true | /test pull-etcd-unit-test-arm64 |

