Enhance SENTINEL FAILOVER to use the FAILOVER command to avoid data loss #1238
base: unstable
Conversation
Signed-off-by: Binbin <binloveplay1314@qq.com>
This is a PR that was written very quickly (I wrote it at night and wanted to push it as soon as possible), so there may be problems with the details, and it can likely be optimized. Also see redis/redis#6062 for more details on the data loss. The test case also reproduces the data loss locally.
Yes, this looks good.
I'm not familiar enough with Sentinel. @hwware, you are a Sentinel expert. Do you want to take a look?
Codecov Report
Attention: Patch coverage is …

@@            Coverage Diff             @@
##           unstable    #1238      +/-   ##
============================================
- Coverage     70.72%    70.64%    -0.08%
============================================
  Files           114       114
  Lines         63150     63195       +45
============================================
- Hits          44660     44642       -18
- Misses        18490     18553       +63
Overall LGTM, just one concern about the test case.
Some time ago, I proposed using FAILOVER in SENTINEL FAILOVER in Redis, see redis/redis#13118. I implemented this proposal and also ported my first implementation to Valkey some time ago, see gmbnomis@146850a. The main issues I learned about when doing that were the following (I think that this PR does not address these):
For reference: my proposal addresses 1. It addresses 2 somewhat (a solution involving a leader election would probably be safer here) and does not address 3 (I remember that I started working on this but did not finish it. The idea was to monitor the INFO field "master_failover_state" and send a FAILOVER ABORT if a node stays in that state for too long.)
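To make that last idea concrete, here is a minimal sketch of such a watchdog, assuming a plain synchronous hiredis client rather than the real sentinel code path; the function name, polling interval, and deadline handling are illustrative only:

```c
/* Sketch only (not sentinel.c code): poll INFO replication on a node and
 * send FAILOVER ABORT if a failover stays pending longer than 'deadline'
 * seconds. master_failover_state is "no-failover" when nothing is pending. */
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <hiredis/hiredis.h>

static void failover_watchdog(redisContext *c, time_t deadline) {
    time_t pending_since = 0;
    for (;;) {
        redisReply *r = redisCommand(c, "INFO replication");
        int pending = r && r->type == REDIS_REPLY_STRING &&
                      strstr(r->str, "master_failover_state:no-failover") == NULL;
        if (r) freeReplyObject(r);

        if (!pending) {
            pending_since = 0;  /* nothing pending, reset the timer */
        } else {
            if (pending_since == 0) pending_since = time(NULL);
            if (time(NULL) - pending_since > deadline) {
                /* Stuck in waiting-for-sync / failover-in-progress: abort. */
                redisReply *a = redisCommand(c, "FAILOVER ABORT");
                if (a) freeReplyObject(a);
                return;
            }
        }
        sleep(1);  /* arbitrary polling interval */
    }
}
```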
Thanks for the detailed comments.
Yes, we should. I remember this but was never able to find the documentation on it. Thanks for the link.
I did not read the whole story; I will find some time to catch up. Yes, it seems you are right, all my thinking was about the happy path. Maybe we should use the FORCE option?
I support this; a FORCE option is needed. In that case, the worst case is the same as before.
@gmbnomis I had not read your proposal and implementation before; I will read them later. Do you want to pick it up? You have done a lot of work here, and it would be great if you could pick it up, or we can slowly push it forward in the future and make all of this happen.
Using FAILOVER with the FORCE option would be more similar to the current SENTINEL FAILOVER in that we don't care about the replication state and just fail over at some point in time.
The FORCE option of FAILOVER does not apply to the FAILOVER_IN_PROGRESS state, so we might get stuck as well and need some mechanism to get out of it. (My current thinking is that this does not need to be very efficient because the probability of this happening seems rather low. But the system should be able to recover from such a failure)
@enjoy-binbin I am not completely sure, but do you mean to pick up my old proposal and restart working on it? As there seems to be interest in this feature I'd be happy to do that. (If so, how should I start? Create a new issue with an update of redis/redis#13118 to flesh out the implementation options further?) I am also happy to support getting another implementation into shape.
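As a reference for the FORCE discussion above, this is roughly how a forced failover could be issued from a client (a sketch using hiredis; host, port, and the 5000 ms timeout are example values, and as far as I remember FORCE is only accepted together with an explicit TIMEOUT):

```c
/* Sketch: issue a forced failover against an assumed local primary.
 * FORCE skips the wait for the replica to be in sync, which is roughly
 * the "don't care about replication state" behavior discussed above. */
#include <stdio.h>
#include <hiredis/hiredis.h>

int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (!c || c->err) return 1;

    redisReply *r = redisCommand(c, "FAILOVER TO %s %d FORCE TIMEOUT %d",
                                 "127.0.0.1", 6380, 5000);
    if (r && r->type == REDIS_REPLY_ERROR)
        printf("forced failover refused: %s\n", r->str);
    if (r) freeReplyObject(r);
    redisFree(c);
    return 0;
}
```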
Currently, SENTINEL FAILOVER selects a replica and sends it REPLICAOF NO ONE,
then waits for the promotion, and then sends REPLICAOF new_ip new_port to the
old primary and the other replicas to complete the failover. The problem here
is that any writes accepted by the old primary before its role change during
the failover are lost.
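For illustration, the current flow corresponds roughly to the following client-side sequence (a sketch using synchronous hiredis, not the real sentinel.c code path, which uses the async API; connections and addresses are placeholders):

```c
/* Sketch of today's SENTINEL FAILOVER sequence (not the real sentinel.c code).
 * Error handling and the reconfiguration of the other replicas are omitted. */
#include <hiredis/hiredis.h>

static void legacy_failover(redisContext *chosen_replica,
                            redisContext *old_primary,
                            const char *new_ip, int new_port) {
    redisReply *r = redisCommand(chosen_replica, "REPLICAOF NO ONE");
    if (r) freeReplyObject(r);

    /* ... sentinel polls INFO until the chosen replica reports role:master ... */

    /* Data-loss window: up to this point the old primary still accepts
     * writes, and they are discarded once it resyncs from the new primary. */
    r = redisCommand(old_primary, "REPLICAOF %s %d", new_ip, new_port);
    if (r) freeReplyObject(r);
}
```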
We can use the FAILOVER command to avoid this data loss. FAILOVER was added
in 0d18a1e; it coordinates the failover between the primary and the replica.
Before the original REPLICAOF NO ONE step, we try to send FAILOVER TO ip port
to the primary so that it goes through the FAILOVER process. If the primary
does not support FAILOVER, for example if it returns an error, we fall back
to the old logic, that is, to REPLICAOF NO ONE.
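A minimal sketch of that fallback, again using synchronous hiredis for clarity (function and parameter names are illustrative, not the actual sentinel.c implementation):

```c
/* Sketch of the proposed logic: try a coordinated FAILOVER first and fall
 * back to the legacy REPLICAOF NO ONE promotion if the primary refuses it.
 * Returns 1 if the coordinated failover was accepted, 0 on fallback. */
#include <hiredis/hiredis.h>

static int start_failover(redisContext *old_primary, redisContext *chosen_replica,
                          const char *replica_ip, int replica_port) {
    redisReply *r = redisCommand(old_primary, "FAILOVER TO %s %d",
                                 replica_ip, replica_port);
    int coordinated = (r != NULL && r->type != REDIS_REPLY_ERROR);
    if (r) freeReplyObject(r);
    if (coordinated) return 1;

    /* Old primary does not support FAILOVER (or rejected it): fall back to
     * the legacy behavior and promote the chosen replica directly. */
    r = redisCommand(chosen_replica, "REPLICAOF NO ONE");
    if (r) freeReplyObject(r);
    return 0;
}
```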