-
Notifications
You must be signed in to change notification settings - Fork 72
feat: TruncateFront & TruncateBack support truncating all entries #31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This looks good on the surface. |
I'll add some tests, might also adjust current test cases if needed. |
|
While I was writing tests, a question raised that needs your help to answer. With this change, the WAL may be in a state without any entries during use. As a result, the behavior of In the following two comments, these two functions are expected to return 0 when the log is empty. Lines 484 to 486 in 56126d5
Lines 502 to 504 in 56126d5
However, if the WAL is in a continuous read/write state, returning 0 might cause some confusion for users. Let's temporarily ignore the special handling of empty entries in these two functions and consider the following usage scenario: There are two Go routines—one continuously appends entries to the WAL, while the other continuously retrieves entries from the WAL. The code is as follows: func main() {
notify := make(chan struct{}, 1)
l, _ := wal.Open("wal", nil)
go func() {
for {
time.Sleep(1 * time.Second)
idx, _ := l.LastIndex()
idx++
l.Write(idx, []byte(strconv.FormatUint(idx, 10)+"hello world"))
select {
case notify <- struct{}{}:
default:
}
}
}()
go func() {
for {
select {
case <-notify:
}
for {
idx, _ := l.FirstIndex()
data, err := l.Read(idx)
if errors.Is(err, wal.ErrNotFound) {
break
}
fmt.Println(string(data))
l.TruncateFront(idx + 1)
}
}
}()
select {}
}Relatively speaking, this is a straightforward implementation of the producer-consumer pattern, where each Go routine only needs to focus on one end of the queue. In this case, Now, returning to the special handling of empty entries in these functions, the above code is actually just one step away from running perfectly: Since idx, _ := l.FirstIndex()
if idx == 0 {
idx = 1
}This isn't particularly complicated, but it may confuse users until they carefully read the source code of Currently, the implementation of these two functions only includes special handling for one specific case: when For Disregarding historical reasons, a feasible implementation would be to return an additional boolean value to indicate whether the WAL is empty. So the key question that needs to be clarified is:
|
|
Sorry for the slow response. I think the best route is to always retain a non-zero index after the truncate operation. For For The problem with this solution is that clearing the log leaves the first/last indexes in a strange overlapping state. I think this could be considerer a breaking change, at least for those users who expect that |
|
Glad to hear your reply! I will add the feature option and test case on Monday. And a small mistake here:
The Lines 806 to 808 in 56126d5
If provided index equals to FirstIndex, then it will keep the first entry instead of no entry.Anyway, it doesn't change the conclusion. |
|
Sounds good. Thanks! |
|
Added feature option I didn't use |
|
I think the name |
|
After further thought, I'm not actually sure the AllowEmpty option is needed. Initially I suggested it out of an abundance of caution but I really can't see a scenario where it'll affect the current users. I said
But in reality, the only users that would expect the first index to be zero are those who just created a brand new log. |
|
That makes sense, should I remove this option or set I've also investigated WAL usage in https://github.com/open-telemetry/opentelemetry-collector-contrib and https://github.com/influxdata/telegraf . Both of them can handle For https://github.com/open-telemetry/opentelemetry-collector-contrib, it uses its own index counter and prevents overlapping with a // In normal state, wIndex and rIndex will differ by one. To avoid having -1 as a final value, we set it to 0 as minimum.
lag := max(0, int64(prweWAL.wWALIndex.Load()-prweWAL.rWALIndex.Load()))
prweWAL.telemetry.recordWALLag(ctx, lag)For https://github.com/influxdata/telegraf, empty file is handled by itself, and expected to be simplified once WAL supports empty state(https://github.com/influxdata/telegraf/pull/17240/files#r2171642847): // This is very messy and not ideal, but serves as the only way I can find currently
// to actually treat the walfile as empty if needed, since Truncate() calls require
// that at least one entry remains in them otherwise they return an error.
// Related issue: https://github.com/tidwall/wal/issues/20
func (b *DiskBuffer) handleEmptyFile() {
if !b.isEmpty {
return
}
if err := b.file.TruncateFront(b.readIndex() + 1); err != nil {
log.Printf("E! readIndex: %d, buffer len: %d", b.readIndex(), b.length())
panic(err)
}
b.mask = b.mask[1:]
b.isEmpty = false
} |
|
Thanks for doing the additional research. I think we can remove the option altogether. |
|
Option removed, along with a couple of redundant if conditions. I think these changes complete the feature PR, please remind me if anything is missing. I suggest to make a release statement for next version tag, since it should still be considered as a breaking change.
|
|
I'm checking on my side, and I want to see if we can avoid all breaking changes before merging. |
|
Overall the PR works. But I was wrong about removing the AllowEmpty option. Without the option this change does break some existing projects that use this package. I tested with a few additional external projects that import this package and ran into some failed tests, including my own https://github.com/tidwall/raft-wal project. 🤦♂️ So I added the option back. I also added an IsEmpty function along with some new tests and refactoring. I'll merge this PR and then follow it up with my changes. |
This commit includes the AllowEmpty option which allows for deleting all entries from the log using the TruncateFront and TruncateBack methods. Using AllowEmpty changes the behavior of the log as it relates to FirstIndex and LastIndex in the following ways: - An empty log will now always have the FirstIndex be equal to LastIndex+1. Witout AllowEmpty, FirstIndex can never be greater than LastIndex. - For a newly created log that has no entries, FirstIndex() and LastIndex() return 1 and 0, respectively. Without AllowEmpty, both return 0. Also in this commit: - IsEmpty method, which returns true if the log has no entries - More unit tests - Various code cleanup See #31
|
Ok all pushed and a new release was created. 😅 |
|
Thank you for your great package! |
Currently, it seems like there is no way to truncate entire log, which causes the duplicate transaction issue (#20).
This PR provides the ability of truncating all entries, by extending the index range 1 step further than the last(TruncateFront) or 1 step before the first(TruncateBack) entry.
For example, if the log has 3 entries
[10, 11, 12]:TruncateFront(13)will truncate all entries, and leftfirstIndexas 13 andlastIndexas 12.TruncateBack(9)will truncate all entries, and leftfirstIndexas 10 andlastIndexas 9.Why this feature is needed
While I was working on a project that uses this package as WAL implementation, I've met the situation that described in #20. I can't simply mark all jobs as done if the program exits unexpectedly, and next time it starts, it will duplicate the last job. So I need a way to remove this last job from wal once it's done.
With this PR, I can call
TruncateFront(index + 1)when I've poppedindexth job from wal, and let the program wait for incomingindex + 1th job.What's the difference between #29 and this PR
Thanks to @DStrand1 for providing a solution. However, #29 can't fulfill my requirements, by providing a
Clearfunction.Complexity of usage
In my use case, I'm running a long-lived goroutine as a consumer(
Read), and a gin server as producers(Write).With
Clearfrom #29, after the consumer finishes all jobs, it will first detect whether wal is empty (by either callingRead(index + 1)withErrNotFoundreturned, or checking a seperate counter), and then callClearto remove all entries. The problem is that, these two steps are not atomic, so I have to add another lock to protect them which is complicated.With this PR, both
TruncateFrontandTruncateBackhold internal lock, so they are atomic. And caller don't need to care about how many jobs left in wal, just callReadandTruncateFrontin a loop untilReadreturnsErrNotFound, then wait for incoming jobs.Corruption-safty
Clearfunction in #29 isn't corruption-safe, like #29 (comment) pointed out.This PR modifies
TruncateFrontandTruncateBack, following the original corruption-safe mechanism.Index continuity
#29 clears all entries, also resets
firstIndexandlastIndexto 1 and 0. BothWriteandReadhave to reset its own index counter afterClear.This PR keeps
firstIndexandlastIndexcontinuous,WriteandReadcan keep using previous index counter.