-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure consistent file permissions on broken WAL files #18574
Ensure consistent file permissions on broken WAL files #18574
Conversation
Hi @lucasrod16. Thanks for your PR. I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files
... and 27 files with indirect coverage changes @@ Coverage Diff @@
## main #18574 +/- ##
==========================================
+ Coverage 68.78% 68.81% +0.02%
==========================================
Files 420 420
Lines 35474 35474
==========================================
+ Hits 24402 24411 +9
+ Misses 9652 9643 -9
Partials 1420 1420 Continue to review full report in Codecov by Sentry.
|
server/storage/wal/repair.go
Outdated
@@ -67,7 +67,7 @@ func Repair(lg *zap.Logger, dirpath string) bool { | |||
|
|||
case errors.Is(err, io.ErrUnexpectedEOF): | |||
brokenName := f.Name() + ".broken" | |||
bf, bferr := os.Create(brokenName) | |||
bf, bferr := os.OpenFile(brokenName, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, fileutil.PrivateFileMode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mention consistency in the PR name, can you maybe look into other places that create WAL file and propose how it can be reused and unified allowing for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are my observations:
In wal.Create
, fileutil.LockFile(p, os.O_WRONLY|os.O_CREATE, fileutil.PrivateFileMode)
is used to bootstrap a new WAL file:
etcd/server/storage/wal/wal.go
Line 129 in 5088131
f, err := fileutil.LockFile(p, os.O_WRONLY|os.O_CREATE, fileutil.PrivateFileMode) |
and in alloc()
, fileutil.LockFile(fpath, os.O_CREATE|os.O_WRONLY, fileutil.PrivateFileMode)
is also used to create/cut a new WAL file:
etcd/server/storage/wal/file_pipeline.go
Line 78 in 5088131
if f, err = fileutil.LockFile(fpath, os.O_CREATE|os.O_WRONLY, fileutil.PrivateFileMode); err != nil { |
It seems to make sense to use the same method of creating the backup WAL file in Repair
as well.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could encapsulate the call to fileutil.LockFile(brokenName, os.O_WRONLY|os.O_CREATE, fileutil.PrivateFileMode)
in a createNewWALFile
function to promote reuse and maintain consistency. Of course, whether to add this level of abstraction is up to you and your preference for abstraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think it's the same, the LockFile function not only creates file but takes a OS lock on it. Maybe what I proposed doesn't make sense here as we need a function that would would need different options for
- Creating a normal file or a locked file (fileutil.LockFile vs os.OpenFile)
- Create a new file or truncating existing file. (os.O_WRONLY|os.O_CREATE vs os.O_CREATE|os.O_WRONLY|os.O_TRUNC)
It might be useful to have a single function to create WAL files to show all the options, however I'm not sure that it would be beneficial. cc @ahrtr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be useful to have a single function to create WAL files to show all the options, however I'm not sure that it would be beneficial. cc @ahrtr
I am open to do such enhancement, so as to make it less error prone. But I suggest to only fix it per #18574 (comment) in this PR, to keep it as simple as possible, and we can do the enhancement in a separate PR, which definitely needs more careful review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I'll follow up with a separate PR for a WAL creation function once this is merged
77f12f5
to
c7b536c
Compare
/retest-required |
Hi, @lucasrod16. Thanks for your pull request. Could you please rebase it? The govulncheck job is failing because of a recent update to Go in the main branch. Rebasing should make the CI green. Thanks! |
47e62ac
to
2e9998a
Compare
Hi @ivanvc, thanks for the heads up! I've rebased the branch, and it should be good now. |
2e9998a
to
0c3b10b
Compare
First time I've seen the e2e-386 job fail on this PR. I opened an issue for the test flake: #18585 |
/retest |
2 similar comments
/retest |
/retest |
server/storage/wal/repair.go
Outdated
@@ -67,7 +67,7 @@ func Repair(lg *zap.Logger, dirpath string) bool { | |||
|
|||
case errors.Is(err, io.ErrUnexpectedEOF): | |||
brokenName := f.Name() + ".broken" | |||
bf, bferr := os.Create(brokenName) | |||
bf, bferr := os.OpenFile(brokenName, os.O_WRONLY|os.O_CREATE, fileutil.PrivateFileMode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This removed O_TRUNC
from the file open flags, is there any chance for file collision? cc @ahrtr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, let's add os.O_TRUNC
.
We backup the whole wal file to a .broken file, so we should truncate the existing file if present (unlikely in practice).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added os.O_TRUNC
back
Signed-off-by: Lucas Rodriguez <lucas.rodriguez9616@gmail.com>
0c3b10b
to
d12bb7e
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahrtr, lucasrod16, serathius The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Fixes #18571