Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure consistent file permissions on broken WAL files #18574

Conversation

lucasrod16
Copy link
Contributor

Fixes #18571

@k8s-ci-robot
Copy link

Hi @lucasrod16. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jmhbnz
Copy link
Member

jmhbnz commented Sep 11, 2024

/ok-to-test

@codecov-commenter
Copy link

codecov-commenter commented Sep 11, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.81%. Comparing base (7f399ee) to head (643e870).

Current head 643e870 differs from pull request most recent head d12bb7e

Please upload reports for the commit d12bb7e to get more accurate results.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
Files with missing lines Coverage Δ
server/storage/wal/repair.go 56.14% <100.00%> (ø)

... and 27 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #18574      +/-   ##
==========================================
+ Coverage   68.78%   68.81%   +0.02%     
==========================================
  Files         420      420              
  Lines       35474    35474              
==========================================
+ Hits        24402    24411       +9     
+ Misses       9652     9643       -9     
  Partials     1420     1420              

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f399ee...d12bb7e. Read the comment docs.

@@ -67,7 +67,7 @@ func Repair(lg *zap.Logger, dirpath string) bool {

case errors.Is(err, io.ErrUnexpectedEOF):
brokenName := f.Name() + ".broken"
bf, bferr := os.Create(brokenName)
bf, bferr := os.OpenFile(brokenName, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, fileutil.PrivateFileMode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mention consistency in the PR name, can you maybe look into other places that create WAL file and propose how it can be reused and unified allowing for consistency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are my observations:

In wal.Create, fileutil.LockFile(p, os.O_WRONLY|os.O_CREATE, fileutil.PrivateFileMode) is used to bootstrap a new WAL file:

f, err := fileutil.LockFile(p, os.O_WRONLY|os.O_CREATE, fileutil.PrivateFileMode)

and in alloc(), fileutil.LockFile(fpath, os.O_CREATE|os.O_WRONLY, fileutil.PrivateFileMode) is also used to create/cut a new WAL file:

if f, err = fileutil.LockFile(fpath, os.O_CREATE|os.O_WRONLY, fileutil.PrivateFileMode); err != nil {

It seems to make sense to use the same method of creating the backup WAL file in Repair as well.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could encapsulate the call to fileutil.LockFile(brokenName, os.O_WRONLY|os.O_CREATE, fileutil.PrivateFileMode) in a createNewWALFile function to promote reuse and maintain consistency. Of course, whether to add this level of abstraction is up to you and your preference for abstraction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think it's the same, the LockFile function not only creates file but takes a OS lock on it. Maybe what I proposed doesn't make sense here as we need a function that would would need different options for

  • Creating a normal file or a locked file (fileutil.LockFile vs os.OpenFile)
  • Create a new file or truncating existing file. (os.O_WRONLY|os.O_CREATE vs os.O_CREATE|os.O_WRONLY|os.O_TRUNC)

It might be useful to have a single function to create WAL files to show all the options, however I'm not sure that it would be beneficial. cc @ahrtr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be useful to have a single function to create WAL files to show all the options, however I'm not sure that it would be beneficial. cc @ahrtr

I am open to do such enhancement, so as to make it less error prone. But I suggest to only fix it per #18574 (comment) in this PR, to keep it as simple as possible, and we can do the enhancement in a separate PR, which definitely needs more careful review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I'll follow up with a separate PR for a WAL creation function once this is merged

@lucasrod16 lucasrod16 closed this Sep 11, 2024
@lucasrod16 lucasrod16 force-pushed the 18571-ensure-consistent-permissions-for-broken-WAL-files branch from 77f12f5 to c7b536c Compare September 11, 2024 17:10
@lucasrod16 lucasrod16 reopened this Sep 11, 2024
@lucasrod16
Copy link
Contributor Author

/retest-required

@ivanvc
Copy link
Member

ivanvc commented Sep 12, 2024

Hi, @lucasrod16. Thanks for your pull request. Could you please rebase it? The govulncheck job is failing because of a recent update to Go in the main branch. Rebasing should make the CI green. Thanks!

@lucasrod16 lucasrod16 force-pushed the 18571-ensure-consistent-permissions-for-broken-WAL-files branch from 47e62ac to 2e9998a Compare September 12, 2024 22:56
@lucasrod16
Copy link
Contributor Author

Hi, @lucasrod16. Thanks for your pull request. Could you please rebase it? The govulncheck job is failing because of a recent update to Go in the main branch. Rebasing should make the CI green. Thanks!

Hi @ivanvc, thanks for the heads up! I've rebased the branch, and it should be good now.

server/storage/wal/repair.go Outdated Show resolved Hide resolved
@lucasrod16 lucasrod16 force-pushed the 18571-ensure-consistent-permissions-for-broken-WAL-files branch from 2e9998a to 0c3b10b Compare September 13, 2024 18:49
@lucasrod16
Copy link
Contributor Author

First time I've seen the e2e-386 job fail on this PR. I opened an issue for the test flake: #18585

@lucasrod16
Copy link
Contributor Author

/retest

2 similar comments
@lucasrod16
Copy link
Contributor Author

/retest

@lucasrod16
Copy link
Contributor Author

/retest

@@ -67,7 +67,7 @@ func Repair(lg *zap.Logger, dirpath string) bool {

case errors.Is(err, io.ErrUnexpectedEOF):
brokenName := f.Name() + ".broken"
bf, bferr := os.Create(brokenName)
bf, bferr := os.OpenFile(brokenName, os.O_WRONLY|os.O_CREATE, fileutil.PrivateFileMode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removed O_TRUNC from the file open flags, is there any chance for file collision? cc @ahrtr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, let's add os.O_TRUNC.

We backup the whole wal file to a .broken file, so we should truncate the existing file if present (unlikely in practice).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added os.O_TRUNC back

Signed-off-by: Lucas Rodriguez <lucas.rodriguez9616@gmail.com>
@lucasrod16 lucasrod16 force-pushed the 18571-ensure-consistent-permissions-for-broken-WAL-files branch from 0c3b10b to d12bb7e Compare September 16, 2024 19:07
@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, lucasrod16, serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@serathius serathius merged commit 2ed418c into etcd-io:main Sep 17, 2024
43 checks passed
@lucasrod16 lucasrod16 deleted the 18571-ensure-consistent-permissions-for-broken-WAL-files branch September 17, 2024 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Inconsistent WAL file permissions on broken files
7 participants