-
Notifications
You must be signed in to change notification settings - Fork 53
feat: debug command to check release archives #229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit introduces the `chisel debug check-release-archives` command that verifies that Chisel assumptions avoid the archives hold. For example, Chisel assumes that parent directories bundled in several packages in the archive are consistent with one another (i.e. have the same mode). If the is not the case, we need to flag the issue and possibly handle it in the slice definitions files.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems quite rushed, Alberto.
| // symlinks don't. | ||
| path = strings.TrimSuffix(path, "/") | ||
|
|
||
| dir := directories[path] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we know path is a directory?
How come a list of ownerships is a dir?
I seem to be missing some basics about what the algorithm is supposedly doing, and the naming conventions are not helping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the code to change the nomenclature to path and to skip regular files completely. It should flag now only "parent-directory-conflicts" which are the ones we care about. There is a small caveat, when we find a conflict with only symlinks we cannot be sure if it is about parent directories. For us to be sure, we would need to look at the link target in the tarball which makes the code much more complex.
Because we have not seen an example of that in the wild, I propose do not overcomplicate the code and if/when we find an example, we can always remove the false positives.
| archives[archiveName] = openArchive | ||
| } | ||
|
|
||
| type ownership struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's nothing about ownership here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is but maybe the notion is not the correct one. What I wanted to model is that each directory that is extracted has to be owned by a single package (unless explicitly declared which is outside the scope of the PR). Some other ideas come to find with similar semantic meaning like extractedFrom which goes from path to the package that might extract it. I will change the code to use it instead to see if it is more clear.
The only difference is that for each path we want to split it in groups to see if the content is the same or not which is why we need to record Link and Mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the code to match the example we discussed where there is no notion of ownership, instead we try to group observations. The only difference is that looking for conflicts is not as simple as noting whether len(observations) > 1, we now have to check for inequality.
| } | ||
|
|
||
| var issues []any | ||
| type parentDirectoryConflict struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the "parent" bit come from? Or the directory bit? The logic above seems to handle every path found in the tarball with no checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, it comes from the empirical checks but it is better to check that we are in fact correct. The only caveat is that we never know if symlinks point to a file or to a directory and we have to report them anyway, but it is only a conflict if it is a parent directory if not the regular release.validate() logic takes care of it.
EDIT: We will discuss whether we want to make the same assumptions that extract.go does or increase the scope.
| type parentDirectoryConflict struct { | ||
| Issue string `yaml:"issue"` | ||
| Path string `yaml:"path"` | ||
| Entries []ownership `yaml:"entries"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ownership type feels quite poor as a public way to display that information, as the sample data in the tests demonstrate. I would design the desired output first, before any code is written.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, let's discuss offline and change the spec.
Minus the change to "path-conflict" which I think loses specificity and makes the documentation harder than what it should be considering we only have defined one type of conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, possibly last review:
| observations := pathsObservations[path] | ||
| hasConflict := false | ||
| base := observations[0] | ||
| for _, observation := range observations { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... range observations[1:] {
No need to compare it with itself.
This is also a self-contained algorithm, so it may be nicely moved to a helper function, so the code reads:
if hasPathConflict(observations) {
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I will extract it into a function, that makes sense.
| found := false | ||
| // We look for a previous observation that extracts the same | ||
| // content in terms of mode, link, etc. and we add the package | ||
| // to it. If there is none, we create a new one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's compute the observation value ahead of time here, and then delegate the finding to a helper function. Something along the lines of:
index, ok := findObservation(observation, observation)
if !ok {
observations = append(observations, observation)
}
This is the same note as earlier in this PR: we have very long functions with no attempt to break down algorithms into nice small abstractions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed your suggestion and then realized using slices.IndexFunc will make it concise because the search is encapsulated but without the cost of the extra function and at virtually no extra cost in terms of lines. But please tell me if you think the suggestion is better and I will change it, I have no strong opinion either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Alberto. This is ended up quite nice.
This commit introduces the
chisel debug check-release-archivescommandthat verifies that Chisel assumptions avoid the archives hold. For
example, Chisel assumes that parent directories bundled in several
packages in the archive are consistent with one another (i.e. have the
same mode). If the is not the case, we need to flag the issue and
possibly handle it in the slice definitions files.
This PR does not actually check whether the conflicts are handled in the SDF, that will happen in a later PR. The scope is: