Skip to content

Conversation

@letFunny
Copy link
Collaborator

  • Have you signed the CLA?

This commit introduces the chisel debug check-release-archives command
that verifies that Chisel assumptions avoid the archives hold. For
example, Chisel assumes that parent directories bundled in several
packages in the archive are consistent with one another (i.e. have the
same mode). If the is not the case, we need to flag the issue and
possibly handle it in the slice definitions files.


This PR does not actually check whether the conflicts are handled in the SDF, that will happen in a later PR. The scope is:

  • This PR: report all the issues found in the release's archives.
  • Next PR: do not report issues which are handled in the SDFs.
  • Independent PR: emit a warning on a best-effort basis when we detect the issue at runtime in Chisel.

letFunny added 2 commits May 29, 2025 13:05
This commit introduces the `chisel debug check-release-archives` command
that verifies that Chisel assumptions avoid the archives hold. For
example, Chisel assumes that parent directories bundled in several
packages in the archive are consistent with one another (i.e. have the
same mode). If the is not the case, we need to flag the issue and
possibly handle it in the slice definitions files.
@github-actions
Copy link

github-actions bot commented May 29, 2025

Command Mean [s] Min [s] Max [s] Relative
BASE 8.407 ± 0.021 8.376 8.437 1.00
HEAD 8.430 ± 0.024 8.392 8.475 1.00 ± 0.00

@letFunny letFunny added the Priority Look at me first label Jun 26, 2025
Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems quite rushed, Alberto.

// symlinks don't.
path = strings.TrimSuffix(path, "/")

dir := directories[path]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know path is a directory?

How come a list of ownerships is a dir?

I seem to be missing some basics about what the algorithm is supposedly doing, and the naming conventions are not helping.

Copy link
Collaborator Author

@letFunny letFunny Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the code to change the nomenclature to path and to skip regular files completely. It should flag now only "parent-directory-conflicts" which are the ones we care about. There is a small caveat, when we find a conflict with only symlinks we cannot be sure if it is about parent directories. For us to be sure, we would need to look at the link target in the tarball which makes the code much more complex.

Because we have not seen an example of that in the wild, I propose do not overcomplicate the code and if/when we find an example, we can always remove the false positives.

archives[archiveName] = openArchive
}

type ownership struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing about ownership here.

Copy link
Collaborator Author

@letFunny letFunny Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is but maybe the notion is not the correct one. What I wanted to model is that each directory that is extracted has to be owned by a single package (unless explicitly declared which is outside the scope of the PR). Some other ideas come to find with similar semantic meaning like extractedFrom which goes from path to the package that might extract it. I will change the code to use it instead to see if it is more clear.

The only difference is that for each path we want to split it in groups to see if the content is the same or not which is why we need to record Link and Mode.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the code to match the example we discussed where there is no notion of ownership, instead we try to group observations. The only difference is that looking for conflicts is not as simple as noting whether len(observations) > 1, we now have to check for inequality.

}

var issues []any
type parentDirectoryConflict struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the "parent" bit come from? Or the directory bit? The logic above seems to handle every path found in the tarball with no checks.

Copy link
Collaborator Author

@letFunny letFunny Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it comes from the empirical checks but it is better to check that we are in fact correct. The only caveat is that we never know if symlinks point to a file or to a directory and we have to report them anyway, but it is only a conflict if it is a parent directory if not the regular release.validate() logic takes care of it.

EDIT: We will discuss whether we want to make the same assumptions that extract.go does or increase the scope.

type parentDirectoryConflict struct {
Issue string `yaml:"issue"`
Path string `yaml:"path"`
Entries []ownership `yaml:"entries"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ownership type feels quite poor as a public way to display that information, as the sample data in the tests demonstrate. I would design the desired output first, before any code is written.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, let's discuss offline and change the spec.

Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, possibly last review:

observations := pathsObservations[path]
hasConflict := false
base := observations[0]
for _, observation := range observations {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... range observations[1:] {

No need to compare it with itself.

This is also a self-contained algorithm, so it may be nicely moved to a helper function, so the code reads:

if hasPathConflict(observations) {
        ...
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will extract it into a function, that makes sense.

found := false
// We look for a previous observation that extracts the same
// content in terms of mode, link, etc. and we add the package
// to it. If there is none, we create a new one.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's compute the observation value ahead of time here, and then delegate the finding to a helper function. Something along the lines of:

index, ok := findObservation(observation, observation)
if !ok {
    observations = append(observations, observation)
}

This is the same note as earlier in this PR: we have very long functions with no attempt to break down algorithms into nice small abstractions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed your suggestion and then realized using slices.IndexFunc will make it concise because the search is encapsulated but without the cost of the extra function and at virtually no extra cost in terms of lines. But please tell me if you think the suggestion is better and I will change it, I have no strong opinion either way.

@niemeyer niemeyer assigned cjdcordeiro and unassigned cjdcordeiro Jul 21, 2025
@letFunny letFunny requested a review from niemeyer July 25, 2025 09:13
Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Alberto. This is ended up quite nice.

@niemeyer niemeyer merged commit ca37d0a into canonical:main Jul 25, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Priority Look at me first

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants