Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How do I make a BXL script depend on file contents? #748

Closed
cbarrete opened this issue Aug 23, 2024 · 3 comments
Closed

How do I make a BXL script depend on file contents? #748

cbarrete opened this issue Aug 23, 2024 · 3 comments

Comments

@cbarrete
Copy link
Contributor

I'm writing a BXL script for linting C++ files with clang-tidy. The script itself is very simple:

def _clang_tidy_impl(ctx: BxlContext):
    actions = ctx.bxl_actions().actions

    # Each target expression expands to a list of targets, so we flatten the
    # input into a flat list of targets.
    targets = []
    for t in ctx.cli_args.targets:
        targets += t

    files = ctx.uquery().inputs(targets)
    cpp_files = []
    for file in files:
        path = file.path
        if path.endswith(".cpp") or path.endswith(".c") or path.endswith(".cc") or path.endswith(".cxx"):
            cpp_files.append(path)

    clang_tidy = ctx.analysis(":clang-tidy").providers()[RunInfo]
    report_file = actions.declare_output("clang_tidy_report.yaml")
    script = actions.write(
        "run_clang_tidy.sh",
        [
            "set -eu",
            cmd_args(
                clang_tidy,
                cpp_files,
                "--export-fixes",
                report_file.as_output(),
                delimiter = " ",
            ),
            # If there are no warnings to report, clang-tidy will not produce an output.
            cmd_args(
                "touch",
                report_file.as_output(),
                delimiter = " ",
            ),
        ],
        with_inputs = True,
        is_executable = True,
    )

    actions.run(
        cmd_args(["/bin/sh", script]),
        category = "clang_tidy",
    )
    ctx.output.ensure(report_file)

# buck2 bxl clang-tidy.bxl:clang_tidy -c build.execution_platforms=some:platform -- --targets demo/...
clang_tidy = bxl_main(
    impl = _clang_tidy_impl,
    cli_args = {
        "targets": cli_args.list(cli_args.target_expr()),
    },
)

(I left out some irrelevant details)

This works well enough as a prototype, except that successes are cached even if I edit a source file and add an error on purpose. I get why this is happening: the build graph doesn't change, so my uquery is cached, so the whole script is cached.

I've tried to add dependencies wherever I could via cmd_args' hidden parameter, but no dice (heh). Buck nags me with a

[2024-08-23T15:34:15.608-04:00] File changed: root//demo/lib.h

but won't let me easily depend on that change.

Is there a way that I can someone add a dependency on the contents of the input files that doesn't involve e.g. reading them via a dynamic action and some hashing?

@cbarrete cbarrete mentioned this issue Aug 29, 2024
@cormacrelf
Copy link
Contributor

Main thing: uqueryctx.inputs() gives you a file_set, which if you look in the docs, is just an iterable set of bxl.FileNode. These aren't artifacts, and you're just collecting their paths as strings and writing them into a file, so you can't expect Buck to track them. You need to turn them into source artifacts using ctx.fs.source(node) before you can use them as such.

More things:

  1. Seems that BXL's actions.write does not really support with_inputs = True, even though it has the flag, it doesn't do anything.
  2. BXL also doesn't seem to really implement allow_args = True, even though it has the flag and is documented. The returned list of input artifacts always seems to be empty.
  3. Be aware that buck will print File changed: ... for literally any file that changes, including ones that are not referenced anywhere. Doesn't really mean anything.
  4. This is a bit in the weeds, but generally I wouldn't write the list of files into the script itself. That means buck needs to write a new copy of the script each time the file list changes, i.e. every time you try to clang-tidy a different target. Probably just pass em as args to the script.
  5. And since this is clang-tidy, you may want to do one actions.run per target, so you can configure clang-tidy with include paths from the target in question, for example. Perhaps this is what you have omitted for simplicity. But worth pointing out you probably want ctx.output.ensure_multiple if you're doing that.

I am running a bit of an old Buck2. Maybe those first two BXL issues have been fixed since.

All in all, this works:

def _cat_files_impls(ctx: BxlContext):
    actions = ctx.bxl_actions().actions

    # Each target expression expands to a list of targets, so we flatten the
    # input into a flat list of targets.
    targets = []
    for t in ctx.cli_args.targets:
        targets += t

    # The inputs() returns a file_set, which is a set of bxl.FileNode.
    # We need these as source artifacts.
    files = [ctx.fs.source(file) for file in ctx.uquery().inputs(targets)]

    report_file = actions.declare_output("all.txt")
    script = actions.write(
        "cat_files.sh",
        [
            "OUTPUT=\"$1\"",
            "shift",
            'cat "$@" > "$OUTPUT"',
        ],
        is_executable = True,
    )

    actions.run(
        cmd_args("/bin/sh", script, report_file.as_output(), files),
        category = "cat_files",
    )
    ensured = ctx.output.ensure(report_file)
    ctx.output.print(ensured)

# buck2 bxl cat_files.bxl:cat_files -- --targets thing//:main | xargs cat
cat_files = bxl_main(
    impl = _cat_files_impls,
    cli_args = {
        "targets": cli_args.list(cli_args.target_expr()),
    },
)
; buck2 bxl //thing/cat_files.bxl:cat_files -- --targets //thing:main | xargs cat
...
int main() { return 0; }
void thing() {}

; buck2 bxl //thing/cat_files.bxl:cat_files -- --targets //thing:main | xargs cat
File changed: root//thing/thing.c
...
BXL SUCCEEDED
int main() { return 0; }
void thing() {
        // yeah
}

@cbarrete
Copy link
Contributor Author

Main thing: uqueryctx.inputs() gives you a file_set, which if you look in the docs, is just an iterable set of bxl.FileNode. These aren't artifacts, and you're just collecting their paths as strings and writing them into a file, so you can't expect Buck to track them. You need to turn them into source artifacts using ctx.fs.source(node) before you can use them as such.

Fantatic! I understood the root cause but had not found this ctx.fs.source API, thank you so much for pointing it out!

As to your other points:

  1. I have not noticed this so far, all my inputs have always been materialized properly, but perhaps my use cases were just simple enough, or some other action had materialized it before hands.
  2. Out of curiosity, what do you typically use allow_args for? I haven't had a use case for it (or at least I didn't realize that it would have been a convenient feature).
  3. Of course, I just added that to "prove" that it wasn't something wrong with my inotify setup or something similar.
  4. I suppose this answers my question for 2), doesn't it? :) This is a very good point, I hadn't considered that, thanks again for pointing that out.
  5. Agreed, I just wanted to get something simple working first, but I do want to use ensure_multiple both for caching and parallelism.

@cormacrelf
Copy link
Contributor

For (2), it is used all over the prelude, like when putting together rustc argfiles. Those scripts are linked to a specific compile step, and only need rewriting when the dependency lists / rust flags change. That involves writing many flags conditionally, so it's useful for Buck to track which ones were actually used. I think it's just slightly less often useful in BXL, but you might do it for (5). The main benefit of getting dependencies written into the script itself is that you can open that file and see what buck is doing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants