How do I make a BXL script depend on file contents? #748

cbarrete · 2024-08-23T19:55:08Z

I'm writing a BXL script for linting C++ files with clang-tidy. The script itself is very simple:

def _clang_tidy_impl(ctx: BxlContext):
    actions = ctx.bxl_actions().actions

    # Each target expression expands to a list of targets, so we flatten the
    # input into a flat list of targets.
    targets = []
    for t in ctx.cli_args.targets:
        targets += t

    files = ctx.uquery().inputs(targets)
    cpp_files = []
    for file in files:
        path = file.path
        if path.endswith(".cpp") or path.endswith(".c") or path.endswith(".cc") or path.endswith(".cxx"):
            cpp_files.append(path)

    clang_tidy = ctx.analysis(":clang-tidy").providers()[RunInfo]
    report_file = actions.declare_output("clang_tidy_report.yaml")
    script = actions.write(
        "run_clang_tidy.sh",
        [
            "set -eu",
            cmd_args(
                clang_tidy,
                cpp_files,
                "--export-fixes",
                report_file.as_output(),
                delimiter = " ",
            ),
            # If there are no warnings to report, clang-tidy will not produce an output.
            cmd_args(
                "touch",
                report_file.as_output(),
                delimiter = " ",
            ),
        ],
        with_inputs = True,
        is_executable = True,
    )

    actions.run(
        cmd_args(["/bin/sh", script]),
        category = "clang_tidy",
    )
    ctx.output.ensure(report_file)

# buck2 bxl clang-tidy.bxl:clang_tidy -c build.execution_platforms=some:platform -- --targets demo/...
clang_tidy = bxl_main(
    impl = _clang_tidy_impl,
    cli_args = {
        "targets": cli_args.list(cli_args.target_expr()),
    },
)

(I left out some irrelevant details)

This works well enough as a prototype, except that successes are cached even if I edit a source file and add an error on purpose. I get why this is happening: the build graph doesn't change, so my uquery is cached, so the whole script is cached.

I've tried to add dependencies wherever I could via cmd_args' hidden parameter, but no dice (heh). Buck nags me with a

[2024-08-23T15:34:15.608-04:00] File changed: root//demo/lib.h

but won't let me easily depend on that change.

Is there a way that I can someone add a dependency on the contents of the input files that doesn't involve e.g. reading them via a dynamic action and some hashing?

The text was updated successfully, but these errors were encountered:

cormacrelf · 2024-08-31T15:01:16Z

Main thing: uqueryctx.inputs() gives you a file_set, which if you look in the docs, is just an iterable set of bxl.FileNode. These aren't artifacts, and you're just collecting their paths as strings and writing them into a file, so you can't expect Buck to track them. You need to turn them into source artifacts using ctx.fs.source(node) before you can use them as such.

More things:

Seems that BXL's actions.write does not really support with_inputs = True, even though it has the flag, it doesn't do anything.
BXL also doesn't seem to really implement allow_args = True, even though it has the flag and is documented. The returned list of input artifacts always seems to be empty.
Be aware that buck will print File changed: ... for literally any file that changes, including ones that are not referenced anywhere. Doesn't really mean anything.
This is a bit in the weeds, but generally I wouldn't write the list of files into the script itself. That means buck needs to write a new copy of the script each time the file list changes, i.e. every time you try to clang-tidy a different target. Probably just pass em as args to the script.
And since this is clang-tidy, you may want to do one actions.run per target, so you can configure clang-tidy with include paths from the target in question, for example. Perhaps this is what you have omitted for simplicity. But worth pointing out you probably want ctx.output.ensure_multiple if you're doing that.

I am running a bit of an old Buck2. Maybe those first two BXL issues have been fixed since.

All in all, this works:

def _cat_files_impls(ctx: BxlContext):
    actions = ctx.bxl_actions().actions

    # Each target expression expands to a list of targets, so we flatten the
    # input into a flat list of targets.
    targets = []
    for t in ctx.cli_args.targets:
        targets += t

    # The inputs() returns a file_set, which is a set of bxl.FileNode.
    # We need these as source artifacts.
    files = [ctx.fs.source(file) for file in ctx.uquery().inputs(targets)]

    report_file = actions.declare_output("all.txt")
    script = actions.write(
        "cat_files.sh",
        [
            "OUTPUT=\"$1\"",
            "shift",
            'cat "$@" > "$OUTPUT"',
        ],
        is_executable = True,
    )

    actions.run(
        cmd_args("/bin/sh", script, report_file.as_output(), files),
        category = "cat_files",
    )
    ensured = ctx.output.ensure(report_file)
    ctx.output.print(ensured)

# buck2 bxl cat_files.bxl:cat_files -- --targets thing//:main | xargs cat
cat_files = bxl_main(
    impl = _cat_files_impls,
    cli_args = {
        "targets": cli_args.list(cli_args.target_expr()),
    },
)

; buck2 bxl //thing/cat_files.bxl:cat_files -- --targets //thing:main | xargs cat
...
int main() { return 0; }
void thing() {}

; buck2 bxl //thing/cat_files.bxl:cat_files -- --targets //thing:main | xargs cat
File changed: root//thing/thing.c
...
BXL SUCCEEDED
int main() { return 0; }
void thing() {
        // yeah
}

cbarrete · 2024-08-31T20:01:35Z

Main thing: uqueryctx.inputs() gives you a file_set, which if you look in the docs, is just an iterable set of bxl.FileNode. These aren't artifacts, and you're just collecting their paths as strings and writing them into a file, so you can't expect Buck to track them. You need to turn them into source artifacts using ctx.fs.source(node) before you can use them as such.

Fantatic! I understood the root cause but had not found this ctx.fs.source API, thank you so much for pointing it out!

As to your other points:

I have not noticed this so far, all my inputs have always been materialized properly, but perhaps my use cases were just simple enough, or some other action had materialized it before hands.
Out of curiosity, what do you typically use allow_args for? I haven't had a use case for it (or at least I didn't realize that it would have been a convenient feature).
Of course, I just added that to "prove" that it wasn't something wrong with my inotify setup or something similar.
I suppose this answers my question for 2), doesn't it? :) This is a very good point, I hadn't considered that, thanks again for pointing that out.
Agreed, I just wanted to get something simple working first, but I do want to use ensure_multiple both for caching and parallelism.

cormacrelf · 2024-09-01T01:19:16Z

For (2), it is used all over the prelude, like when putting together rustc argfiles. Those scripts are linked to a specific compile step, and only need rewriting when the dependency lists / rust flags change. That involves writing many flags conditionally, so it's useful for Buck to track which ones were actually used. I think it's just slightly less often useful in BXL, but you might do it for (5). The main benefit of getting dependencies written into the script itself is that you can open that file and see what buck is doing.

cbarrete mentioned this issue Aug 29, 2024

Linting #263

Open

cbarrete closed this as completed Aug 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How do I make a BXL script depend on file contents? #748

How do I make a BXL script depend on file contents? #748

cbarrete commented Aug 23, 2024

cormacrelf commented Aug 31, 2024

cbarrete commented Aug 31, 2024

cormacrelf commented Sep 1, 2024

How do I make a BXL script depend on file contents? #748

How do I make a BXL script depend on file contents? #748

Comments

cbarrete commented Aug 23, 2024

cormacrelf commented Aug 31, 2024

cbarrete commented Aug 31, 2024

cormacrelf commented Sep 1, 2024