Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix stale BSP diagnostics #7610

Merged
merged 1 commit into from
Sep 2, 2024
Merged

Conversation

SlowBrainDude
Copy link
Contributor

@SlowBrainDude SlowBrainDude commented Jul 24, 2024

The BSP server didn't reset old diagnostic messages sent to BSP clients under certain circumstances. This commit mitigates this edge case and ensures that diagnostics for files that previously had compilation problems are properly reset when fresh diagnostics messages are sent.

The culprit was a mismatch of map keys: Files with problems were sometimes recorded under an absolute path, but later attempted to be retrieved by virtual path.

That the fix works can be tested end-to-end on a repro project under:
https://github.com/SlowBrainDude/sbt-bsp-bug-reproducer

Clone the repo and open it in VSCode with Metals installed. In case Metals isn't configured to do automatically so, switch to the SBT build server. (Bloop doesn't show any issues with the project). Let Metals import the project and do the initial compilation. Everything should work fine so far. Now navigate to the JavaClass file. There is a commented block of code. Delete the * as the comment says. SBT will send diagnostics for the now failing build and you'll get a lot of red squiggles in the Java file after compile finishes. Now restore the comment. The errors reported by the SBT BSP will still be shown in the file (and in the problems pane and file explorer). By commenting things in and out you can actually multiply the errors. You can create a few hundred or even more just by commenting things in and out. The reported errors will never go away even when the file compiles fine again.

With my fixed version of SBT the errors get reset properly again and don't get stale.

(Even Metals does some questionable things that may cause sending the reset diagnostic to take quite some time. I've seen compile times of 5 - 15 ms on the Java file, but it could take over 1 minute until the reset diagnostics got sent. But that's a Metals issue which implements some algo that explodes with line count and error count in a file quite quickly, maybe even exponential, at least high polynomial, given that this issue can make a 2 second CLI compile of very large file with only one syntax error into 20 minutes of all CPUs grilling on max when running Metals).

So far so good, but this PR is just a draft as some things aren't clear to me still.

Maybe I've hunted down somethings that shouldn't exist in the first place, and the real bug is even deeper. Because the bug doesn't surface in all cases actually.

I've tried to write a test for the case affected by this code change. But I've failed in doing so; at least so far.

On my repro project the problem.position.sourcePath extracted from the Problem passed to publishDiagnostic is definitively an absolute path. I had log output more or less around every line of code in BuildServerReporter and traced the whole BSP communication while debugging. (It took actually quite some time to realize that the cause of the problem is a harmless looking but failing getOrElse… And it took even longer to see the difference in file path, as I only looked at the file name at first without paying much attention to what is written before it; also the log output was far apart so I just didn't see the difference between absolute path and something that starts with ${BASE}/ for a long time).

But the test I've written didn't fail in such way. I was wondering why the test gets green no matter my fix is applied or not. It turns out that in the test project problem.position.sourcePath is a "virtual path" (with a ${BASE}/ prefix), even for Java files which I thought at first was actually the trigger condition for the bug.

Now I need help to understand why my repro project behaves differently from the test projects in SBT. I've tried to trace how Problemss get constructed, and where the source path is initialized, but honestly I wasn't able to look though all this indirection. There is logic in DiagnosticsReporter but I don't understand where and why the source is sometimes feed in as proper file and sometimes as virtual file (which may have a possibly "virtual" path).

The main question is: Why do the "file descriptors" feed into sendFailureReport and publishDiagnostic differ sometimes in how the path looks like (even they are the same file in the end). Is this something that's expected, or is the actual bug that this differs? Should the sourcePath extracted from Problems always be a "virtual" one maybe, and is it already wrong that some absolute path ends up there?

In case it's expected that the form of the path may differ between sendFailureReport and publishDiagnostic I still need a way to write a test that exhibits this behavior, to have a proper negative test (one that fails without the patch applied).

Sorry for the very long description, but the issue wasn't easy to find and understand either. I hope I described it here in a way that can be understood.

TASKS to make this mergable:

  • Investigate whether this is actually the proper fix, and not just "healing symptoms"
  • Fix the regression test, or write a proper one in case this is not the right fix

@adpi2
Copy link
Member

adpi2 commented Jul 31, 2024

Thanks for opening this PR. Is it ready to be reviewed? Do you need any help?

@SlowBrainDude
Copy link
Contributor Author

SlowBrainDude commented Aug 1, 2024

I think I made it clear that I need help here.

This fixes a bug (see linked demo project). But I'm not sure I'm fixing the root cause.

Also, no matter whether this is already the right fix or not, I wasn't able to write a negative test.

So I need input on the two points I left here already as checklist:

I'm not sure whether this fix just heals symptoms or is actually the right one.

The main question is: Why do the "file descriptors" feed into sendFailureReport and publishDiagnostic differ sometimes in how the path looks like (even they are the same file in the end). Is this something that's expected, or is the actual bug that this differs? Should the sourcePath extracted from Problems always be a "virtual" one maybe, and is it already wrong that some absolute path ends up there?

Also no matter what, I need a way to reproduce the bug that can be seen on my demo project (and some other projects of my) in the SBT tests in this codebase. But the bug doesn't manifest on the test projects in this repo. I don't know why. I thought it always happens with Java files (that was my observation in some other projects) but this doesn't seem to be true.

@SlowBrainDude
Copy link
Contributor Author

Maybe this is difficult to understand for anybody who didn't look at log files for a few days like me, so I try to rephrase the main questions again:

SBT accumulates published compiler diagnostics. This accumulation happens through a Map, keyed by the path to the affected file. (Please see the file this commit updates). But the path can have two distinct forms (even it's a handle to the same file in the end). It can be an absolute path, or it can be a relative path prefixed by something like ${BASE}/.

The assumption that causes this bug is that the path has always the same form in the calls to sendFailureReport and publishDiagnostic (so the key into said accumulating Map be the same). But this assumption is wrong. It has sometimes a different form (even it references the same file in the end) and lookup by such key in said Map fails then (even it should not as there are entries already for the file in question, just with a different form of path specifier as key).

The different path specifier manifested for me just on Java files. But that's not the only condition it seems, as it does not happen in the project used for testing SBT itself (see my attempt to write a regression test for this commit). I'm not sure what the trigger could be. My repro project is absolutely minimal. (Hmm… Maybe that's actually (part of) the trigger: I have no projects / modules defined explicitly in the build… Need to check this. Still I would not know why it can make the form of paths returned by the compiler different; maybe they come in already different; but I wasn't able to trace that as there is just too much indirection; the stacktrace is half a mile long and goes through SBT, zinc, compiler bridge, the Scala 2 compiler, and hell knows where the path gets actually injected, as this is an async action).

There is logic to handle both path forms in the compiler bridge. So it is expected to have both forms in general. But whether both forms are actually valid as the input to sendFailureReport and publishDiagnostic is still unclear to me. (Whoever written the code in the first place assumed it's always of the same form. I can't judge whether this was an oversight or is a valid assumption!) This could be a pointer to a bug sitting somewhere deeper. But I don't know enough about SBT to assume anything here. That's why I left an draft and hopped someone with more knowledge about the inner workings of SBT can clarify this part.

Also it's unclear to me how having a simple module-less project, or project modules respectively, could affect the form of paths to source files. Why it's not always of the form ${BASE}/relative/path/to/File.java, but sometimes an absolute path? Is the "root module" somehow different to other modules? Should it be like that actually?

If nobody actually knows what's going on here in fact we can just merge the fix, leave out the test, and call it a day. It will fix a bug.

But maybe this bug is just a symptom of something else? But if nobody knows how things are supposed to actually work having a bugfix is better than not having one at all…

BTW: The bug is in the category of a full show stopper. Just go and try my repro project! If you run into it it's almost impossible to edit a Java file in a Scala project. Metals will hang quite quickly, and all errors, including syntax errors as you type, will just pile up and never go away; red squiggles everywhere after a few keystrokes that never go away until you restart Metals. It's supper annoying and makes for the most terrible first impression possible. So I would really like to have a bugfix for it. But I also usually try to fix root causes, and not just heal symptoms. Because just healing symptoms makes things worse in the long run. If you hidden all your bugs behind some "quick fixes" this will fire back sooner or later in bad ways. So I would really like to understand what's happening here before proceeding. But I just don't know enough about how SBT is supposed to work to decide whether this path here is the right one or some deeper digging is still required. I published the patch in the hope that for some people more knowledgeable about SBT than me some bell will ring when reading the description. But actually I had the impression my input is getting ignored instead…

@adpi2
Copy link
Member

adpi2 commented Aug 7, 2024

@SlowBrainDude thanks a lot for reporting this issue and proposing a fix for it.

Indeed it seems that the root cause of the issue is the Java compiler reporting errors in plain source files (absolute path) instead of virtual source files (${BASE}/.../Hello.java). But this does not seem to be always the case and that's why you are struggling to reproduce it in the tests.

After further investigation in your reproduction repo, I found that the issue disappear when I remove the sbt-metals plugin, that is automatically added by Metals. Meaning the Java compilation errors are suddenly reported in virtual files (${BASE}/.../Hello.java) which is what is expected by the BuildServerReporter.

More specifically it seems that the javaHome setting that is set here triggers the bug. javaHome is None by default, but as soon as it gets a value (Some(File())) the errors are reported in plain files. I don't know if that's a bug in itself: maybe it is hard to get an arbitrary Java compiler to compile virtual files (and I don't really want to spend time figuring this out).

All of that being said, it seems that your changes make the BuildServerReporter more robust. Assuming that sourcePath: String is always the id of a VirtualFile was somewhat hazardous. Making the round-trip converter.toPath(VirtualFileRef.of(id)) seems safer.

My suggestion is to add the sbt-metals plugin in the buildserver test project, and that should reproduce the bug:

addSbtPlugin("org.scalameta" % "sbt-metals" % "1.2.2")

The buildserver test is kind of an integration test, and so it seems reasonable to test the integration with Metals.

@SlowBrainDude
Copy link
Contributor Author

Thank you very much @adpi2 for figuring out the part I did not manage to understand!!! 👍

Now I have time for that again; I'll try to update this PR soonish, applying your suggestions.

But I think I will have a closer look at the Metals plugin before doing that. I think it should not change the semantics of a build, this seems pretty wrong. (That's already the second time that this is the root cause of something. I reported another issue here that turned out to be caused by the Metals plugin just shortly before, see #7608).

In case the problematic part can be fixed in Metals, would you still suggest to include the changes here?

I just don't know enough about SBT internals to judge whether this here is something that could happen also for other—valid—reasons, or something that should never happen at all. In case it should never happen I think this "fix" here would just hide other potential errors, and I would actually omit it. (I'm a big proponent of "fail early"; if something unexpected happens it should imho instantly explode, loudly. Imho such things should never be "painted over" by some workarounds. But opinions may vary of course. Some people think it's better to keep running no matter what).

I've spend three days digging first though Metals and than through SBT (which I did not do ever before, so at least this part was not a wast of time) to figure out this here. But if fixing the right thing means digging even deeper so it be. 🙂

@SlowBrainDude
Copy link
Contributor Author

BTW, how did you find out that it's the Metals plugin, @adpi2?

Is this already a kind of "usual suspect"? (Because for the other issue eed3si9n also was quick to diagnose an issue with the plugin.)

Or do you have some "secret debugging techniques" I should be aware of? Happy to learn some new tricks! 😄

@adpi2
Copy link
Member

adpi2 commented Aug 9, 2024

BTW, how did you find out that it's the Metals plugin, @adpi2?

It took me some time, but to make it short:

  • I added some logs in BuildServerReporter and published a SNAPSHOT version of sbt locally.
  • I copied the buildserver test to a new folder and configured project/build.properties to use the SNAPSHOT version of sbt
  • I imported it in Metals and saw that I could reproduce the bug, which puzzled me. How was it possible that the bug occurred when importing the build but not in the test?
  • I remembered that Metals injects some different settings in project/metals.sbt file. I commented out the sbt-metals plugin and saw that it made the bug disappear.
  • I then tried the setting in MetalsPlugin one by one to find out that javaHome was triggering the bug.

Is this already a kind of "usual suspect"?

No I wasn't expecting that. For #7608 though, it's different because we know that Metals needs semanticdb and enables it by default.

But I think I will have a closer look at the Metals plugin before doing that.

Metals is clearly not the problem here: javaHome is a valid setting and we should be able to configure it. The problem is in sbt itself: why does setting the javaHome change the path of reported errors? It would be useful to investigate that but I don't know if it is easy to fix or not. Maybe we cannot ask an arbitrary Java compiler (from a custom javaHome) to compile virtual sources, because it may not support the feature. As you said, it could be a valid reason.

In case the problematic part can be fixed in Metals, would you still suggest to include the changes here?

Yes I think we should still include the changes here. The BuildServerReporter looks quite fragile to me and I would take anything to make it more robust.

@SlowBrainDude
Copy link
Contributor Author

SlowBrainDude commented Aug 9, 2024

Thanks for the detailed information!

For the first part I was doing the same. It's the usual thing to do. Just publish local SNAPSHOTs and run from that, adding logs and stuff.

How was it possible that the bug occurred when importing the build but not in the test?

Oh, yeah, that puzzled me also massively. I've gave up after half a day. Had no idea left where to look.

For the second part I see you just smelled it… 😄

I know that Metals is adding a SBT plugin, but I would have not suspected it to change anything about the build. This still seems wrong.

Metals is clearly not the problem here

Thanks for pointing that out. So I think I don't need to look there. Great!

The problem is in sbt itself: why does setting the javaHome change the path of reported errors? It would be useful to investigate that

Let's see what I find out.

Yes I think we should still include the changes here. The BuildServerReporter looks quite fragile to me and I would take anything to make it more robust.

OK. So I'll make this PR mergable. (Fix the test, include the requested changes)

To be honest, I also think this whole diagnostics reporter thingy is very brittle, and a quite bad design. Streaming diagnostics is just such an ill idea. (Not sure though this isn't already messed up in the BSP "protocol"…) We're talking here about something that takes just a few milliseconds. "Optimizing" that with a very complex streaming mechanic seems wrong. Eclipse (the "RedHat" Java plugin) does it much smarter, and just locks the editor for this short time and submits all diagnostics for a file as batch. (I didn't know about the editor looking until at some point under very high load I've got once some "loading indicator" in the editor when typing instead of chars appearing on screen. But usually you never run into that). Additionally I think the whole idea of "snapshots" and diffing them implemented in Metals is also ill. Actually I think this is the cause of a massive performance degradation because the algo is likely very high complexity, and explodes with longer files and "many" errors. (I don't have formal prove of this suspicion right now. But I think I will come up with some. But that's something for the Metals project, as that part of the issue is not SBT's problem.)

Thanks again for taking time to look into that. I'm sorry I could not come up with a full solution in the first place.

@SlowBrainDude SlowBrainDude marked this pull request as ready for review August 30, 2024 13:16
@SlowBrainDude
Copy link
Contributor Author

SlowBrainDude commented Aug 30, 2024

I've investigated the questionable things and made a small update to the PR. It's now ready for review. (Sorry for the long delay!)

First of all I think it makes no sense to try to change the behavior around what happens when you set javaHome. The whole point of this setting is to use an external compiler. Now aligning the inner workings of it with the programmatically called compiler available in the JVM process SBT runs in would make things quite complex. You would need to change simply shelling out to a system process with something like RMI. I see no convincing reason to even try. (A change the other way around, shelling out in any case, and not using the compiler service programmatically would be even a bigger change. I don't see this happening. That would touch the core of SBT and looks very risky to me).

Having two ways to call the Java compiler is unfortunate, has some side effects (like different compiler diagnostic formats—see the updated test), but looks all in all like the better trade-off to introducing quite a lot of complexity. (There is also another small, likely cosmetic difference, which is the different implementations of XPosition, which has different .toString implementations. This does not show up in the test, but I've seen it in debug logs; no clue whether this can cause trouble ever, though. I think at worst the displayed messages in the error pane of an IDE will look slightly different. Didn't investigate this further).

There is also an additional small change in the diagnostics reporter. The bug I've fixed here masked another bug: In case a Java file had errors already when you compiled it for the first time the published diagnostics would have been overridden with empty ones again as notifyFirstReport() was not called when sending failure reports for the first time. The updates test cases revealed that. (Note that notifyFirstReport() needs to be called in two places. I've tried to put it further down the call stack and call it only once but this broke other behavior… The diagnostics reporter is quite brittle, as mentioned already).

Regarding the other things:

I did not pick up the idea to add the Metals plugin to the build. Instead I'm testing with the root cause, which is the javaHome setting. Adding the Metals plugin would add a maintenance burden as it would require constant updates of the plugin just to keep tests in shape.

I have a question though: Did I introduce some kind of race condition by adding a second .sbt file to the test project? I mean, it gets picked up, otherwise the test would fail (alone for the different diagnostics message formats), but will this work in any case? I was not able to find a definite answer to what SBT does when there are more than one .sbt files in a project root directory.

Besides that I kept the test separate to the others. It's already one of the biggest test on its own, and as even the messages you need to check are different there would be imho nothing to win by merging it into the other test. Having this separate also makes it easier to see whether something failed while using the JVM compiler service, or respectively the compiler called as regular system process.

So summarized, in case you don't see any issue with adding a second .sbt file to the test project, this PR is ready from my viewpoint. But I'm of course open to further suggestions or change request.

Copy link
Member

@adpi2 adpi2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for this one suggestion, it looks good to me.

The BSP server didn't reset old diagnostic messages sent to BSP clients under
certain circumstances. This commit mitigates this edge case and ensures that
diagnostics for files that previously had compilation problems are properly
reset when fresh diagnostics messages are sent.

The culprit was a mismatch of map keys: Files with problems were sometimes recorded
under an absolute path, but later attempted to be retrieved by virtual path.
@adpi2 adpi2 merged commit caf7e04 into sbt:1.10.x Sep 2, 2024
9 of 10 checks passed
@SlowBrainDude SlowBrainDude mentioned this pull request Sep 2, 2024
)

IO.delete(otherBuildFile)
reloadWorkspace()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be an explicit () to return Unit here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the fix #7638

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants