-
Notifications
You must be signed in to change notification settings - Fork 937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix stale BSP diagnostics #7610
Conversation
Thanks for opening this PR. Is it ready to be reviewed? Do you need any help? |
I think I made it clear that I need help here. This fixes a bug (see linked demo project). But I'm not sure I'm fixing the root cause. Also, no matter whether this is already the right fix or not, I wasn't able to write a negative test. So I need input on the two points I left here already as checklist: I'm not sure whether this fix just heals symptoms or is actually the right one.
Also no matter what, I need a way to reproduce the bug that can be seen on my demo project (and some other projects of my) in the SBT tests in this codebase. But the bug doesn't manifest on the test projects in this repo. I don't know why. I thought it always happens with Java files (that was my observation in some other projects) but this doesn't seem to be true. |
Maybe this is difficult to understand for anybody who didn't look at log files for a few days like me, so I try to rephrase the main questions again: SBT accumulates published compiler diagnostics. This accumulation happens through a Map, keyed by the path to the affected file. (Please see the file this commit updates). But the path can have two distinct forms (even it's a handle to the same file in the end). It can be an absolute path, or it can be a relative path prefixed by something like The assumption that causes this bug is that the path has always the same form in the calls to The different path specifier manifested for me just on Java files. But that's not the only condition it seems, as it does not happen in the project used for testing SBT itself (see my attempt to write a regression test for this commit). I'm not sure what the trigger could be. My repro project is absolutely minimal. (Hmm… Maybe that's actually (part of) the trigger: I have no projects / modules defined explicitly in the build… Need to check this. Still I would not know why it can make the form of paths returned by the compiler different; maybe they come in already different; but I wasn't able to trace that as there is just too much indirection; the stacktrace is half a mile long and goes through SBT, zinc, compiler bridge, the Scala 2 compiler, and hell knows where the path gets actually injected, as this is an async action). There is logic to handle both path forms in the compiler bridge. So it is expected to have both forms in general. But whether both forms are actually valid as the input to Also it's unclear to me how having a simple module-less project, or project modules respectively, could affect the form of paths to source files. Why it's not always of the form If nobody actually knows what's going on here in fact we can just merge the fix, leave out the test, and call it a day. It will fix a bug. But maybe this bug is just a symptom of something else? But if nobody knows how things are supposed to actually work having a bugfix is better than not having one at all… BTW: The bug is in the category of a full show stopper. Just go and try my repro project! If you run into it it's almost impossible to edit a Java file in a Scala project. Metals will hang quite quickly, and all errors, including syntax errors as you type, will just pile up and never go away; red squiggles everywhere after a few keystrokes that never go away until you restart Metals. It's supper annoying and makes for the most terrible first impression possible. So I would really like to have a bugfix for it. But I also usually try to fix root causes, and not just heal symptoms. Because just healing symptoms makes things worse in the long run. If you hidden all your bugs behind some "quick fixes" this will fire back sooner or later in bad ways. So I would really like to understand what's happening here before proceeding. But I just don't know enough about how SBT is supposed to work to decide whether this path here is the right one or some deeper digging is still required. I published the patch in the hope that for some people more knowledgeable about SBT than me some bell will ring when reading the description. But actually I had the impression my input is getting ignored instead… |
@SlowBrainDude thanks a lot for reporting this issue and proposing a fix for it. Indeed it seems that the root cause of the issue is the Java compiler reporting errors in plain source files (absolute path) instead of virtual source files ( After further investigation in your reproduction repo, I found that the issue disappear when I remove the More specifically it seems that the All of that being said, it seems that your changes make the My suggestion is to add the addSbtPlugin("org.scalameta" % "sbt-metals" % "1.2.2") The |
Thank you very much @adpi2 for figuring out the part I did not manage to understand!!! 👍 Now I have time for that again; I'll try to update this PR soonish, applying your suggestions. But I think I will have a closer look at the Metals plugin before doing that. I think it should not change the semantics of a build, this seems pretty wrong. (That's already the second time that this is the root cause of something. I reported another issue here that turned out to be caused by the Metals plugin just shortly before, see #7608). In case the problematic part can be fixed in Metals, would you still suggest to include the changes here? I just don't know enough about SBT internals to judge whether this here is something that could happen also for other—valid—reasons, or something that should never happen at all. In case it should never happen I think this "fix" here would just hide other potential errors, and I would actually omit it. (I'm a big proponent of "fail early"; if something unexpected happens it should imho instantly explode, loudly. Imho such things should never be "painted over" by some workarounds. But opinions may vary of course. Some people think it's better to keep running no matter what). I've spend three days digging first though Metals and than through SBT (which I did not do ever before, so at least this part was not a wast of time) to figure out this here. But if fixing the right thing means digging even deeper so it be. 🙂 |
BTW, how did you find out that it's the Metals plugin, @adpi2? Is this already a kind of "usual suspect"? (Because for the other issue eed3si9n also was quick to diagnose an issue with the plugin.) Or do you have some "secret debugging techniques" I should be aware of? Happy to learn some new tricks! 😄 |
It took me some time, but to make it short:
No I wasn't expecting that. For #7608 though, it's different because we know that Metals needs semanticdb and enables it by default.
Metals is clearly not the problem here:
Yes I think we should still include the changes here. The |
Thanks for the detailed information! For the first part I was doing the same. It's the usual thing to do. Just publish local SNAPSHOTs and run from that, adding logs and stuff.
Oh, yeah, that puzzled me also massively. I've gave up after half a day. Had no idea left where to look. For the second part I see you just smelled it… 😄 I know that Metals is adding a SBT plugin, but I would have not suspected it to change anything about the build. This still seems wrong.
Thanks for pointing that out. So I think I don't need to look there. Great!
Let's see what I find out.
OK. So I'll make this PR mergable. (Fix the test, include the requested changes) To be honest, I also think this whole diagnostics reporter thingy is very brittle, and a quite bad design. Streaming diagnostics is just such an ill idea. (Not sure though this isn't already messed up in the BSP "protocol"…) We're talking here about something that takes just a few milliseconds. "Optimizing" that with a very complex streaming mechanic seems wrong. Eclipse (the "RedHat" Java plugin) does it much smarter, and just locks the editor for this short time and submits all diagnostics for a file as batch. (I didn't know about the editor looking until at some point under very high load I've got once some "loading indicator" in the editor when typing instead of chars appearing on screen. But usually you never run into that). Additionally I think the whole idea of "snapshots" and diffing them implemented in Metals is also ill. Actually I think this is the cause of a massive performance degradation because the algo is likely very high complexity, and explodes with longer files and "many" errors. (I don't have formal prove of this suspicion right now. But I think I will come up with some. But that's something for the Metals project, as that part of the issue is not SBT's problem.) Thanks again for taking time to look into that. I'm sorry I could not come up with a full solution in the first place. |
1d6b747
to
0e9bde1
Compare
I've investigated the questionable things and made a small update to the PR. It's now ready for review. (Sorry for the long delay!) First of all I think it makes no sense to try to change the behavior around what happens when you set Having two ways to call the Java compiler is unfortunate, has some side effects (like different compiler diagnostic formats—see the updated test), but looks all in all like the better trade-off to introducing quite a lot of complexity. (There is also another small, likely cosmetic difference, which is the different implementations of There is also an additional small change in the diagnostics reporter. The bug I've fixed here masked another bug: In case a Java file had errors already when you compiled it for the first time the published diagnostics would have been overridden with empty ones again as Regarding the other things: I did not pick up the idea to add the Metals plugin to the build. Instead I'm testing with the root cause, which is the I have a question though: Did I introduce some kind of race condition by adding a second Besides that I kept the test separate to the others. It's already one of the biggest test on its own, and as even the messages you need to check are different there would be imho nothing to win by merging it into the other test. Having this separate also makes it easier to see whether something failed while using the JVM compiler service, or respectively the compiler called as regular system process. So summarized, in case you don't see any issue with adding a second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except for this one suggestion, it looks good to me.
The BSP server didn't reset old diagnostic messages sent to BSP clients under certain circumstances. This commit mitigates this edge case and ensures that diagnostics for files that previously had compilation problems are properly reset when fresh diagnostics messages are sent. The culprit was a mismatch of map keys: Files with problems were sometimes recorded under an absolute path, but later attempted to be retrieved by virtual path.
0e9bde1
to
f6663a0
Compare
) | ||
|
||
IO.delete(otherBuildFile) | ||
reloadWorkspace() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be an explicit ()
to return Unit here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the fix #7638
The BSP server didn't reset old diagnostic messages sent to BSP clients under certain circumstances. This commit mitigates this edge case and ensures that diagnostics for files that previously had compilation problems are properly reset when fresh diagnostics messages are sent.
The culprit was a mismatch of map keys: Files with problems were sometimes recorded under an absolute path, but later attempted to be retrieved by virtual path.
That the fix works can be tested end-to-end on a repro project under:
https://github.com/SlowBrainDude/sbt-bsp-bug-reproducer
Clone the repo and open it in VSCode with Metals installed. In case Metals isn't configured to do automatically so, switch to the SBT build server. (Bloop doesn't show any issues with the project). Let Metals import the project and do the initial compilation. Everything should work fine so far. Now navigate to the
JavaClass
file. There is a commented block of code. Delete the*
as the comment says. SBT will send diagnostics for the now failing build and you'll get a lot of red squiggles in the Java file after compile finishes. Now restore the comment. The errors reported by the SBT BSP will still be shown in the file (and in the problems pane and file explorer). By commenting things in and out you can actually multiply the errors. You can create a few hundred or even more just by commenting things in and out. The reported errors will never go away even when the file compiles fine again.With my fixed version of SBT the errors get reset properly again and don't get stale.
(Even Metals does some questionable things that may cause sending the reset diagnostic to take quite some time. I've seen compile times of 5 - 15 ms on the Java file, but it could take over 1 minute until the reset diagnostics got sent. But that's a Metals issue which implements some algo that explodes with line count and error count in a file quite quickly, maybe even exponential, at least high polynomial, given that this issue can make a 2 second CLI compile of very large file with only one syntax error into 20 minutes of all CPUs grilling on max when running Metals).
So far so good, but this PR is just a draft as some things aren't clear to me still.
Maybe I've hunted down somethings that shouldn't exist in the first place, and the real bug is even deeper. Because the bug doesn't surface in all cases actually.
I've tried to write a test for the case affected by this code change. But I've failed in doing so; at least so far.
On my repro project the
problem.position.sourcePath
extracted from theProblem
passed topublishDiagnostic
is definitively an absolute path. I had log output more or less around every line of code inBuildServerReporter
and traced the whole BSP communication while debugging. (It took actually quite some time to realize that the cause of the problem is a harmless looking but failinggetOrElse
… And it took even longer to see the difference in file path, as I only looked at the file name at first without paying much attention to what is written before it; also the log output was far apart so I just didn't see the difference between absolute path and something that starts with${BASE}/
for a long time).But the test I've written didn't fail in such way. I was wondering why the test gets green no matter my fix is applied or not. It turns out that in the test project
problem.position.sourcePath
is a "virtual path" (with a${BASE}/
prefix), even for Java files which I thought at first was actually the trigger condition for the bug.Now I need help to understand why my repro project behaves differently from the test projects in SBT. I've tried to trace how
Problems
s get constructed, and where the source path is initialized, but honestly I wasn't able to look though all this indirection. There is logic in DiagnosticsReporter but I don't understand where and why the source is sometimes feed in as proper file and sometimes as virtual file (which may have a possibly "virtual" path).The main question is: Why do the "file descriptors" feed into
sendFailureReport
andpublishDiagnostic
differ sometimes in how the path looks like (even they are the same file in the end). Is this something that's expected, or is the actual bug that this differs? Should thesourcePath
extracted fromProblems
always be a "virtual" one maybe, and is it already wrong that some absolute path ends up there?In case it's expected that the form of the path may differ between
sendFailureReport
andpublishDiagnostic
I still need a way to write a test that exhibits this behavior, to have a proper negative test (one that fails without the patch applied).Sorry for the very long description, but the issue wasn't easy to find and understand either. I hope I described it here in a way that can be understood.
TASKS to make this mergable: