[2.x] Refactor Settings and optimize indexing of aggregate keys
#7879
Conversation
This is pretty cool.

Yes, I can. But if I separate them, the elapsed real time gets slightly worse. Because the new

Will do

I assumed it was mostly internal. I'll check that with GitHub search.
Done
Plugins use

```scala
val extracted = Project.extract(state)
val v = (ThisBuild / version)
  .get(extracted.structure.data)
  .get
```

But this In some plugins
```diff
     display: Show[ScopedKey[?]]
 ): T =
-  value getOrElse sys.error(display.show(ScopedKey(scope, key)) + " is undefined.")
+  value.getOrElse(sys.error(display.show(key) + " is undefined."))
```
nit: While we are at it, could you check whether it would be helpful to turn this into a pattern match and inline the method?
I don't see anything about `Extracted` in the profiling, so it is probably not even called during loading.
But in general, yes, it is a good idea. I'll experiment with an `inline def` version of `getOrElse` that does pattern matching, to see if it makes a difference.
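For reference, a rough sketch of what that experiment could look like, assuming a Scala 3 `inline def` and a simplified signature (the name `getOrUndefined` and the `label` parameter are hypothetical; the real method also takes an implicit `Show[ScopedKey[?]]`):

```scala
// Hypothetical sketch, not the actual sbt code: an inline helper that pattern
// matches on the Option instead of going through Option.getOrElse with a
// by-name default. `label` stands in for the Show-based error message.
inline def getOrUndefined[T](value: Option[T], label: String): T =
  value match
    case Some(t) => t
    case None    => sys.error(label + " is undefined.")
```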
Thanks!
This PR contains a refactoring of `Settings` and an optimization of the indexing of aggregate keys. Overall it speeds up the build, even though it uses less parallelism.

Refactoring of Settings

`Settings0` used to be a `Map[Scope, AttributeMap]`, and it is now a `Map[ScopedKey[x], x]`. This is better because we no longer need to decompose every `ScopedKey[x]` into a `Scope` and an `AttributeKey[x]`, only to recompose them later, which duplicated every `ScopedKey[x]`. It reduces the number of long-living `ScopedKey[x]` instances by 8%, and the total number of instances by 1.4%. It also improves the performance of `Settings0`, which was responsible for 2.95% of the total CPU time and is now responsible for 0.41%.
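Roughly, the layout change looks like the sketch below, with simplified stand-in types; `Scope`, `AttributeKey`, `ScopedKey` and the use of `Any` for values are assumptions for illustration, not the actual sbt definitions:

```scala
// Simplified stand-in types for illustration only; the real sbt Scope,
// AttributeKey, ScopedKey and AttributeMap are richer than this.
object SettingsLayoutSketch:
  final case class Scope(name: String)
  final case class AttributeKey[A](label: String)
  final case class ScopedKey[A](scope: Scope, key: AttributeKey[A])

  // Old layout: a two-level map. Every update splits the ScopedKey into its
  // Scope and AttributeKey, and iterating the settings later has to rebuild
  // ScopedKey instances from those pieces.
  type OldSettings = Map[Scope, Map[AttributeKey[?], Any]]

  def setOld[A](m: OldSettings, sk: ScopedKey[A], v: A): OldSettings =
    val inner = m.getOrElse(sk.scope, Map.empty[AttributeKey[?], Any])
    m.updated(sk.scope, inner.updated(sk.key, v))

  // New layout: one flat map keyed by the ScopedKey itself, so existing
  // ScopedKey instances are reused rather than decomposed and recomposed.
  type NewSettings = Map[ScopedKey[?], Any]

  def setNew[A](m: NewSettings, sk: ScopedKey[A], v: A): NewSettings =
    m.updated(sk, v)
```

The point is that the flat layout keeps the existing `ScopedKey` instances as map keys instead of splitting and rebuilding them on every access.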
Optimization of indexing of aggregate keys
To build the index of all aggregate keys, we were computing the reverse aggregation of each key before indexing it, so each aggregate key was indexed many times, once for each aggregating project. This was parallelized to reduce latency. In this PR, we compute the reverse aggregation of all the keys at once, removing all duplication. We can no longer parallelize this process, but we don't need to, because it is a lot faster. It reduces the total CPU time by 12%. The impact for the user depends on their number of cores.
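Conceptually, the change is something like the sketch below; the names (`ProjectRef`, `buildIndex`, ...) and the graph-walking details are assumptions for illustration, not the actual sbt code:

```scala
// Simplified illustration with made-up names; not the actual sbt implementation.
// Assumes the aggregation graph is acyclic, as it is in an sbt build.
object AggregateIndexSketch:
  type ProjectRef = String
  type Key = String

  // `aggregate`: the projects each project aggregates.
  // `keysOf`: the keys defined directly in each project.
  //
  // Old behaviour (conceptually): for each key, compute the chain of projects
  // aggregating it and index the key in every one of them, so a key reachable
  // from N aggregating projects was processed N times (in parallel).
  // New behaviour: walk the aggregation graph once, propagating whole key sets
  // upward, so each key is indexed a single time.
  def buildIndex(
      aggregate: Map[ProjectRef, Seq[ProjectRef]],
      keysOf: Map[ProjectRef, Set[Key]]
  ): Map[ProjectRef, Set[Key]] =
    val cache = collection.mutable.Map.empty[ProjectRef, Set[Key]]

    def keysBelow(p: ProjectRef): Set[Key] =
      cache.get(p) match
        case Some(ks) => ks
        case None =>
          val ks = keysOf.getOrElse(p, Set.empty) ++
            aggregate.getOrElse(p, Nil).flatMap(keysBelow)
          cache(p) = ks
          ks

    (aggregate.keySet ++ keysOf.keySet).map(p => p -> keysBelow(p)).toMap
```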
Result
I am still loading `softwaremill/sttp` on my 12-core laptop.

To get the heap size and the number of instances, I generate a heap dump after loading sbt once. To get the CPU time and the elapsed real time, I run `time sbt 'reload;reload;reload'`.