[2.x] Refactor Settings and optimize indexing of aggregate keys
#7879
Conversation
This is pretty cool.

Yes, I can. But if I separate them, the elapsed real time gets slightly worse. Because the new

Will do

I assumed it was mostly internal. I'll check that with GitHub search.
Done
Plugins use

```scala
val extracted = Project.extract(state)
val v = (ThisBuild / version)
  .get(extracted.structure.data)
  .get
```

But this In some plugins
```diff
     display: Show[ScopedKey[?]]
 ): T =
-  value getOrElse sys.error(display.show(ScopedKey(scope, key)) + " is undefined.")
+  value.getOrElse(sys.error(display.show(key) + " is undefined."))
```
nit: While we are at it, could you check whether it would be helpful to turn this into a pattern match and inline the method?
I don't see anything about `Extracted` in the profiling, so it is probably not even called during loading.
But in general, yes, it is a good idea. I'll experiment with an `inline def` version of `getOrElse` that does pattern matching, to see if it makes a difference.
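For reference, a rough sketch of what that experiment could look like, assuming a Scala 3 `inline def` and a simplified signature (the name `getOrUndefined` and the `label` parameter are hypothetical; the real method also takes an implicit `Show[ScopedKey[?]]`):

```scala
// Hypothetical sketch, not the actual sbt code: an inline helper that pattern
// matches on the Option instead of going through Option.getOrElse with a
// by-name default. `label` stands in for the Show-based error message.
inline def getOrUndefined[T](value: Option[T], label: String): T =
  value match
    case Some(t) => t
    case None    => sys.error(label + " is undefined.")
```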
Thanks!
This PR contains a refactoring of `Settings` and an optimization of the indexing of aggregate keys. Overall it speeds up the build, even though it uses less parallelism.

Refactoring of Settings

`Settings0` used to be a `Map[Scope, AttributeMap]`, and it is now a `Map[ScopedKey[x], x]`. This is better because we no longer need to decompose every `ScopedKey[x]` into a `Scope` and an `AttributeKey[x]`, only to recompose them later, which duplicated every `ScopedKey[x]`. It reduces the number of long-living `ScopedKey[x]` instances by 8%, and the total number of instances by 1.4%. It also improves the performance of `Settings0`, which was responsible for 2.95% of the total CPU time and is now responsible for 0.41%.
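Roughly, the layout change looks like the sketch below, with simplified stand-in types; `Scope`, `AttributeKey`, `ScopedKey` and the use of `Any` for values are assumptions for illustration, not the actual sbt definitions:

```scala
// Simplified stand-in types for illustration only; the real sbt Scope,
// AttributeKey, ScopedKey and AttributeMap are richer than this.
object SettingsLayoutSketch:
  final case class Scope(name: String)
  final case class AttributeKey[A](label: String)
  final case class ScopedKey[A](scope: Scope, key: AttributeKey[A])

  // Old layout: a two-level map. Every update splits the ScopedKey into its
  // Scope and AttributeKey, and iterating the settings later has to rebuild
  // ScopedKey instances from those pieces.
  type OldSettings = Map[Scope, Map[AttributeKey[?], Any]]

  def setOld[A](m: OldSettings, sk: ScopedKey[A], v: A): OldSettings =
    val inner = m.getOrElse(sk.scope, Map.empty[AttributeKey[?], Any])
    m.updated(sk.scope, inner.updated(sk.key, v))

  // New layout: one flat map keyed by the ScopedKey itself, so existing
  // ScopedKey instances are reused rather than decomposed and recomposed.
  type NewSettings = Map[ScopedKey[?], Any]

  def setNew[A](m: NewSettings, sk: ScopedKey[A], v: A): NewSettings =
    m.updated(sk, v)
```

The point is that the flat layout keeps the existing `ScopedKey` instances as map keys instead of splitting and rebuilding them on every access.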
Optimization of indexing of aggregate keys
To build the index of all aggregate keys, we were computing the reverse aggregation of each key before indexing it, so each aggregate key was indexed many times, once for each aggregating project. This was parallelized to reduce latency. In this PR, we compute the reverse aggregation of all the keys at once, removing all duplication. We can no longer parallelize this process, but we don't need to, because it is a lot faster. It reduces the total CPU time by 12%. The impact for the user depends on their number of cores.
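Conceptually, the change is something like the sketch below; the names (`ProjectRef`, `buildIndex`, ...) and the graph-walking details are assumptions for illustration, not the actual sbt code:

```scala
// Simplified illustration with made-up names; not the actual sbt implementation.
// Assumes the aggregation graph is acyclic, as it is in an sbt build.
object AggregateIndexSketch:
  type ProjectRef = String
  type Key = String

  // `aggregate`: the projects each project aggregates.
  // `keysOf`: the keys defined directly in each project.
  //
  // Old behaviour (conceptually): for each key, compute the chain of projects
  // aggregating it and index the key in every one of them, so a key reachable
  // from N aggregating projects was processed N times (in parallel).
  // New behaviour: walk the aggregation graph once, propagating whole key sets
  // upward, so each key is indexed a single time.
  def buildIndex(
      aggregate: Map[ProjectRef, Seq[ProjectRef]],
      keysOf: Map[ProjectRef, Set[Key]]
  ): Map[ProjectRef, Set[Key]] =
    val cache = collection.mutable.Map.empty[ProjectRef, Set[Key]]

    def keysBelow(p: ProjectRef): Set[Key] =
      cache.get(p) match
        case Some(ks) => ks
        case None =>
          val ks = keysOf.getOrElse(p, Set.empty) ++
            aggregate.getOrElse(p, Nil).flatMap(keysBelow)
          cache(p) = ks
          ks

    (aggregate.keySet ++ keysOf.keySet).map(p => p -> keysBelow(p)).toMap
```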
Result
I am still loading `softwaremill/sttp` on my 12-core laptop.

To get the heap size and the number of instances, I generate a heap dump after loading sbt once. To get the CPU time and the elapsed real time, I run `time sbt 'reload;reload;reload'`.