perf(kernel): ~67% faster assembly loading via a runtime index#5164
Open
mrgrain wants to merge 3 commits into
Conversation
rix0rrr
reviewed
Jun 16, 2026
rix0rrr
approved these changes
Jun 16, 2026
Contributor
Thank you for contributing! ❤️ I will now look into making sure the PR is up-to-date, then proceed to try and merge it!
Contributor
Merging (with squash)...
Contributor
Merge Queue Status
This pull request spent 2 minutes 1 second in the queue, with no time running CI.
Waiting for: All conditions
Reason: Pull request #5164 has been dequeued. Queue conditions are not satisfied:
Failing checks:
Hint: You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.
a2f8847 to
8f1bb4d
Compare
Loading a package made the kernel parse the package's entire `.jsii` assembly up front, even though a given execution only ever looks up a tiny fraction of the declared types (often well under 1%). For large assemblies such as aws-cdk-lib this is tens of megabytes of JSON parsed to use a few hundred kilobytes of it.

When a package is served from the on-disk package cache, the kernel now builds a compact "runtime index" alongside the cached assembly: a header plus, for every type, its kind and the byte range of its (documentation-stripped) definition within a single bodies blob. Type definitions are then parsed lazily, on first lookup, so untouched types are never parsed.

The index is built once per cached package the first time it is loaded -- whether it was just extracted or had been cached by an earlier run that predates this feature -- and carries a format version so that changing the layout transparently invalidates and rebuilds older indices. Building and using the index is strictly best-effort: validation loads, uncached loads, and any read/write failure fall back to a full eager parse, so correctness never depends on the cache.
8f1bb4d to
952f3bc
Compare
mrgrain
commented
Jun 16, 2026
mrgrain
commented
Jun 16, 2026
… file
Replaces the JSON-object-per-type index with a columnar binary layout, and
makes the on-disk format self-describing.
The cache entry now holds two files:
  .jsii.runtime.v1.json    small manifest (schema, version, data path,
                           assembly metadata, byte layout)
  .jsii.runtime-index.v1   binary data file with four contiguous regions
                           laid out as
                             names   : "fqn0\nfqn1\n..." (UTF-8)
                             kinds   : uint8 per type (TypeKind ordinal)
                             offsets : uint32le[N+1] (into defs)
                             defs    : type-definition JSON, doc-stripped
LazyTypes now consumes the columns directly: kinds and offsets are read as
typed-array views over the file (zero parsing), only the names blob is decoded,
and a type definition's bytes are sliced from the defs region and JSON-parsed
on first lookup. On aws-cdk-lib (~20k types) this is ~3.3x faster to load and
~43% smaller than the previous JSON map.
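
As a rough illustration of the four-region layout described above, here is a minimal sketch in TypeScript. Every name in it (`encodeIndex`, `LazyTypesSketch`, `Regions`) is hypothetical and not taken from the kernel. It also makes two simplifying assumptions: region positions are passed around in memory rather than read from a manifest, and offsets are read with `readUInt32LE` instead of a `Uint32Array` view, which avoids having to care about 4-byte alignment in the sketch.

```typescript
// Hypothetical sketch of a names / kinds / offsets / defs layout.
interface TypeEntry { fqn: string; kind: number; def: object; }
// Byte positions of each region inside the data buffer (names start at 0).
interface Regions { kinds: number; offsets: number; defs: number; count: number; }

function encodeIndex(types: TypeEntry[]): { data: Buffer; regions: Regions } {
  const names = Buffer.from(types.map((t) => t.fqn).join('\n'), 'utf8');
  const kinds = Buffer.from(types.map((t) => t.kind)); // one uint8 per type
  const bodies = types.map((t) => Buffer.from(JSON.stringify(t.def), 'utf8'));
  // N+1 offsets: entry i is where definition i starts, entry N is the total length.
  const offsets = Buffer.alloc(4 * (types.length + 1));
  let cursor = 0;
  bodies.forEach((body, i) => {
    offsets.writeUInt32LE(cursor, 4 * i);
    cursor += body.length;
  });
  offsets.writeUInt32LE(cursor, 4 * types.length);
  const regions: Regions = {
    kinds: names.length,
    offsets: names.length + kinds.length,
    defs: names.length + kinds.length + offsets.length,
    count: types.length,
  };
  return { data: Buffer.concat([names, kinds, offsets, ...bodies]), regions };
}

class LazyTypesSketch {
  private readonly data: Buffer;
  private readonly regions: Regions;
  private readonly position = new Map<string, number>();
  private readonly parsed = new Map<string, unknown>();

  constructor(data: Buffer, regions: Regions) {
    this.data = data;
    this.regions = regions;
    // Only the names blob is decoded eagerly.
    if (regions.count > 0) {
      const names = data.subarray(0, regions.kinds).toString('utf8');
      names.split('\n').forEach((fqn, i) => this.position.set(fqn, i));
    }
  }

  // Kinds are read straight from the buffer, no JSON involved.
  kindOf(fqn: string): number | undefined {
    const i = this.position.get(fqn);
    return i === undefined ? undefined : this.data[this.regions.kinds + i];
  }

  // A definition is sliced out of the defs region and parsed on first lookup.
  lookup(fqn: string): unknown {
    if (this.parsed.has(fqn)) return this.parsed.get(fqn);
    const i = this.position.get(fqn);
    if (i === undefined) return undefined;
    const at = this.regions.offsets + 4 * i;
    const start = this.regions.defs + this.data.readUInt32LE(at);
    const end = this.regions.defs + this.data.readUInt32LE(at + 4);
    const def = JSON.parse(this.data.subarray(start, end).toString('utf8'));
    this.parsed.set(fqn, def);
    return def;
  }

  get parsedCount(): number {
    return this.parsed.size;
  }
}
```

Storing N+1 offsets means the byte range of definition `i` is always `offsets[i]` to `offsets[i + 1]`, so no separate length column is needed.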
Both filenames are versioned, so a cache entry shared by multiple jsii
runtimes of different format versions never has one version clobber another's
files. The version is recorded both in the manifest filename and in a `version`
field inside the manifest, as an integrity check. The data file's path is also
recorded inside the manifest (defaulting to the versioned basename); the reader
resolves it relative to the manifest's directory and rejects absolute paths or
`..` traversal as defense-in-depth against a tampered or corrupted cache.
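
The path check described above could look roughly like the following sketch. `resolveDataFile` is a hypothetical helper, not the reader's actual function, and the exact rejection rules in the kernel may differ.

```typescript
import * as path from 'node:path';

// Hypothetical helper: resolve the manifest's data path relative to the
// manifest's own directory, refusing absolute paths and `..` segments.
function resolveDataFile(manifestPath: string, dataPath: string): string {
  if (path.isAbsolute(dataPath) || path.win32.isAbsolute(dataPath)) {
    throw new Error(`data path must be relative: ${dataPath}`);
  }
  // Split on both separators so the check behaves the same on every platform.
  if (dataPath.split(/[\\/]/).includes('..')) {
    throw new Error(`data path must not traverse upwards: ${dataPath}`);
  }
  return path.join(path.dirname(manifestPath), dataPath);
}
```

A reader following the best-effort policy would presumably treat either error as "no usable index" and fall back to the eager parse.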
The format is also documented under
gh-pages/content/specification/7-runtime-index.md.
Test isolation: the kernel test suite previously read and pruned the
developer's real package cache via `defaultCacheRoot()`, which has caused
flakes when prior runs leave the cache inconsistent. A jest globalSetup now
points `JSII_RUNTIME_PACKAGE_CACHE_ROOT` at a fresh tmpdir for the duration
of the run (with a matching teardown), and `kernel.test.ts`'s afterAll prune
honours the override.
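
For illustration, the setup and teardown could be sketched as below. The function names are hypothetical; only the `JSII_RUNTIME_PACKAGE_CACHE_ROOT` variable comes from this change. With jest, each function would be the default export of the module named by the `globalSetup` and `globalTeardown` configuration options respectively.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

const CACHE_ROOT_VAR = 'JSII_RUNTIME_PACKAGE_CACHE_ROOT';

// Hypothetical setup: point the package cache at a fresh temporary directory.
function setupIsolatedCache(): string {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'jsii-kernel-test-cache-'));
  process.env[CACHE_ROOT_VAR] = dir;
  return dir;
}

// Hypothetical teardown: remove the temporary cache and clear the override.
function teardownIsolatedCache(): void {
  const dir = process.env[CACHE_ROOT_VAR];
  // Only delete directories under the OS temp dir, never a real cache.
  if (dir && dir.startsWith(os.tmpdir())) {
    fs.rmSync(dir, { recursive: true, force: true });
  }
  delete process.env[CACHE_ROOT_VAR];
}
```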
Loading packages is a large part of the fixed startup cost of any jsii host, and for big libraries such as aws-cdk-lib it dominates. The kernel parsed a package's entire `.jsii` assembly up front -- tens of megabytes of JSON -- even though a typical run only ever looks up a tiny fraction of the declared types (well under 1% in a large CDK synth). This change makes loading roughly two-thirds faster on a warm package cache, so every host starts doing useful work sooner.
Benchmark
The results are more pronounced on smaller apps, where startup makes up a larger share of the total run time. The larger an app gets, the smaller the relative improvement.
Cold package cache
The flip-side is that we need to do more work on the first run. For a cold package cache, we are now slower:
Technical details
When a package is served from the on-disk package cache, the kernel now builds a compact "runtime index" alongside the cached assembly the first time it is loaded: a small header, plus -- for every type -- its kind and the byte range of its definition within a single bodies blob. Documentation and source-location fields, which the runtime never reads, are stripped from the stored bodies. Type definitions are then parsed lazily, on first lookup, so types that are never used are never parsed. Because the bodies are stored decompressed, warm loads also skip following the gzip redirect and decompressing the assembly.
The index is built once per cached package -- whether it was just extracted or had been cached by an earlier run that predates this feature -- and carries a format version, so changing the layout transparently invalidates and rebuilds older indices. Using and building the index is strictly best-effort: validating loads, uncached loads, and any read or write failure fall back to a full eager parse, so correctness never depends on the cache.
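
The best-effort policy can be sketched as follows. This is an illustration under stated assumptions, not the kernel's code: `loadBestEffort`, `LoadOptions`, and the `FORMAT_VERSION` value are hypothetical, and the three callbacks stand in for whatever actually reads, builds, and eagerly parses an assembly.

```typescript
// Hypothetical format version; a mismatch causes the index to be rebuilt.
const FORMAT_VERSION = 1;

interface VersionedIndex<T> { version: number; types: T; }

interface LoadOptions<T> {
  validate: boolean;   // validating loads always parse eagerly
  cached: boolean;     // only packages served from the cache use the index
  readIndex: () => VersionedIndex<T> | undefined; // undefined: no index yet
  buildIndex: () => VersionedIndex<T>;            // may throw on write failure
  parseEagerly: () => T;                          // full parse, always correct
}

function loadBestEffort<T>(options: LoadOptions<T>): { types: T; source: 'index' | 'eager' } {
  if (options.validate || !options.cached) {
    return { types: options.parseEagerly(), source: 'eager' };
  }
  try {
    let index = options.readIndex();
    // Missing or older-format indices are rebuilt transparently.
    if (index === undefined || index.version !== FORMAT_VERSION) {
      index = options.buildIndex();
    }
    return { types: index.types, source: 'index' };
  } catch {
    // Any read or write failure falls back to the eager parse.
    return { types: options.parseEagerly(), source: 'eager' };
  }
}
```

Because every failure path ends in `parseEagerly`, the index only ever affects speed, never the result.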
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.