examples/llm: shard and async compilation for lfm #475
Open
elogir wants to merge 21 commits into
Finistere
approved these changes
Apr 17, 2026
Initialize memories first, then devices using direct PJRT handle lookup, and populate memory addressability afterward. This avoids relying on PJRT id/localHardwareId alignment (ids collide on the Neuron platform) and makes default memory/device resolution use the actual PJRT objects.
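The ordering described above can be sketched as follows. This is a minimal Python illustration, not the PR's code: `Memory`, `Device`, `build_topology`, `default_memory`, and the integer "handles" are all hypothetical stand-ins for the opaque PJRT objects. It only shows why keying on the handle stays correct when two devices report the same id.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    handle: int                      # hypothetical stand-in for an opaque runtime handle
    kind: str
    devices: list = field(default_factory=list)


@dataclass
class Device:
    handle: int                      # hypothetical stand-in for an opaque runtime handle
    reported_id: int                 # id reported by the runtime; may collide
    memories: list = field(default_factory=list)


def build_topology(raw_memories, raw_devices):
    """raw_memories: [(handle, kind)];
    raw_devices: [(handle, reported_id, [memory handles])]."""
    # Phase 1: create every memory, keyed by its handle.
    memories = {h: Memory(h, kind) for h, kind in raw_memories}
    # Phase 2: create every device, keyed by its handle (never by reported_id).
    devices = {h: Device(h, rid) for h, rid, _ in raw_devices}
    # Phase 3: populate addressability in both directions via handle lookup.
    for h, _, mem_handles in raw_devices:
        dev = devices[h]
        for mh in mem_handles:
            mem = memories[mh]
            dev.memories.append(mem)
            mem.devices.append(dev)
    return memories, devices


def default_memory(device):
    # Assumption for this sketch: the first addressable memory is the default.
    return device.memories[0]
```

With two devices that both report id 0 but have distinct handles, an id-keyed table would keep only one of them, whereas the handle-keyed tables above keep both and resolve each device to its own memory.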
The Neuron platform returns null for Buffer_ReadyEvent
Replace `catch unreachable` on future awaits with `try` so compilation/loading errors propagate instead of panicking. Also pass CompiledModel.deinit by value
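As a rough analogy for the change above, here is a Python sketch (the PR itself is not Python, and `CompileError`, `compile_module`, and `compile_all` are hypothetical names). Awaiting a future with `.result()` re-raises whatever the worker raised, so the caller sees the compilation error instead of the process aborting.

```python
from concurrent.futures import ThreadPoolExecutor


class CompileError(Exception):
    """Hypothetical error type standing in for a compilation/loading failure."""


def compile_module(name):
    # Hypothetical compile step: fails for one specific input.
    if name == "broken":
        raise CompileError(f"failed to compile {name}")
    return f"exe:{name}"


def compile_all(names):
    # Submit every compilation up front so they run concurrently.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(compile_module, n) for n in names]
        # Future.result() re-raises the worker's exception in the caller,
        # so a failure propagates rather than being treated as impossible.
        return [f.result() for f in futures]
```

A caller can then wrap `compile_all(...)` in `try`/`except CompileError` and report the failure, which is the behavior the commit aims for when it replaces an "unreachable" assertion with error propagation.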
This is not really useful in itself, since sharding brings no gain here (the largest LFM 2.5 weights are 2B params), but it allows us to quickly test sharding on runtimes with a slow compiler. Also parallelizes compilation.
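For readers unfamiliar with the term, sharding here means splitting a tensor axis across devices. The Python sketch below assumes an even split along a single axis; `shard_ranges` is a hypothetical helper, and the partitioning actually used in this PR is not shown in the conversation.

```python
def shard_ranges(dim, num_shards):
    """Return the [start, stop) slice of an axis of size `dim` owned by each shard."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Assumption for this sketch: the axis must divide evenly across shards.
    if dim % num_shards != 0:
        raise ValueError(f"axis of size {dim} does not divide into {num_shards} shards")
    step = dim // num_shards
    return [(i * step, (i + 1) * step) for i in range(num_shards)]


# An axis of size 2048 split across 4 devices:
# [(0, 512), (512, 1024), (1024, 1536), (1536, 2048)]
print(shard_ranges(2048, 4))
```

Even when the model fits on one device, exercising a split like this is enough to validate the sharded code path end to end.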