Performance improvements #385
Merged
`Ptr<String>` is now used as the standard string type instead of `Ptr<str>`. This leaves enough space in KString for a non-allocated slice that uses u32 bounds. StringSlice with usize bounds is still available for strings larger than 4GB. StringSlice is used heavily by the runtime (all access ops pull a StringSlice out of the constant pool), so this results in a significant speedup (5-6% in benchmarks). Inlined strings are now unnecessary (they only added overhead) and can be removed.
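As a rough illustration of why the pointer type matters for the slice layout, here is a minimal sketch. It assumes `Ptr` behaves like `std::rc::Rc`; `CompactSlice` and the size helpers are hypothetical names for illustration, not the types used in this PR. A shared pointer to a `String` is one word wide, while a shared pointer to a `str` also carries a length, so the thin pointer leaves room for two u32 bounds.

```rust
use std::convert::TryFrom;
use std::mem::size_of;
use std::rc::Rc;

/// Hypothetical compact slice: a thin shared pointer plus u32 byte bounds.
pub struct CompactSlice {
    data: Rc<String>,
    start: u32,
    end: u32,
}

impl CompactSlice {
    /// Returns None if a bound doesn't fit in u32, is out of range,
    /// or doesn't fall on a char boundary.
    pub fn new(data: Rc<String>, start: usize, end: usize) -> Option<Self> {
        let start32 = u32::try_from(start).ok()?;
        let end32 = u32::try_from(end).ok()?;
        // `str::get` validates ordering, range, and char boundaries.
        data.get(start..end)?;
        Some(Self { data, start: start32, end: end32 })
    }

    /// Borrows the sliced text without allocating.
    pub fn as_str(&self) -> &str {
        &self.data[self.start as usize..self.end as usize]
    }
}

/// Size of a shared pointer to a sized `String` (one word).
pub fn thin_pointer_size() -> usize {
    size_of::<Rc<String>>()
}

/// Size of a shared pointer to an unsized `str` (pointer plus length).
pub fn fat_pointer_size() -> usize {
    size_of::<Rc<str>>()
}

/// Size of the hypothetical compact slice.
pub fn compact_slice_size() -> usize {
    size_of::<CompactSlice>()
}
```

On a 64-bit target the sketch's slice takes one pointer plus eight bytes of bounds, which is the same footprint as a bare `Rc<str>`. Strings over 4GB can't be addressed by u32 bounds, which is presumably why the usize-based StringSlice is kept.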
This improves some benchmarks. The extra dereference doesn't seem to add significant cost, and it's better to move a Vec into the Tuple than to copy its data into a new location, e.g. in `.to_tuple()`.
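The move-versus-copy trade-off can be sketched with standard library types. This assumes the tuple's storage behaves like an `Rc<Vec<T>>` as opposed to an `Rc<[T]>`; both helper functions are hypothetical and are not taken from the PR.

```rust
use std::rc::Rc;

/// Moves the Vec into a shared pointer. The elements stay in their
/// existing heap buffer; reading them costs one extra dereference.
pub fn share_by_move(values: Vec<i64>) -> Rc<Vec<i64>> {
    Rc::new(values)
}

/// Converts the Vec into a shared slice. The elements are copied into
/// a new allocation that also holds the reference counts.
pub fn share_by_copy(values: Vec<i64>) -> Rc<[i64]> {
    Rc::from(values)
}
```

Comparing `as_ptr()` before and after each call shows the difference: the moved Vec keeps its buffer address, while the shared slice lives at a new one.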
- Remove temporary iterators from the register stack to ensure they have a reference count of one, allowing pop_front to mutate the inner slice without allocation.
- Rework TupleIterator to work with indices instead of using pop_front/pop_back.
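Both points above can be sketched with `std::rc::Rc`, assuming `Ptr` has similar semantics. `IndexedTupleIter` and `is_uniquely_owned` are hypothetical stand-ins and not the PR's implementation: the iterator advances a pair of indices over shared data, and the helper shows that in-place mutation is only available while the reference count is one.

```rust
use std::rc::Rc;

/// Hypothetical index-based iterator over shared tuple data.
pub struct IndexedTupleIter {
    data: Rc<Vec<i64>>,
    front: usize,
    back: usize, // exclusive upper bound
}

impl IndexedTupleIter {
    pub fn new(data: Rc<Vec<i64>>) -> Self {
        let back = data.len();
        Self { data, front: 0, back }
    }
}

impl Iterator for IndexedTupleIter {
    type Item = i64;

    fn next(&mut self) -> Option<i64> {
        if self.front < self.back {
            let value = self.data[self.front];
            self.front += 1; // only the index moves; the data is untouched
            Some(value)
        } else {
            None
        }
    }
}

impl DoubleEndedIterator for IndexedTupleIter {
    fn next_back(&mut self) -> Option<i64> {
        if self.front < self.back {
            self.back -= 1;
            Some(self.data[self.back])
        } else {
            None
        }
    }
}

/// `Rc::get_mut` only succeeds when no other strong or weak pointers
/// exist, which is when the inner value can be mutated in place.
pub fn is_uniquely_owned(data: &mut Rc<Vec<i64>>) -> bool {
    Rc::get_mut(data).is_some()
}
```

A copy of the pointer left behind elsewhere (for instance in a register) would make `is_uniquely_owned` return false, forcing a clone before any mutation.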
The NewFrame op is run at the start of each frame and tells the VM the number of registers required by the frame's bytecode. This allows the stack size check in set_register to be removed. The VM may use additional temporary registers, but it will ensure that at least the required registers are present in the stack. A fair amount of reworking of the VM's call semantics has gone into this, hopefully simplifying the logic and clarifying how frame setup should be performed.
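The idea of reserving registers once per frame can be sketched with a toy VM. Everything here (`ToyVm`, its fields, and its method signatures) is a hypothetical simplification and does not mirror the real VM's API.

```rust
/// Toy register VM used only to illustrate per-frame register reservation.
pub struct ToyVm {
    registers: Vec<i64>,
    frame_base: usize,
}

impl ToyVm {
    pub fn new() -> Self {
        Self { registers: Vec::new(), frame_base: 0 }
    }

    /// Hypothetical NewFrame handler: grows the stack once so that
    /// `register_count` registers exist above `frame_base`.
    pub fn new_frame(&mut self, frame_base: usize, register_count: usize) {
        self.frame_base = frame_base;
        let required = frame_base + register_count;
        if self.registers.len() < required {
            self.registers.resize(required, 0);
        }
    }

    /// No grow-on-demand branch is needed here, because new_frame has
    /// already reserved the registers. (Indexing is still bounds-checked.)
    pub fn set_register(&mut self, index: usize, value: i64) {
        self.registers[self.frame_base + index] = value;
    }

    pub fn get_register(&self, index: usize) -> i64 {
        self.registers[self.frame_base + index]
    }

    pub fn stack_len(&self) -> usize {
        self.registers.len()
    }
}
```

The resize check runs once per call instead of once per register write, which is the kind of saving described above.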
This is made possible by reducing the size of KString to 16 bytes, which allows KFunction to grow to 24 bytes (only one variant can have a size of 24, assuming it has a niche that can be shared by KValue). KFunction can then include the capture list, making KCaptureFunction redundant.
I decided to take a look at some potential runtime performance improvements. Overall, this PR produces benchmark improvements of 11-20% on my machine.