Implement unsafe intrinsics for compile-time builtins #11825
Conversation
This issue or pull request has been labeled: "wasmtime:api", "winch"
Force-pushed from 14489f4 to 60daa62
```rust
/// A map from a `UnsafeIntrinsic::index()` to that intrinsic's
/// module-interned type.
pub unsafe_intrinsics: [PackedOption<ModuleInternedTypeIndex>; UnsafeIntrinsic::len() as usize],
```
If it's expected that the list of intrinsics is going to grow over time, should this perhaps be `PrimaryMap<SomethingIndex, (ModuleInternedTypeIndex, UnsafeIntrinsic)>`? Basically ignoring unused intrinsics.
My thinking has been that we can cross that bridge if we get to it.
Right now, there are very few intrinsics, and I'm not concerned about the size of little arrays like this. However, it definitely is intended that a component that doesn't use these intrinsics doesn't have them compiled into its text section and doesn't have any additional space reserved for their VMFuncRefs and whatnot in its vmctx layout.
```rust
// Initialize the unsafe intrinsics used by this component, if any.
for (i, module_ty) in env_component
    .unsafe_intrinsics
    .iter()
    .enumerate()
    .filter_map(|(i, ty)| ty.expand().map(|ty| (i, ty)))
{
    let i = u32::try_from(i).unwrap();
    let intrinsic = UnsafeIntrinsic::from_u32(i);
    let ptrs = self.component.unsafe_intrinsic_ptrs(intrinsic).expect(
        "should have intrinsic pointers given that we assigned the intrinsic a type",
    );
    let shared_ty = self
        .component
        .signatures()
        .shared_type(module_ty)
        .expect("should have a shared type");
    self.instance_mut(store.0).set_intrinsic(
        intrinsic,
        ptrs.wasm_call,
        ptrs.array_call,
        shared_ty,
    );
}
```
Do you see a viable path to eventually omitting this? For example, if we were to implement a DCE pass for functions: if no intrinsics are actually imported anywhere and they were inlined everywhere, then all of these should get emptied out. "Just DCE" wouldn't be sufficient because of loops like this, however, and we'd have to, post function optimization, prune the list of intrinsics in theory.
> Do you see a viable path to eventually omitting this?
When you say "this" what exactly are you referring to?
Already it should be the case that when unsafe intrinsics are not exposed to a component, this loop performs zero iterations and there should[^1] be zero space reserved for intrinsics' `VMFuncRef`s in the vmctx.
> For example if we were to implement a DCE pass for functions if no intrinsics are actually imported anywhere and were inlined everywhere then all of these should get emptied out. "Just DCE" wouldn't be sufficient because of loops like this, however, and we'd have to, post function optimization, prune the list of intrinsics in theory.
Yes, we would need to update the `env_component.unsafe_intrinsics` field after doing gc-sections/DCE during linking, the same as we would need to do the moral equivalent for the `VMFuncRef`s of defined Wasm functions that are imported/exported within a component but are ultimately never called and are dead code.
[^1]: I think there may be a bug where the space is reserved unconditionally right now, looking at CI. But that is definitely unintentional.
For "this" I mean this entire loop in the context when unsafe intrinsics are used. This loop exists for the vanishingly rare case that intrinsics are used but also turned into funcref values one way or another, but in practice they'll basically never get used.
In some sense I'm not saying much here, it's pretty clear that DCE won't make this loop go away, but an wasmtime-aware DCE pass which updated the unsafe_intrinsics list, would, however. So all I'm really saying here is that I think we should strive to make this loop go away in most situations when unsafe intrinsics are used, but that'll require fancy DCE.
Oh, also, I'd recommend using
This commit adds the extremely unsafe `wasmtime::CodeBuilder::expose_unsafe_intrinsics` method. When enabled, the Wasm being compiled is given access to special imports that correspond to direct, unchecked and unsandboxed, native load and store operations. These intrinsics are intended to be used for implementing fast, inline-able versions of WASI interfaces that are special-cased to a particular host embedding, for example.

Compile-time builtins, as originally described in [the RFC](bytecodealliance/rfcs#43), are basically made up of three parts:

1. A function inliner
2. Unsafe intrinsics
3. Component composition to encapsulate the usage of unsafe intrinsics in a safe interface

Part (1) has been implemented in Wasmtime and Cranelift for a little while now (see `wasmtime::Config::compiler_inlining`). This commit is part (2). After this commit lands, part (3) can be done with `wac` and `wasm-compose`, although follow-up work is required to make the developer experience nicer and more integrated into Wasmtime so that the APIs can look like those proposed in the RFC.
Force-pushed from 38f0dc1 to 3a34fe4
@alexcrichton I think this should be ready for another review pass
alexcrichton left a comment
Thanks for slogging through all the CI bits and handling the miri bits, it's looking good!
To expand a bit on some of the unresolved comments from the previous review: it feels a bit weird that there are different ways of managing the list of intrinsics for a component. The `VMComponentContext` either has 0 or all of them, compilation either compiles 0 or all of them, `info::Component` tracks a full list of intrinsics but has `None` for unneeded intrinsics, `VMComponentContext` initialization "nulls out" all intrinsics, and instantiation only fills in used intrinsics. To me this feels like a random mish-mash of different strategies to manage everything. I get your point about crossing the bridge when we get there, but I also feel like this PR is moving us to a state where it's pretty inconsistent how the intrinsics are handled. Some contexts are "all or nothing" and some contexts are "only used intrinsics".
Ideally I'd prefer a system where intrinsics were compacted/compiled on-demand as opposed to ever doing an "all or nothing" approach. My read of this is that this is basically a function of the initial analysis phase of a component and how clever it is. I would naively expect that fitting into the `GlobalInitializer` infrastructure would make the implementation "just fall out" by adding a new `FooIndex` type of some kind. Basically all the hash maps and helpers and such are already there, so I would naively expect the implementation to not be all that much work.
I'm perpetually worried about the quantity of work we defer given that the rate of burning down this deferred work is often much smaller than the rate of deferring, but this is a topic reasonable folks can disagree on. In that sense I'll lay out my concerns here, but I'll also leave it up to you whether to merge or not before addressing. If this merges as-is, though, mind opening an issue about these future improvements?
I hear you. I think it is important that we balance incrementalism and doing things the Right Way. Personally, I think that the lever we use to strike that balance is not by accepting that things will be in a fairly suboptimal state "temporarily" (which, as you note, is often not temporary) in service of shipping something a little sooner, but to instead drop functionality and features that we don't have time to implement well. This way, we might not have everything we ideally want, but what we do have is rock solid. So when we look at some work being left for "follow-up PRs", we should ask whether what is landing now is rock solid, and the follow-ups are "just" optimizations/features/functionality.

But of course all these generalizations are very high-level and vague. We can still come to reasonable disagreements on what is considered "rock solid" or which bits of functionality even can be left out while preserving what remains.

To make things in this PR a little more concrete when viewed through the above lens: I have been accepting the compromise that we will compile, link, and instantiate

Ignoring for a second the implementation details in this PR around how intrinsics are tracked through translation and the like, do you agree that the above compromise is an acceptable cut to make? Because I do agree somewhat with the following: how the intrinsics are tracked through compilation and in the

And thanks as always for the thorough review! Will dig in some more momentarily.
High-level, I definitely agree with everything you say, and I also want to strive for a balance of perfection and pragmatism. I'm happy to defer things to a follow-up PR to make progress effectively whenever, and the moment I pause is when the conclusion is to open an issue. Filing an issue means that it'll likely be years, if ever, before something changes. So in that sense "let's file an issue" is equated to me with "let's just ignore this", which is partly where I come from. I basically feel that there's no balance to filing an issue, as it effectively means that someone else, who's likely to be less suited to the task, will have to take care of it.

For this PR specifically, I agree the DCE/VMFuncRef bits should not happen here. There's definitely no need to entangle all that, and it's a huge project for not a whole lot of benefit right now. What I was mostly referring to was the compilation of the unsafe intrinsics, where it's a mish-mash of everything vs. the subset used. That to me feels like the perfect candidate for: if an issue is filed, it'll just be forgotten and never addressed. It feels frequent that small improvements like this are rarely justified to spend time on, which means they just never get time spent on them.

I agree with your rock-solid quality bar though, and to that end I have no concerns about this PR. I see no bugs in this PR and it's just a bit sub-optimal in a few areas. Given that, I'm fine to see this merged. My personal preference would be to have a follow-up adjusting the compilation side of things to change the list-of-all-intrinsics to list-of-only-the-used-intrinsics, but I'm ok settling for an issue too.
That was not the button I wanted...
Turns out we don't need to add phases, we already have the info available to do this.
@alexcrichton the latest commit (9ca326b) cleans this up a little bit, and also happens to give us demand-based compilation of intrinsics, since it turns out we know which ones were
Agreed! I think everything at least now only works on `O(intrinsics_used)`, and data-representation-wise I'd still prefer
Implement unsafe intrinsics for compile-time builtins (…ce#11825)

* Implement unsafe intrinsics for compile-time builtins
* fill out some more docs
* fix non component model builds
* start filling out the doc example
* Factor abi params/returns out; truncate/extend pointers
* Compile unsafe intrinsics on winch as well
* prtest:full
* have the macro define the signature
* ignore tests in MIRI because MIRI can't compile Wasm
* juggle pointer provenance in `Store::data[_mut]`
* add a test for store data provenance and also fix it
* use `VmPtr` for the store data pointer
* finish writing unsafe intrinsics example
* fix up docs and rules around only accessing data from `T` in a `Store<T>`
* Only reserve space for the intrinsics' `VMFuncRef`s if they are in use
* use dangling pointers instead of options
* Rename `StoreInner::data` to `data_no_provenance` and fix some accesses to use the method accessors
* Add comments about the provenance juggling inside `StoreInner::data[_mut]`
* only compile intrinsics that are used (turns out we don't need to add phases, we already have the info available to do this)
* fix duplicate symbol names
This commit adds the extremely unsafe `wasmtime::CodeBuilder::expose_unsafe_intrinsics` method. When enabled, the Wasm being compiled is given access to special imports that correspond to direct, unchecked and unsandboxed, native load and store operations. These intrinsics are intended to be used for implementing fast, inline-able versions of WASI interfaces that are special-cased to a particular host embedding, for example.

Compile-time builtins, as originally described in the RFC, are basically made up of three parts:

1. A function inliner
2. Unsafe intrinsics
3. Component composition to encapsulate the usage of unsafe intrinsics in a safe interface
Part (1) has been implemented in Wasmtime and Cranelift for a little while now (see `wasmtime::Config::compiler_inlining`). This commit is part (2). After this commit lands, part (3) can be done with `wac` and `wasm-compose`, although follow-up work is required to make the developer experience nicer and more integrated into Wasmtime so that the APIs can look like those proposed in the RFC.

I still have a little bit of doc comments and examples to fill out, but I thought it would be worth opening this PR up so that folks can start taking a look now, especially as I am taking Friday off and have a super-packed day tomorrow and probably won't have time to cross all the Ts and dot all the Is before next week.
One thing that no one brought up during the RFC, but which started bugging me during this implementation, is whether we can expose tools for compile-time builtin authors to do Spectre mitigations. Basically expose an intrinsic that lowers to `spectre_select_guard` or something? Seems possible, but I haven't explored the design space too much yet. It also seems like something we can do in an additive fashion, without needing to figure everything out before landing any intrinsics. Interested in folks' thoughts!