Conversation

@fitzgen
Member

fitzgen commented Oct 8, 2025

This commit adds the extremely unsafe
`wasmtime::CodeBuilder::expose_unsafe_intrinsics` method. When enabled, the Wasm being compiled is given access to special imports that correspond to direct, unchecked and unsandboxed, native load and store operations. These intrinsics are intended to be used for implementing fast, inline-able versions of WASI interfaces that are special-cased to a particular host embedding, for example.

Compile-time builtins, as originally described in the RFC (bytecodealliance/rfcs#43), are basically made up of three parts:

  1. A function inliner
  2. Unsafe intrinsics
  3. Component composition to encapsulate the usage of unsafe intrinsics in a safe interface

Part (1) has been implemented in Wasmtime and Cranelift for a little while now (see `wasmtime::Config::compiler_inlining`). This commit is part (2). After this commit lands, part (3) can be done with `wac` and `wasm-compose`, although follow-up work is required to make the developer experience nicer and more integrated into Wasmtime so that the APIs can look like those proposed in the RFC.
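
To make the shape of this concrete from an embedder's point of view, here is a minimal sketch of how the pieces could fit together. Only `Config::compiler_inlining` and `CodeBuilder::expose_unsafe_intrinsics` are named by this PR; the exact signature of `expose_unsafe_intrinsics` (and whether it is an `unsafe fn`) is an assumption in this sketch, not the final API.

```rust
use wasmtime::{CodeBuilder, Config, Engine, Result};

fn compile_trusted_builtin(wasm: &[u8]) -> Result<()> {
    // Part (1): turn on the function inliner so that calls into the
    // compile-time builtin can be inlined into its callers.
    let mut config = Config::new();
    config.compiler_inlining(true);
    let engine = Engine::new(&config)?;

    // Part (2): expose the unsafe intrinsics, but only to this trusted,
    // embedder-authored component. The method name comes from this PR;
    // its exact signature is assumed here.
    let mut builder = CodeBuilder::new(&engine);
    builder.wasm_binary(wasm, None)?;
    builder.expose_unsafe_intrinsics(true);
    let _component = builder.compile_component()?;

    // Part (3): compose the resulting trusted component with untrusted
    // guest components (e.g. via `wac` or `wasm-compose`) so that the
    // unsafe intrinsics are encapsulated behind a safe interface and are
    // never exported to untrusted code.
    Ok(())
}
```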


I still have a little bit of doc comments and examples to fill out, but I thought it would be worth opening this PR up so that folks can start taking a look now, especially as I am taking Friday off and have a super-packed day tomorrow and probably won't have time to cross all the Ts and dot all the Is before next week.

One thing that no one brought up during the RFC but which started bugging me during this implementation is whether we can expose tools for compile-time builtin authors to do spectre mitigations. Basically expose an intrinsic that lowers to `spectre_select_guard` or something? Seems possible but I haven't explored the design space too much yet. Also seems like it is probably something we can do in an additive fashion, without needing to figure everything out before landing any intrinsics. Interested in folks' thoughts!

@fitzgen fitzgen requested review from a team as code owners October 8, 2025 23:33
@fitzgen fitzgen requested review from alexcrichton and cfallin and removed request for a team October 8, 2025 23:33
@github-actions github-actions bot added the wasmtime:api and winch labels Oct 9, 2025
@github-actions

github-actions bot commented Oct 9, 2025

Subscribe to Label Action

cc @saulecabrera

This issue or pull request has been labeled: "wasmtime:api", "winch"

Thus the following users have been cc'd because of the following labels:

  • saulecabrera: winch

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.


@fitzgen fitzgen force-pushed the new-compile-time-builtins branch 2 times, most recently from 14489f4 to 60daa62 October 9, 2025 17:36
Comment on lines +166 to +168
/// A map from a `UnsafeIntrinsic::index()` to that intrinsic's
/// module-interned type.
pub unsafe_intrinsics: [PackedOption<ModuleInternedTypeIndex>; UnsafeIntrinsic::len() as usize],
Member

If it's expected that the list of intrinsics is going to grow over time, should this perhaps be PrimaryMap<SomethingIndex, (ModuleInternedTypeIndex, UnsafeIntrinsic)>? Basically ignoring unused intrinsics.

Member Author

My thinking has been that we can cross that bridge if we get to it.

Right now, there are very few intrinsics, and I'm not concerned about the size of little arrays like this. However, it definitely is intended that a component that doesn't use these intrinsics doesn't have them compiled into its text section and doesn't have any additional space reserved for their VMFuncRefs and whatnot in its vmctx layout.
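
For readers skimming the thread, here is a self-contained sketch of the two data shapes being weighed above. `ModuleInternedTypeIndex` and `UnsafeIntrinsic` are stand-ins for the real `wasmtime-environ` types, and the index type for the `PrimaryMap` variant is hypothetical.

```rust
use cranelift_entity::packed_option::PackedOption;
use cranelift_entity::{entity_impl, PrimaryMap};

// Stand-in for the real `wasmtime_environ::ModuleInternedTypeIndex`.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct ModuleInternedTypeIndex(u32);
entity_impl!(ModuleInternedTypeIndex);

// Hypothetical index type for the `PrimaryMap` alternative.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct UnsafeIntrinsicIndex(u32);
entity_impl!(UnsafeIntrinsicIndex);

// Stand-in for the enum added by this PR; the real variants are the
// native load/store intrinsics, these names are hypothetical.
#[derive(Clone, Copy)]
pub enum UnsafeIntrinsic {
    LoadU32,
    StoreU32,
}

impl UnsafeIntrinsic {
    pub const fn len() -> u32 {
        2
    }
}

// Shape used in the PR as written: one slot per known intrinsic,
// indexed by `UnsafeIntrinsic::index()`, empty for unused intrinsics.
pub struct PerIntrinsicSlots {
    pub unsafe_intrinsics:
        [PackedOption<ModuleInternedTypeIndex>; UnsafeIntrinsic::len() as usize],
}

// Shape suggested in the review above: a dense map holding only the
// intrinsics a component actually uses.
pub struct UsedIntrinsicsOnly {
    pub unsafe_intrinsics:
        PrimaryMap<UnsafeIntrinsicIndex, (ModuleInternedTypeIndex, UnsafeIntrinsic)>,
}
```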

Comment on lines +690 to +713
// Initialize the unsafe intrinsics used by this component, if any.
for (i, module_ty) in env_component
    .unsafe_intrinsics
    .iter()
    .enumerate()
    .filter_map(|(i, ty)| ty.expand().map(|ty| (i, ty)))
{
    let i = u32::try_from(i).unwrap();
    let intrinsic = UnsafeIntrinsic::from_u32(i);
    let ptrs = self.component.unsafe_intrinsic_ptrs(intrinsic).expect(
        "should have intrinsic pointers given that we assigned the intrinsic a type",
    );
    let shared_ty = self
        .component
        .signatures()
        .shared_type(module_ty)
        .expect("should have a shared type");
    self.instance_mut(store.0).set_intrinsic(
        intrinsic,
        ptrs.wasm_call,
        ptrs.array_call,
        shared_ty,
    );
}
Member

Do you see a viable path to eventually omitting this? For example, if we were to implement a DCE pass for functions, then in the case where no intrinsics are actually imported anywhere and they were inlined everywhere, all of these should get emptied out. "Just DCE" wouldn't be sufficient because of loops like this, however; in theory we'd have to prune the list of intrinsics after function optimization.

Member Author

Do you see a viable path to eventually omitting this?

When you say "this" what exactly are you referring to?

Already it should be the case that when unsafe intrinsics are not exposed to a component, this loop performs zero iterations and there should[1] be zero space reserved for intrinsics' VMFuncRefs in the vmctx.

For example, if we were to implement a DCE pass for functions, then in the case where no intrinsics are actually imported anywhere and they were inlined everywhere, all of these should get emptied out. "Just DCE" wouldn't be sufficient because of loops like this, however; in theory we'd have to prune the list of intrinsics after function optimization.

Yes, we would need to update the env_component.unsafe_intrinsics field after doing gc-sections/DCE during linking, same as we would need to do the moral equivalent for the VMFuncRefs of defined Wasm functions that are imported/exported within a component but are ultimately never called and are dead code.

Footnotes

  1. I think there may be a bug where the space is reserved unconditionally right now, looking at CI. But that is definitely unintentional.

Member

For "this" I mean this entire loop in the context when unsafe intrinsics are used. This loop exists for the vanishingly rare case that intrinsics are used but also turned into funcref values one way or another, but in practice they'll basically never get used.

In some sense I'm not saying much here: it's pretty clear that DCE alone won't make this loop go away, but a wasmtime-aware DCE pass which updated the unsafe_intrinsics list would. So all I'm really saying is that I think we should strive to make this loop go away in most situations when unsafe intrinsics are used, but that'll require fancy DCE.

@alexcrichton
Member

Oh, also, I'd recommend using prtest:full on this PR as this seems at high risk of passing on x64 and failing elsewhere.

@fitzgen fitzgen force-pushed the new-compile-time-builtins branch from 38f0dc1 to 3a34fe4 October 14, 2025 21:03
@fitzgen fitzgen requested a review from alexcrichton October 16, 2025 00:36
@fitzgen
Member Author

fitzgen commented Oct 16, 2025

@alexcrichton I think this should be ready for another review pass

Member

@alexcrichton alexcrichton left a comment

Thanks for slogging through all the CI bits and handling the miri bits, it's looking good!

To expand a bit on some of the unresolved comments from the previous review -- It feels a bit weird that there's different ways of managing the list of intrinsics for a component. The VMComponentContext either has 0 or all of them, compilation either compiles 0 or all of them, info::Component tracks a full list of intrinsics but has None for unneeded intrinsics, VMComponentContext initialization "nulls out" all intrinsics, and instantiation only fills in used intrinsics. To me this feels like a random mish-mash of different strategies to manage everything. I get your point about crossing the bridge when we get there, but I also feel like this PR is moving us to a state where it's pretty inconsistent how the intrinsics are handled. Some contexts are "all or nothing" and some contexts are "only used intrinsics".

Ideally I'd prefer a system where intrinsics were compacted/compiled on-demand as opposed to ever doing an "all or nothing" approach. My read of this is that this is basically a function of the initial analysis phase of a component and how clever it is. I would naively expect that fitting into the GlobalInitializer infrastructure would make the implementation "just fall out" by adding a new FooIndex type of some kind. Basically all the hash maps and helpers and such are all there, so I would naively expect the implementation to not be all that much work.

I'm perpetually worried about the quantity of work we defer given that the rate of burning down this deferred work is often much smaller than the rate of deferring, but this is a topic reasonable folks can disagree on. In that sense I'll lay out my concerns here, but I'll also leave it up to you whether to merge or not before addressing. If this merges as-is, though, mind opening an issue about these future improvements?

@fitzgen
Member Author

fitzgen commented Oct 16, 2025

I'm perpetually worried about the quantity of work we defer given that the rate of burning down this deferred work is often much smaller than the rate of deferring, but this is a topic reasonable folks can disagree on.

I hear you. I think it is important that we balance incrementalism and doing things the Right Way. Personally, I think that the lever we use to strike that balance is not accepting that things will be in a fairly suboptimal state "temporarily" (which, as you note, is often not temporary) in service of shipping something a little sooner, but instead dropping functionality and features that we don't have time to implement well. This way, we might not have everything we ideally want, but what we do have is rock solid. So when we look at some work being left for "follow-up PRs", we should ask whether what is landing now is rock solid, and the follow-ups are "just" optimizations/features/functionality.[1]

But of course all these generalizations are very high-level and vague. We can still come to reasonable disagreements on what is considered "rock solid" or which bits of functionality can even be left out while preserving what remains.

To make things in this PR a little more concrete when viewed through the above lens, I have been accepting the compromise that we will compile, link, and instantiate VMFuncRefs for all intrinsics when the flag to expose unsafe intrinsics is enabled. I think the Right Way to fix that, so that only the intrinsics that are actually used are compiled+linked+instantiated, is to do gc-sections/DCE during linking, but also that doing all that is something that can be delayed (perhaps for a very long time!) and things will be Fine in the meantime because what is implemented should be rock solid (albeit lacking the DCE optimizations we would have in an ideal world).

Ignoring for a second the implementation details in this PR around how intrinsics are tracked through translation and the like, do you agree that the above compromise is an acceptable cut to make?

Because I do agree somewhat with the following:

It feels a bit weird that there's different ways of managing the list of intrinsics for a component. The VMComponentContext either has 0 or all of them, compilation either compiles 0 or all of them, info::Component tracks a full list of intrinsics but has None for unneeded intrinsics, VMComponentContext initialization "nulls out" all intrinsics, and instantiation only fills in used intrinsics. To me this feels like a random mish-mash of different strategies to manage everything.

How the intrinsics are tracked through compilation and in the wasmtime-environ metadata structures is a little messy, and I'd like to improve it. But I don't want to expand the scope of those improvements to only compiling+linking+instantiating the intrinsics that are actually used. Doing that correctly (IMHO via DCE/gc-sections in linking) is too much for me to bite off immediately, and (unless I am missing something) the other ways of doing it require adding new special-cased code paths (which we have generally been trying to remove) and so would only bring us to local maxima, which doesn't seem worth spending the effort on.


And thanks as always for the thorough review! Will dig in some more momentarily.

Footnotes

  1. This is all assuming we are talking about the implementation of new features. Obviously improving existing suboptimal code, even if there are still more follow ups to be done before it is fully optimal, is worth landing right away.

@alexcrichton
Member

At a high level I definitely agree with everything you say, and I also want to strive for a balance of perfection and pragmatism. I'm happy to defer things to a follow-up PR to make progress effectively whenever, and the moment I pause is when the conclusion is to open an issue. Filing an issue means that it'll likely be years, if ever, before something changes. So in that sense "let's file an issue" is equated to me as "let's just ignore this", which is partly where I come from. I basically feel that there's no balance to filing an issue as it effectively means that someone else, who's likely to be less suited to the task, will have to take care of it.

For this PR specifically I agree the DCE/VMFuncRef bits should not happen here. There's definitely no need to entangle all that and it's a huge project for not a whole lot of benefit right now. What I was mostly referring to was the compilation of the unsafe intrinsics, where it's a mish-mash of everything vs the subset used. That, to me, feels like the perfect candidate for something that, if an issue is filed, will just be forgotten and never addressed. It happens frequently that small improvements like this are never quite justified to spend time on, which means they just never get time spent on them.

I agree with your rock-solid quality bar though, and to that end I have no concerns about this PR. I see no bugs in this PR and it's just a bit sub-optimal in a few areas. Given that, I'm fine to see this merged. My personal preference would be to have a follow-up adjusting the compilation side of things to change the list-of-all-intrinsics to list-of-only-the-used-intrinsics, but I'm ok settling for an issue too.

@alexcrichton
Member

That was not the button I wanted...

@alexcrichton alexcrichton reopened this Oct 16, 2025
Turns out we don't need to add phases, we already have the info available to do
this.
@fitzgen
Member Author

fitzgen commented Oct 16, 2025

@alexcrichton the latest commit (9ca326b) cleans this up a little bit, and also happens to give us demand-based compilation of intrinsics, since it turns out we know which ones were canon lowered by the time we are compiling functions. Mind taking another look? Does this help alleviate some of your concerns?

@alexcrichton
Member

Agreed! I think everything now at least only does O(intrinsics_used) work, and data-representation-wise I'd still prefer PrimaryMap<SomethingIndex, ...>, but that's fine to defer to a future refactoring if necessary.

@fitzgen fitzgen added this pull request to the merge queue Oct 17, 2025
Merged via the queue into bytecodealliance:main with commit ad56ff9 Oct 17, 2025
172 checks passed
@fitzgen fitzgen deleted the new-compile-time-builtins branch October 17, 2025 00:25
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
…ce#11825)

* Implement unsafe intrinsics for compile-time builtins

* fill out some more docs

* fix non component model builds

* start filling out the doc example

* Factor abi params/returns out; truncate/extend pointers

* Compile unsafe intrinsics on winch as well

* prtest:full

* have the macro define the signature

* ignore tests in MIRI because MIRI can't compile Wasm

* juggle pointer provenance in `Store::data[_mut]`

* add a test for store data provenance and also fix it

* use `VmPtr` for the store data pointer

* finish writing unsafe intrinsics example

* fix up docs and rules around only accessing data from `T` in a `Store<T>`

* Only reserve space for the intrinsics' `VMFuncRef`s if they are in use

* use dangling pointers instead of options

* Rename `StoreInner::data` to `data_no_provenance` and fix some accesses to use the method accessors

* Add comments about the provenance juggling inside `StoreInner::data[_mut]`

* only compile intrinsics that are used

Turns out we don't need to add phases, we already have the info available to do
this.

* fix duplicate symbol names
