spec: Export indirection by nagisa · Pull Request #2 · near/finite-wasm

nagisa · 2022-06-10T14:26:21Z

When functions are exported, the only way to account for these functions participating in host-to-VM calls is by introducing a trampoline indirection and placing some calls to instrumentation callbacks in said trampoline.

Ekleog · 2022-06-13T13:05:02Z

+a world of pain when they start mattering more.
+-->
+
+“Stack height” is the height of the implicit stack. Stack height is a sum of:


Is it on purpose that you're counting u32 and u64 the same, and that you're counting only one per activation frame while afaict it's equivalent to 2 u64? (return address and saved rbp, not spilled local registers as it's already counted just below)

Yes, it is purposeful. Stack height here is a number of “entries” in WebAssembly specification parlance, and the actual machine quantities would be accounted for by setting up the machine stack in such a way that any combination of stacklimit entries would fit.

The need to handle the locals here specially is somewhat unfortunate, but I don't really see any way out besides having a level of indirection and allocating the locals on the heap for each call activation entry.

Hmm so I don't think we actually need to count stack in number of wasm-spec-"entries", do we?

My thoughts are: sure, it'll probably be hard to count in number of bytes (because currently we're doing the instrumentation before compiling, so we don't have type analysis on the wasm stack and don't know what is a u32 and what is a u64), but we could count in "u64 or less" sizes rather than the "u128 or less" that you suggest, thus gaining a 2x factor in the amount of actually-allowed memory for the same host-reserved memory usage (as function activation frames should be pretty rare).

Does that make sense?

WASM simd spec adds values that are 128 bit long though, and we might add support for it soon.

Ugh :/ and I guess these would be on the value stack with no specific eg. drop instruction just for them?

If we're thinking of adding support for that soon, I guess we can just count in terms of u128 and reconsider on the day we'll migrate this to be compiler-handled checks

I guess the instrumentation could be parametrized by the size of certain entries, and we definitely know the layout of the stack at its maximum height too, as it is determined entirely by the instructions that are part of the block.

The problem with that approach, though, is that it isn't clear how that would translate to e.g. interpreters that might store a pointer to a heap-allocated entry on the stack or something similar.

Either way, I suggest I slap a TODO on this for now and we come back to this once the spec shapes up more.
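To make the trade-off in this thread concrete, here is a minimal sketch of how the host-side reservation scales with the assumed maximum entry size. The function name `reserved_bytes` and the idea of passing the entry size as a plain parameter are assumptions made for illustration; nothing here is taken from the spec text or the implementation.

```rust
/// Worst-case number of bytes the host would need to reserve so that any
/// combination of `stack_limit` entries fits on the machine stack.
///
/// `entry_size` is an assumed upper bound on the size of a single entry:
/// 8 bytes if only 32- and 64-bit values exist, 16 bytes once 128-bit
/// SIMD values are allowed. Returns `None` if the product overflows.
pub fn reserved_bytes(stack_limit: u64, entry_size: u64) -> Option<u64> {
    stack_limit.checked_mul(entry_size)
}
```

With a limit of 1024 entries this gives 8192 bytes for "u64 or less" entries and 16384 bytes for "u128 or less" entries, which is the 2x factor mentioned above.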

Ekleog · 2022-06-13T13:09:02Z

+0. Assert: Stack height is 0, before any stack operations occur.
+9. Before function invocation:
+  * Charge gas for each local of `funcaddr` and execution of this `call` instruction;
+  * Require that the stack height required to execute `funcaddr` does not exceed `stacklimit`.


Suggested change

-  * Require that the stack height required to execute `funcaddr` does not exceed `stacklimit`.
+  * Require that the stack height required to execute `funcaddr` until the next `stack` call does not exceed `stacklimit`.

I avoided referencing stack and gas functions somewhat purposefully – so that the text here makes sense regardless of whatever specific instrumentation strategy is chosen. That might change in the future, but I'm thinking that a better approach would be to just adjust the execution semantics for the call instruction so that it checks the limit and reserves the slots in the implicit stack.

Do you think that would make sense?

You're right! I was originally thinking of functions that look like rep push / rep pop that'd require stack calls inside the function; but upon more thought this is actually not possible because wasm blocks need to leave the block at the same stack height as it was when they were entered, so there should never be any stack call in the middle of the function and we don't need this change.
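The point that the required stack height is known statically can be illustrated with a small sketch. It computes the peak operand stack height of a straight-line instruction sequence from per-instruction pop and push counts. The `Op` enum is a hypothetical four-instruction subset invented for this example, and control flow is left out entirely, so this is only a sketch of the idea and not the analysis used by the PR.

```rust
/// A tiny, hypothetical subset of instructions; only the stack effect matters.
#[derive(Clone, Copy)]
pub enum Op {
    Const,    // pushes one value
    LocalGet, // pushes one value
    Binary,   // pops two values, pushes one (e.g. `i32.add`)
    Drop,     // pops one value
}

impl Op {
    /// Returns `(pops, pushes)` for the instruction.
    fn stack_effect(self) -> (u32, u32) {
        match self {
            Op::Const | Op::LocalGet => (0, 1),
            Op::Binary => (2, 1),
            Op::Drop => (1, 0),
        }
    }
}

/// Peak operand stack height reached while executing `ops` in order,
/// starting from an empty stack. Returns `None` if the sequence would pop
/// from an empty stack (invalid code) or the height overflows.
pub fn peak_height(ops: &[Op]) -> Option<u32> {
    let mut height = 0u32;
    let mut peak = 0u32;
    for &op in ops {
        let (pops, pushes) = op.stack_effect();
        height = height.checked_sub(pops)?.checked_add(pushes)?;
        peak = peak.max(height);
    }
    Some(peak)
}
```

Because the peak depends only on the instruction sequence, a single check at function entry against this value is enough for such a body.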

nagisa · 2022-07-13T17:46:55Z

+  (func (export "main")
+    (call $stack (i32.const 6))
+    (call $gas (i64.const 6))
+    (call $main)


@matklad I think you were pretty concerned about potential runtime performance implications of doing something along these lines – would it help if this was a return_call (i.e. a tail-call), pretty much ending up as a very cheap jmp and no extra VM stack use once this gadget is done executing?

This would require us to implement the tail call extension, but it sounds like a good idea regardless, and sounds like it'd be pretty easy/high value extension to implement.

My main concern was not so much performance as complexity: adding trampolines feels like it'll require a bunch of fiddly code in the implementation.

Got it. I don’t think the complexity is a concern. The export indirection can be an entirely separate transformation pass. Executed exactly once, before any other analysis or instrumentation, it sets up the module such that the regular analysis/instrumentation can entirely ignore any complications that arise due to indirect or external calls. Instrumenting these trampolines as-if they are regular WebAssembly code is sufficient to produce precise and accurate accounting.

That said, export indirection can pose a problem to how the specification is written.

On one hand, export indirection is just a tool to make it possible to write the specification such that exports and table.elems aren’t an exceptional case; on the other, instrumentation does need to avoid instrumenting calls to the trampolines introduced by export indirection (these trampolines are intended to be transparent, after all). So far this property comes out almost by definition: it isn’t possible to insert any instrumentation before a host-to-VM call unless the embedder is modified to do something like this.

When functions are exported, there is no easy way to introduce code to account for these functions’ operations before those operations are executed. At least not without an intervention of the VM. Export indirection is probably the simplest mechanism that would allow implementing proper accounting purely within WASM.
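As a rough illustration of the transformation described above, the sketch below applies export indirection to a toy module model. The `Module`, `Func` and `Body` types are hypothetical stand-ins invented for this example (the real transform works on the binary module via wasmparser and wasm_encoder), and only exports are handled here.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Body {
    /// Code written by the module author (opaque in this sketch).
    Original,
    /// A trampoline that forwards all of its parameters to `target`.
    Trampoline { target: usize },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Func {
    pub params: usize,
    pub body: Body,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Module {
    pub funcs: Vec<Func>,
    /// Pairs of (export name, function index).
    pub exports: Vec<(String, usize)>,
}

/// Redirects every export to a freshly appended trampoline.
/// Appending at the end keeps all existing function indices valid.
pub fn indirect_exports(module: &mut Module) {
    for (_, index) in module.exports.iter_mut() {
        let target = *index;
        let params = module.funcs[target].params;
        module.funcs.push(Func { params, body: Body::Trampoline { target } });
        *index = module.funcs.len() - 1;
    }
}
```

After this pass the host only ever enters the module through a trampoline, so later instrumentation can treat the trampoline like any other function body.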

Ekleog · 2022-07-15T16:48:39Z

Taking one step back: do we actually need to specify export indirection?

I cannot think of a way export indirection can affect wasm-virtual-machine behavior. The only way I can think of it would be if we wanted to specify at exactly which amd64 instruction the execution should stop when hitting a limit, but I feel like we should avoid speccing that, just saying "if any limit is hit by the limitless execution trace, execution could stop at any point in the program but the output must be LimitExceeded with the amount of used gas specced"

That said, export-indirection.mkd feels like nice implementation documentation and the tests you added as good implementation tests, but they don't feel like something we should spec out to me.

nagisa · 2022-07-15T17:57:21Z

Export indirection is an implementation choice, yes. I felt it is worthwhile to start working from here since it is very independent of everything else here, is easy to both describe and implement and is entirely optional. The alternatives are pretty well understood too, I feel, and it is pretty clear to me that indirection is the mechanism we want to continue using for the time being at least.

I guess I could just move the document away from spec to docs, although IME it is perfectly fine as a spec appendix.

nagisa · 2022-07-20T14:56:05Z

I pushed up a commit that sketches out most of the implementation for the indirection transform. There are a couple of learnings here that I think are worth bringing up – first is that we’ll need to contribute to wasm_encoder in order to support everything wasmparser might spit out (atomics and global.get const-expr initializer in table elements both come to mind.) There are also portions such as component proposal implementation that wasmparser kinda wants to force us to implement.

I can say for sure though that splitting things out into analyze and transform phases seems to work quite okay, at least for this transformation. I do definitely see ourselves wanting a nicer framework for walking through a module and modifying only the interesting parts of it. Otherwise we end up with 700-LoC modules for each analysis/transform that are mostly just traversing the structure and converting from one representation to the other. A rustc-like Visit trait could be a reasonably good mechanism here and is also something I’ve seen used in wasm-mutate.
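For readers unfamiliar with the pattern, a rustc-like visitor could look roughly like the sketch below. The `Module`, `Func` and `Instr` types are toy definitions made up for this example and do not correspond to wasmparser or wasm-mutate types. The default methods do the traversal, so an analysis only overrides the hooks it cares about.

```rust
pub enum Instr {
    Call(u32),
    LocalGet(u32),
    Other,
}

pub struct Func {
    pub body: Vec<Instr>,
}

pub struct Module {
    pub funcs: Vec<Func>,
}

/// Default methods walk the structure; implementors override selectively.
pub trait Visit {
    fn visit_module(&mut self, module: &Module) {
        for func in &module.funcs {
            self.visit_func(func);
        }
    }

    fn visit_func(&mut self, func: &Func) {
        for instr in &func.body {
            self.visit_instr(instr);
        }
    }

    fn visit_instr(&mut self, _instr: &Instr) {}
}

/// Example analysis: count direct calls without writing any traversal code.
#[derive(Default)]
pub struct CallCounter {
    pub calls: usize,
}

impl Visit for CallCounter {
    fn visit_instr(&mut self, instr: &Instr) {
        if let Instr::Call(_) = instr {
            self.calls += 1;
        }
    }
}
```

The analysis itself is a handful of lines here; the traversal boilerplate lives once in the trait.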

The API for this crate is not great for custom use-cases like this, but I guess if you close your eyes it is workable enough.

nagisa · 2022-07-21T14:27:15Z

I have re-evaluated use of crates such as walrus over the past few days. So far my take on walrus and such is that while it is a very nice crate indeed, this crate might be too complex for us to really be able to tell if the crate is linear in its operation. Naturally, as we add thousands of lines of our code, it might become harder here as well, but I’ve been trying to structure the code in a way such that linearity is at least somewhat obvious. This isn’t the case with walrus, I fear.

I have also taken the opportunity to adjust the approach in our test suite a little. I went ahead and integrated insta. The integration is somewhat messy (unfortunately so), but it does seem like a better approach than doing all that work manually in wast files, at least?

matklad · 2022-07-25T11:29:08Z

+            let mut insta = insta::Settings::new();
+            insta.set_prepend_module_to_snapshot(false);
+            insta.set_snapshot_path(
+                self.test_path
+                    .canonicalize()
+                    .expect("unable to canonicalize the test path")
+                    .parent()
+                    .expect("could not get the parent directory of the test path")
+                    .join("snaps")
+            );
+            insta.set_snapshot_suffix(directive.display());
+            let (line, _) = directive.span().linecol_in(self.wast_data);
+            insta.bind(|| {
+                let _hook = std::panic::set_hook(Box::new(|_| {}));
+                let result = std::panic::catch_unwind(|| {
+                    insta::_macro_support::assert_snapshot(
+                        // Creates a ReferenceValue::Named variant
+                        insta::_macro_support::ReferenceValue::Named(Some(
+                            self.test_name.clone().into(),
+                        )),
+                        &output,
+                        env!("CARGO_MANIFEST_DIR"),
+                        "",
+                        &self.test_name,
+                        &self.test_path.display().to_string(),
+                        line as u32 + 1, // lines are 0-indexed here
+                        directive.display(),
+                    )
+                });
+                let _ = std::panic::take_hook();
+                match result {
+                    Ok(Ok(_)) => self.pass(),
+                    Ok(Err(error)) => return self.fail(&*error),
+                    Err(_panic) => return self.fail(Error::Insta),
+                }
+            });


expect-test might be a bit friendlier to such custom processing than insta

https://docs.rs/expect-test/latest/expect_test/macro.expect_file.html

expect-test has a lot of the same problems as insta – it prefers to panic, and it still outputs diffs unconditionally to stdout/err. I went ahead and wrote a custom thing instead, which is about the same order of magnitude of code as using the ready-made crates and does not require any hacks.
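A non-panicking snapshot check of the kind described here can be quite small. The sketch below is not the code from this PR; `check_snapshot`, `SnapshotError` and the `update` flag are hypothetical names chosen for illustration. It returns an error value instead of panicking and prints nothing, leaving reporting to the caller.

```rust
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug)]
pub enum SnapshotError {
    /// Reading or writing the snapshot file failed.
    Io(io::Error),
    /// The stored snapshot differs from the actual output.
    Mismatch { expected: String, actual: String },
}

/// Compares `actual` against the snapshot stored at `path`.
/// When `update` is true, the snapshot is (over)written instead.
pub fn check_snapshot(path: &Path, actual: &str, update: bool) -> Result<(), SnapshotError> {
    if update {
        return fs::write(path, actual).map_err(SnapshotError::Io);
    }
    let expected = fs::read_to_string(path).map_err(SnapshotError::Io)?;
    if expected.as_str() == actual {
        Ok(())
    } else {
        Err(SnapshotError::Mismatch { expected, actual: actual.to_string() })
    }
}
```

Since the result is an ordinary `Result`, a test runner that collects outcomes into a vector can record the failure and move on to the next directive.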

matklad · 2022-07-25T11:36:25Z

> since it is very independent of everything else here

I am curious if there are dependencies between exports and call_indirect? As far as I understand it, both exports and indirect calls face the same fundamental problem: we can't really instrument the call-site. Could/should we use the same mechanism to handle both?

nagisa · 2022-07-25T11:50:58Z

> I am curious if there are dependencies between exports and call_indirect?

Well, as far as the implementation in nearcore is concerned, my proposal is that exports, the start function and table elements (i.e. call_indirect) all use the same indirection mechanism (as specified and implemented in this PR).

That said, there is a problem that I still need to think about/figure out with regards to call_indirect specifically. If you look at one of the functions from the instrumented snapshot:

  (func $trampoline::f2 (;1;) (type 1) (param i32)
    local.get 0
    call $f2
  )

you see that each trampoline will contain local.gets for each function parameter. Once this is instrumented with gas measurement, it will start charging the fees to set up the operand stack as necessary to call the function. For exports and start function this is okay and might actually be desirable. The latter can’t have any arguments in the first place, and for the former the host doesn’t really charge for this kind of operation, so it is actually nice that we can do it here. For call_indirect, though, this is going to end up charging for the operand stack setup twice, once in the call site and again in the trampoline.
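The double charge can be shown with a little arithmetic. The sketch below assumes a flat fee per instruction and a call site that pushes each argument with exactly one instruction; both are simplifications made up for this example (for instance, the table index operand of call_indirect is ignored), and the function names are hypothetical.

```rust
/// Instructions in a trampoline for a callee with `params` parameters:
/// one `local.get` per parameter plus the final `call`.
pub fn trampoline_instructions(params: u64) -> u64 {
    params + 1
}

/// Gas charged for getting into the callee, assuming a flat `fee` per
/// instruction and one argument-pushing instruction per parameter at the
/// call site (assumptions of this sketch, not of the spec).
pub fn entry_gas(params: u64, via_trampoline: bool, fee: u64) -> u64 {
    // Call site: push each argument, then execute the call itself.
    let call_site = params + 1;
    let trampoline = if via_trampoline {
        trampoline_instructions(params)
    } else {
        0
    };
    (call_site + trampoline) * fee
}
```

For the one-parameter `$f2` above and a fee of 1, a direct call costs 2 while a call routed through `$trampoline::f2` costs 4, so the operand stack setup is paid for twice.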

matklad · 2022-07-25T11:58:39Z

Ah, so this indeed covers call_indirect case! Perhaps the prose can be a bit more explicit: "table elements" doesn't really connect in my brain with indirect calls. And I would see call_indirect as the primary thing to worry about here: it's much easier to write a loop which does call_indirect than to write a loop which calls exports.

> though, this is going to end up charging for the operand stack setup twice, once in the call site and again in the trampoline.

🤔 this makes me like "worse is better" solution of forgoing indirects and just charging on function entry ("too late", in some sense) more :)

These snapshot crates have a strong preference towards panicking and outputting their diffs to stdout/err unconditionally, while also not really providing sufficient flexibility to integrate them into our vector-as-output scheme. It seemed easy enough to implement our own, so here we are.

nagisa · 2022-07-27T10:46:02Z

> 🤔 this makes me like "worse is better" solution of forgoing indirects and just charging on function entry ("too late", in some sense) more :)

Hm, but then we must make sure the stack is over-provisioned sufficiently to fit all the locals a function would want (hard problem as seen with our secondary stack check), and also have the number of per-function-locals limited to low-enough-numbers that their initialization isn’t taking a huge amount of time before we can charge gas for that.

If supporting multiple runtimes (without an ability to modify some of them) was not a concern, I would really just create a separate custom section in the wasm module and have our VM implementation charge the relevant fees during function’s prologue.

Unfortunately I don’t see good alternative approaches at the moment. I will split the supporting code out into a separate PR, so that this PR doesn’t block any parallel work.

Ekleog · 2022-08-01T16:39:52Z

> If supporting multiple runtimes (without an ability to modify some of them) was not a concern, I would really just create a separate custom section in the wasm module and have our VM implementation charge the relevant fees during function’s prologue.

I'm not sure we actually have this concern? AFAICT we're only using wasmer2 in production, and other runtimes are only ever used in order to run over the archive, develop or run tests. These three other use cases don't actually need performance, so I think we could live with:

1. a pwasm-utils replacement "spec" that has the behavior we want, but with bad performance and non-crashing characteristics (ie. charging just after locals initialization), that we'd use for wasmtime (I don't think we even need it to support wasmer0 as wasmer0 wouldn't be supported by newer protocol versions anyway and isn't used for development so it'd stay on old pwasm-utils)
2. a patched wasmer2 that follows the spec of 1, charges exactly as much gas, but also avoids the delay in cost application

This actually matches our initial idea of "write a spec gas instrumentation then make it production-ready", though it also means that there may be more work on this front than anticipated.

nagisa · 2022-08-01T17:18:36Z

Even if we don't share the same implementation for gas/stack counting between the runtimes (thus having to ensure that the multiple implementations are all 100% spec compliant), the spec must be implementable either way.

I believe the current spec allows charging gas/accounting for stack using either approach, but it isn’t 100% clear to me that it will remain so indefinitely. If the base WebAssembly spec needed to add some trapping runtime validation during execution of a function call, then there’d be a possibility of non-deterministic execution.

nagisa · 2022-08-01T17:23:01Z

Ah, so I was looking in the wrong place: I was expecting call_indirect to do some runtime checks, but I wasn’t finding any references to runtime checks and traps. Turns out I just missed the relevant section.

So even today we must specify whether these traps must occur before our charges or after. As the spec is written right now, these trapping conditions must be evaluated before any stack operations or gas charges occur as part of the function call. This in practice prohibits use of these indirection trampolines for indirect calls, I… think?

Ekleog · 2022-08-01T17:35:11Z

Hmmmm I think your link mostly means that we must specify whether we charge before or after runtime-typechecking the indirectly-called function?

IMO if we say that the runtime-typeck must happen before the gas charges (IIUC, how the spec is currently written), then it means that both the options I'm suggesting can still be used correctly: one instrumenter that charges "too late" (after locals init) for development and testing, and one patched compiler that charges "just right" (after runtime-typeck that happens before the push rip/jmp as instrumentation would be in the prologue, and before locals init)

Does that make sense?
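The ordering under discussion can be written down as a small sketch: the trapping checks of call_indirect (index in bounds, element initialized, type matches) run first, and only then is gas charged. All names here (`Trap`, `FuncInst`, `entry_cost`) are hypothetical, and comparing types by index is a simplification of the structural type check; this is an illustration of the ordering and not the spec's definition.

```rust
#[derive(Debug, PartialEq)]
pub enum Trap {
    UndefinedElement,
    UninitializedElement,
    IndirectCallTypeMismatch,
    OutOfGas,
}

pub struct FuncInst {
    /// Simplification: types are compared by index in this sketch.
    pub type_index: u32,
    /// Gas to charge when the function is entered (hypothetical field).
    pub entry_cost: u64,
}

/// Performs the runtime checks of an indirect call and, only if all of
/// them pass, charges the callee's entry cost against `gas`.
pub fn call_indirect(
    table: &[Option<FuncInst>],
    index: usize,
    expected_type: u32,
    gas: &mut u64,
) -> Result<(), Trap> {
    let slot = table.get(index).ok_or(Trap::UndefinedElement)?;
    let func = slot.as_ref().ok_or(Trap::UninitializedElement)?;
    if func.type_index != expected_type {
        return Err(Trap::IndirectCallTypeMismatch);
    }
    // All trapping checks passed; gas is touched only from here on.
    *gas = gas.checked_sub(func.entry_cost).ok_or(Trap::OutOfGas)?;
    Ok(())
}
```

With this ordering a failed type check leaves the gas counter untouched, which is the observable difference from charging before the checks.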

nagisa closed this Jun 10, 2022

nagisa deleted the nagisa/export-indirection branch June 10, 2022 14:27

nagisa restored the nagisa/export-indirection branch June 10, 2022 14:29

nagisa reopened this Jun 10, 2022

nagisa changed the title ~~Specify export indirection~~ spec: Export indirection Jun 10, 2022

Ekleog reviewed Jun 13, 2022

nagisa commented Jul 13, 2022

nagisa force-pushed the nagisa/export-indirection branch from 89831c2 to 2c47767 Compare July 14, 2022 10:25

nagisa mentioned this pull request Jul 14, 2022

spec: Runtime spec for gas and stack accounting #4

Merged

Mention the start function too

b1d6043

Kickstart an implementation on indirection transform

1444ce2

nagisa force-pushed the nagisa/export-indirection branch from f397e5f to 1444ce2 Compare July 15, 2022 20:09

Implement most of the indirection transformation

0bfd2b8

nagisa force-pushed the nagisa/export-indirection branch 2 times, most recently from c1056cc to de5cc09 Compare July 21, 2022 14:21

Try insta to store results to compare against?

9498b34

The API for this crate is not great for custom use-cases like this, but I guess if you close your eyes it is workable enough.

nagisa force-pushed the nagisa/export-indirection branch from de5cc09 to 9498b34 Compare July 21, 2022 14:23

matklad reviewed Jul 25, 2022

nagisa mentioned this pull request Jul 27, 2022

test: snapshot tests #5

Closed

nagisa mentioned this pull request Dec 23, 2022

test: instrument modules according to the analysis results #23

Merged

nagisa added A-instrumentation Area: the instrumentation implemented in this crate A-docs Area: the documentation of this crate. labels Jan 27, 2023

3 participants

nagisa Jun 14, 2022 •

edited

Loading

nagisa Jun 13, 2022 •

edited

Loading

matklad commented Jul 25, 2022 •

edited

Loading

nagisa commented Aug 1, 2022 •

edited

Loading