Tail-Call style for interp.c #1999

Open
timo wants to merge 3 commits into main from tail_call_interpreter
timo commented Apr 24, 2026

Member

This follows up on the suggestion to give MoarVM a tail-call-style interpreter using the musttail attribute (expected to be called return goto in a future C standard), an approach that has given some other interpreters a very nice performance boost. @MasterDuke17 already pointed it out in April 2021 (see refs below).

This is the first stab at allowing interp.c to be compiled into that kind of interpreter, without changing a huge amount of the code inside it.
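
For readers unfamiliar with the style, here is a minimal sketch of a tail-call interpreter. It is not taken from interp.c: the VM struct, the three opcodes, the MUSTTAIL and DISPATCH macros, and run_program are all hypothetical names chosen for illustration. The only assumption about the compiler is that, where `__has_attribute(musttail)` reports support, `__attribute__((musttail))` may prefix a return statement; elsewhere the macro expands to nothing and the calls are ordinary calls.

```c
#include <stdint.h>

/* Use the guaranteed-tail-call attribute where the compiler offers one;
 * otherwise fall back to an ordinary call (still correct, just not
 * guaranteed to reuse the stack frame). */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Hypothetical VM state: a single accumulator. */
typedef struct VM { int64_t acc; } VM;

/* Every op handler shares one signature, which musttail requires. */
typedef int64_t (*OpFn)(VM *vm, const uint8_t *pc);

/* Hypothetical opcodes, used as indices into the dispatch table. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

static int64_t op_inc(VM *vm, const uint8_t *pc);
static int64_t op_double(VM *vm, const uint8_t *pc);
static int64_t op_halt(VM *vm, const uint8_t *pc);

static const OpFn dispatch_table[] = { op_inc, op_double, op_halt };

/* Jump to the handler of the opcode at pc instead of returning to a loop. */
#define DISPATCH(vm, pc) MUSTTAIL return dispatch_table[*(pc)]((vm), (pc))

static int64_t op_inc(VM *vm, const uint8_t *pc) {
    vm->acc += 1;
    DISPATCH(vm, pc + 1);
}

static int64_t op_double(VM *vm, const uint8_t *pc) {
    vm->acc *= 2;
    DISPATCH(vm, pc + 1);
}

static int64_t op_halt(VM *vm, const uint8_t *pc) {
    (void)pc;               /* nothing left to decode */
    return vm->acc;         /* the only handler that really returns */
}

/* Driver: this frame stays alive for the whole run because its call is an
 * ordinary call, so handing out &vm is safe. */
int64_t run_program(const uint8_t *code) {
    VM vm = { 0 };
    return dispatch_table[*code](&vm, code);
}
```

With this layout, calling run_program on the bytecode { OP_INC, OP_INC, OP_DOUBLE, OP_INC, OP_HALT } returns 5. Each handler ends by jumping to the next one, which is also why the individual op functions become visible as separate symbols in a profiler.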

Refs:

timo commented Apr 24, 2026

Member Author

I'm not entirely happy with the signature of the op functions, which is MVMuint8 **arg_cur_op, MVMuint8 **arg_bytecode_start, MVMRegister **arg_reg_base, MVMCompUnit **arg_cu. In other words, each call passes pointers to variables that live in the calling stack frame, so I think the code has to dereference them over and over, which could explain why the performance doesn't seem so fantastic.
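
To make the concern concrete, the sketch below contrasts the two calling conventions on a toy instruction set. Nothing here is MoarVM code: Reg, the OP_ADD and OP_RET opcodes, and the run_by_ref and run_by_val drivers are hypothetical, and the state is cut down to two of the four values for brevity. Variant A mirrors the pointer-to-pointer shape of the signature above; variant B passes the same state by value, which on common calling conventions would typically keep it in argument registers across tail calls. Whether that actually closes the performance gap in interp.c is untested here.

```c
#include <stdint.h>

#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL          /* fall back to an ordinary call */
#endif

typedef struct Reg { int64_t i64; } Reg;   /* hypothetical register slot */
enum { OP_ADD, OP_RET };                   /* hypothetical opcodes */

/* Variant A: state lives in the driver's frame, handlers get pointers. */
typedef int64_t (*RefOp)(const uint8_t **cur_op, Reg **reg_base);
static int64_t ref_add(const uint8_t **cur_op, Reg **reg_base);
static int64_t ref_ret(const uint8_t **cur_op, Reg **reg_base);
static const RefOp ref_ops[] = { ref_add, ref_ret };

static int64_t ref_add(const uint8_t **cur_op, Reg **reg_base) {
    const uint8_t *op = *cur_op;           /* load the cursor from memory */
    Reg *r = *reg_base;                    /* load the register base too */
    r[op[1]].i64 = r[op[2]].i64 + r[op[3]].i64;
    *cur_op = op + 4;                      /* store the advanced cursor back */
    MUSTTAIL return ref_ops[op[4]](cur_op, reg_base);
}

static int64_t ref_ret(const uint8_t **cur_op, Reg **reg_base) {
    return (*reg_base)[(*cur_op)[1]].i64;
}

int64_t run_by_ref(const uint8_t *code, Reg *regs) {
    const uint8_t *cur_op = code;
    Reg *reg_base = regs;
    /* Ordinary call: this frame outlives every handler, so the addresses
     * of its locals stay valid for the whole run. */
    return ref_ops[*cur_op](&cur_op, &reg_base);
}

/* Variant B: the same state passed by value from handler to handler. */
typedef int64_t (*ValOp)(const uint8_t *cur_op, Reg *reg_base);
static int64_t val_add(const uint8_t *cur_op, Reg *reg_base);
static int64_t val_ret(const uint8_t *cur_op, Reg *reg_base);
static const ValOp val_ops[] = { val_add, val_ret };

static int64_t val_add(const uint8_t *cur_op, Reg *reg_base) {
    reg_base[cur_op[1]].i64 = reg_base[cur_op[2]].i64 + reg_base[cur_op[3]].i64;
    cur_op += 4;                           /* no store: the value travels on */
    MUSTTAIL return val_ops[*cur_op](cur_op, reg_base);
}

static int64_t val_ret(const uint8_t *cur_op, Reg *reg_base) {
    return reg_base[cur_op[1]].i64;
}

int64_t run_by_val(const uint8_t *code, Reg *regs) {
    return val_ops[*code](code, regs);
}
```

Both drivers return 14 for the program { OP_ADD, 0, 1, 2, OP_ADD, 0, 0, 0, OP_RET, 0 } with registers initialised to 0, 3 and 4. The trade-off in variant B is that any op that switches frames or compilation units has to hand the new values to the next handler explicitly, since there is no shared location to update.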

Here's a Compiler Explorer snippet where I tried to just YOLO it and take the address of these variables on the stack inside the first frame that tail-calls. ASan immediately complains about accessing the data after the stack frame it came from has disappeared (I was hoping that tail-calling here would mean the arguments live on the stack and the frames are "compatible" in that way).

https://godbolt.org/z/P717413Gc

lizmat commented Apr 25, 2026

Contributor

For me on macOS on Apple silicon, this is not an improvement YET:

Building rakudo before:

Stage start      :   0.000
Stage parse      :  23.644
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   3.177
Stage mast       :   4.813
Stage mbc        :   0.828
+++ Compiling	blib/CORE.d.setting.moarvm
The following step can take a long time, please be patient.
Stage start      :   0.000
Stage parse      :   0.144
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   0.009
Stage mast       :   0.017
Stage mbc        :   0.003
+++ Compiling	blib/CORE.e.setting.moarvm
The following step can take a long time, please be patient.
Stage start      :   0.000
Stage parse      :   0.964
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   0.122
Stage mast       :   0.203
Stage mbc        :   0.017

after:

The following step can take a long time, please be patient.
Stage start      :   0.000
Stage parse      :  26.049
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   3.475
Stage mast       :   5.242
Stage mbc        :   0.956
+++ Generating	gen/moar/BOOTSTRAP/v6d.nqp
+++ Compiling	blib/Perl6/BOOTSTRAP/v6d.moarvm
+++ Compiling	blib/CORE.d.setting.moarvm
The following step can take a long time, please be patient.
Stage start      :   0.000
Stage parse      :   0.147
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   0.009
Stage mast       :   0.018
Stage mbc        :   0.003
+++ Generating	gen/moar/BOOTSTRAP/v6e.nqp
+++ Compiling	blib/Perl6/BOOTSTRAP/v6e.moarvm
+++ Compiling	blib/CORE.e.setting.moarvm
The following step can take a long time, please be patient.
Stage start      :   0.000
Stage parse      :   1.023
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   0.136
Stage mast       :   0.215
Stage mbc        :   0.019

@timo timo force-pushed the tail_call_interpreter branch from 5d28616 to 48aebe7 Compare April 27, 2026 22:01

lizmat commented Apr 27, 2026

Contributor

It feels a little better than before, but still doesn't beat the old way just yet:

The following step can take a long time, please be patient.
Stage start      :   0.000
Stage parse      :  25.293
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   3.091
Stage mast       :   5.035
Stage mbc        :   0.930
+++ Compiling	blib/CORE.d.setting.moarvm
The following step can take a long time, please be patient.
Stage start      :   0.000
Stage parse      :   0.144
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   0.009
Stage mast       :   0.018
Stage mbc        :   0.003
+++ Compiling	blib/CORE.e.setting.moarvm
The following step can take a long time, please be patient.
Stage start      :   0.000
Stage parse      :   0.999
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   0.131
Stage mast       :   0.235
Stage mbc        :   0.019

And the spectest feels a lot slower as well:

Files=1361, Tests=120339, 75 wallclock secs ( 9.34 usr  3.40 sys + 893.59 cusr 71.28 csys = 977.61 CPU)

compared with the old way:

Files=1361, Tests=120339, 72 wallclock secs ( 8.22 usr  3.05 sys + 860.92 cusr 69.16 csys = 941.35 CPU)

@MasterDuke17

Contributor

These were measured on Kubuntu Asahi on my MacBook Air M2, while plugged in.

With GCC 15.2.0.
main (gcc):

dan@athena:~/r/rakudo$ taskset -c 4 hyperfine --shell=none '/home/dan/r/install/bin/moar --libpath=/home/dan/Source/raku/rakudo/blib --libpath=/home/dan/r/install/share/nqp/lib rakudo.moarvm --setting=NULL.e --ll-exception --optimize=3 --target=mbc --stagestats --output=blib/CORE.e.setting.moarvm gen/moar/CORE.e.setting'
Benchmark 1: /home/dan/r/install/bin/moar --libpath=/home/dan/Source/raku/rakudo/blib --libpath=/home/dan/r/install/share/nqp/lib rakudo.moarvm --setting=NULL.e --ll-exception --optimize=3 --target=mbc --stagestats --output=blib/CORE.e.setting.moarvm gen/moar/CORE.e.setting
  Time (mean ± σ):      2.288 s ±  0.012 s    [User: 2.269 s, System: 0.015 s]
  Range (min … max):    2.266 s …  2.305 s    10 runs

branch (gcc):

dan@athena:~/r/rakudo$ taskset -c 4 hyperfine --shell=none '/home/dan/r/install/bin/moar --libpath=/home/dan/Source/raku/rakudo/blib --libpath=/home/dan/r/install/share/nqp/lib rakudo.moarvm --setting=NULL.e --ll-exception --optimize=3 --target=mbc --stagestats --output=blib/CORE.e.setting.moarvm gen/moar/CORE.e.setting'
Benchmark 1: /home/dan/r/install/bin/moar --libpath=/home/dan/Source/raku/rakudo/blib --libpath=/home/dan/r/install/share/nqp/lib rakudo.moarvm --setting=NULL.e --ll-exception --optimize=3 --target=mbc --stagestats --output=blib/CORE.e.setting.moarvm gen/moar/CORE.e.setting
  Time (mean ± σ):      2.377 s ±  0.051 s    [User: 2.356 s, System: 0.015 s]
  Range (min … max):    2.335 s …  2.498 s    10 runs

With Clang 20.1.8.
main (clang):

dan@athena:~/r/rakudo$ taskset -c 4 hyperfine --shell=none '/home/dan/r/install/bin/moar --libpath=/home/dan/Source/raku/rakudo/blib --libpath=/home/dan/r/install/share/nqp/lib rakudo.moarvm --setting=NULL.e --ll-exception --optimize=3 --target=mbc --stagestats --output=blib/CORE.e.setting.moarvm gen/moar/CORE.e.setting'
Benchmark 1: /home/dan/r/install/bin/moar --libpath=/home/dan/Source/raku/rakudo/blib --libpath=/home/dan/r/install/share/nqp/lib rakudo.moarvm --setting=NULL.e --ll-exception --optimize=3 --target=mbc --stagestats --output=blib/CORE.e.setting.moarvm gen/moar/CORE.e.setting
  Time (mean ± σ):      2.301 s ±  0.025 s    [User: 2.275 s, System: 0.014 s]
  Range (min … max):    2.262 s …  2.331 s    10 runs

branch (clang):

dan@athena:~/r/rakudo$ taskset -c 4 hyperfine --shell=none '/home/dan/r/install/bin/moar --libpath=/home/dan/Source/raku/rakudo/blib --libpath=/home/dan/r/install/share/nqp/lib rakudo.moarvm --setting=NULL.e --ll-exception --optimize=3 --target=mbc --stagestats --output=blib/CORE.e.setting.moarvm gen/moar/CORE.e.setting'
Benchmark 1: /home/dan/r/install/bin/moar --libpath=/home/dan/Source/raku/rakudo/blib --libpath=/home/dan/r/install/share/nqp/lib rakudo.moarvm --setting=NULL.e --ll-exception --optimize=3 --target=mbc --stagestats --output=blib/CORE.e.setting.moarvm gen/moar/CORE.e.setting
  Time (mean ± σ):      2.382 s ±  0.021 s    [User: 2.365 s, System: 0.016 s]
  Range (min … max):    2.341 s …  2.417 s    10 runs
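
To put the four hyperfine means above on one scale, the relative change between main and the branch can be worked out with a small helper. The function name relative_change_percent is made up for this sketch; the inputs are the mean times reported above.

```c
/* Percentage by which `candidate` differs from `baseline`;
 * positive means the candidate took longer. */
double relative_change_percent(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
}
```

Plugging in the means gives roughly +3.9% for GCC (2.288 s to 2.377 s) and roughly +3.5% for Clang (2.301 s to 2.382 s), so the branch is slower by a similar margin with both compilers. In both cases the gap between the means (0.089 s and 0.081 s) is larger than the reported standard deviations.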

@MasterDuke17

Contributor

But wow, it's cool seeing the interpreter functions in perf!

@timo timo force-pushed the tail_call_interpreter branch from 6e1c5a6 to 0c56bf5 Compare April 30, 2026 20:53
@coke coke force-pushed the tail_call_interpreter branch from 90fc7e2 to 0c56bf5 Compare June 2, 2026 23:31