WASM performance

I had a brief discussion with @jfmc about the performance of the WASM version. Jose claims a slowdown of about 2 times compared to the native version. For SWI-Prolog this is closer to 10 times. Jose suspected threaded code may be problematic. Threaded code is a trick where each VM instruction is stored as the address of the code that implements it, so dispatching is a direct jump rather than a trip through the switch. So, I promised to check. Well, SWI-Prolog has three modes for compiling the VM:

  • A classical switch.
  • Threaded code.
  • Compile each instruction to a function. Now the function pointer is used as the VM instruction. This is the default for the WASM version, as it built the fastest and ran the fastest when the WASM port materialized.
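
To make the three modes concrete, here is a minimal sketch on a toy stack VM. Everything in it (the opcodes, `cell`, `translate()`, the `run_*` functions) is made up for illustration and is not SWI-Prolog's actual VM code. The threaded variant relies on the "labels as values" GNU C extension (`&&label`, `goto *ptr`), which GCC and Clang accept. There is no stack checking, so it assumes well-formed code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy VM; not SWI-Prolog's instruction set. */
enum { OP_PUSH, OP_ADD, OP_HALT };
#define MAX_CODE 32

/* Mode 1: classical switch on the opcode. */
int run_switch(const int *pc)
{ int stack[MAX_CODE], sp = 0;

  for(;;)
  { switch(*pc++)
    { case OP_PUSH: stack[sp++] = *pc++; break;
      case OP_ADD:  sp--; stack[sp-1] += stack[sp]; break;
      case OP_HALT: return stack[sp-1];
      default:      assert(0); return -1;
    }
  }
}

/* Modes 2 and 3 first translate opcodes into handler addresses. */
struct vm;
typedef int (*vmfunc)(struct vm *);
typedef union cell { void *label; vmfunc fn; int arg; } cell;
typedef struct vm { const cell *pc; int stack[MAX_CODE]; int sp; } vm;

static void translate(const int *code, size_t len,
                      const cell *handlers, cell *out)
{ size_t i;

  assert(len <= MAX_CODE);
  for(i = 0; i < len; i++)
  { int op = code[i];
    out[i] = handlers[op];          /* opcode -> handler address */
    if ( op == OP_PUSH && ++i < len )
      out[i].arg = code[i];         /* copy the inline operand */
  }
}

/* Mode 2: threaded code; each instruction is a label address. */
int run_threaded(const int *code, size_t len)
{ cell handlers[3], tcode[MAX_CODE];
  const cell *pc = tcode;
  int stack[MAX_CODE], sp = 0;

  handlers[OP_PUSH].label = &&do_push;
  handlers[OP_ADD].label  = &&do_add;
  handlers[OP_HALT].label = &&do_halt;
  translate(code, len, handlers, tcode);

  goto *(pc++)->label;              /* jump straight to the first handler */
do_push:
  stack[sp++] = (pc++)->arg;
  goto *(pc++)->label;
do_add:
  sp--; stack[sp-1] += stack[sp];
  goto *(pc++)->label;
do_halt:
  return stack[sp-1];
}

/* Mode 3: each instruction is a pointer to its own function. */
static int f_push(vm *m) { m->stack[m->sp++] = (m->pc++)->arg; return 1; }
static int f_add(vm *m)  { m->sp--; m->stack[m->sp-1] += m->stack[m->sp]; return 1; }
static int f_halt(vm *m) { (void)m; return 0; }

int run_functions(const int *code, size_t len)
{ cell handlers[3], fcode[MAX_CODE];
  vm m;

  handlers[OP_PUSH].fn = f_push;
  handlers[OP_ADD].fn  = f_add;
  handlers[OP_HALT].fn = f_halt;
  translate(code, len, handlers, fcode);

  m.pc = fcode;
  m.sp = 0;
  while ( (m.pc++)->fn(&m) )        /* a handler returns 0 to stop */
    ;
  return m.stack[m.sp-1];
}
```

Note that in modes 1 and 2 the whole interpreter, with all its local variables, is one function, while in mode 3 the VM state lives in a struct that is passed to many small functions.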

But, things have changed. Using Emscripten 4.0.15 (latest) and node v22.19.0 (Fedora 42), we get:

  • A classical switch: 0.38 sec
  • Threaded code: 0.38 sec
  • VM functions: 0.61 sec

So, I think we should change the default. Unfortunately the first two crash on one of the tests :frowning:


The story is more complicated. Using threaded code or a switch results in a huge function that has many local variables. Because of the way setjmp()/longjmp() is handled, recursive calls to the VM interpreter quickly fill the JavaScript stack. Just 3 levels of recursion of the VM interpreter, as used by one of the tests, already exhaust the node JavaScript stack :frowning:

This can be fixed by using WASM exception handling (-fwasm-exceptions), but that makes the system about 30% slower again :frowning:

Not sure where to go. Possibly we can get rid of the (few) setjmp()/longjmp() usages.
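
As a rough idea of what getting rid of a setjmp()/longjmp() usage could look like, here is a minimal sketch that does the same deep error exit twice: once with longjmp() and once by passing a status code back through every caller. All names (`run_with_longjmp()`, `run_with_status()`, the fixed depth of 4) are hypothetical and have nothing to do with the actual SWI-Prolog sources. Whether the real usages can be rewritten this way is an open question.

```c
#include <assert.h>
#include <setjmp.h>

/* Variant A: deep error exit using setjmp()/longjmp(). */
static jmp_buf recover;             /* recovery point set by the entry function */
static int error_code;              /* error passed along with the jump */

static void deep_work_jmp(int level, int fail_at)
{ if ( level == fail_at )
  { error_code = level;
    longjmp(recover, 1);            /* unwind all callers in one step */
  }
  if ( level < 4 )
    deep_work_jmp(level+1, fail_at);
}

int run_with_longjmp(int fail_at)
{ if ( setjmp(recover) != 0 )
    return -error_code;             /* we arrive here through longjmp() */
  deep_work_jmp(1, fail_at);
  return 0;
}

/* Variant B: same behaviour, but each level returns a status. */
static int deep_work_status(int level, int fail_at)
{ if ( level == fail_at )
    return -level;                  /* report the error to the caller */
  if ( level < 4 )
    return deep_work_status(level+1, fail_at);
  return 0;
}

int run_with_status(int fail_at)
{ return deep_work_status(1, fail_at);
}
```

The price of variant B is that every function on the path has to check and forward the status, which is why it is only attractive if there are indeed few such usages.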