Ha, you should see the preprocessor definitions file I was using before I cleared out all the dead-end exploration I did! I called it “macrobatics.h”. There was integer arithmetic. There were list manipulation primitives. There were closures.
Anyway, the main question: why is vmi-functions performing so poorly? Very good question. The only actual changes I can think of between the goto-mode code from the original and the branch are:
- The goto labels changed from D_BREAK_LBL to instr_D_BREAK. Should be zero impact, unless I missed a reference in some way that didn’t cause a build failure.
- The attvar wakeup handler is now listed before the VMI instructions. Should be zero impact, there’s no fall through, the compiler is free to reorder anyway.
- Registers are members of a struct instead of individual variables. Shouldn’t make a difference to the optimizer, unless there’s some magic happening with in-memory layout of the variables, which seems unlikely given how widespread the usage is. Technically, it could increase the stack frame size if some of the register variables actually managed to avoid getting spilled into memory locations, but given the size of the function that seems incredibly unlikely.
- The pl-alloc indirect functions now have internals which use struct-return. Should be zero impact, all these are marked inline and the optimizer should be able to figure that out.
- Helper vars are now scoped to PL_next_solution instead of block-scoped. Ideally it shouldn’t make a difference? But it might cause more register spilling. Anyway I tried to minimize that with
Okay, never mind, I think I know what’s going on here. I threw all the various helper arg types in a union as an attempt to indicate to the compiler “when one of these is used, the others don’t matter, feel free to reuse the stack space” but I think it backfired; the union is probably forcing the compiler to spill these to the stack, rather than leaving them in registers. Try getting rid of the union declaration at the top of PL_next_solution and adjusting the HELPER_ARGS macro to compensate. I’ll be doing the same over here, of course, to see if I can reproduce and fix.
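To make the union theory concrete, here is a rough sketch of the two shapes being compared. Everything in it is a hypothetical stand-in: `helper_args_t`, `run_with_union`, `run_with_locals` and the field names are made up for illustration and are not the actual declarations in PL_next_solution or the real HELPER_ARGS macro. It only shows the difference between one function-scoped union shared by all helpers and plain block-scoped locals.

```c
#include <stdio.h>

/* Hypothetical stand-in for the helper-argument union; the real one
 * has different members. */
typedef union {
    struct { int arity; long value; } unify;  /* args for one helper */
    struct { long lo; long hi; } range;       /* args for another helper */
} helper_args_t;

/* Variant A: a single union declared at function scope, reused by
 * every "helper". The suspicion is that in a very large function this
 * pins the arguments to one stack slot instead of letting them live
 * in registers. */
static long run_with_union(int n)
{
    helper_args_t args;
    long total = 0;

    for (int i = 0; i < n; i++) {
        if (i % 2 == 0) {
            args.unify.arity = i;
            args.unify.value = 2L * i;
            total += args.unify.arity + args.unify.value;
        } else {
            args.range.lo = i;
            args.range.hi = i + 10;
            total += args.range.hi - args.range.lo;
        }
    }
    return total;
}

/* Variant B: the same logic with separate block-scoped locals, which
 * the register allocator can treat as independent short-lived values. */
static long run_with_locals(int n)
{
    long total = 0;

    for (int i = 0; i < n; i++) {
        if (i % 2 == 0) {
            int arity = i;
            long value = 2L * i;
            total += arity + value;
        } else {
            long lo = i;
            long hi = i + 10;
            total += hi - lo;
        }
    }
    return total;
}
```

Both variants compute the same result, so any difference shows up only in the generated code. In a toy this small the optimizer may well emit identical assembly for both; the theory is about what happens inside something the size of PL_next_solution. Comparing the output of `-O2 -S` for each shape on the real function is the way to confirm or rule it out.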