Sorry for the late reply, lost the tab for a bit ._.
Something that might be relevant is that LuaJIT can JIT-optimize FFI calls made from Lua, but can't/doesn't optimize calls into Lua made from C.
I might be able to go digging for the reference (at this point I figure it's best to just reply for now, not sure if you'll see this), but I've read that "the approach" recommended to solve this problem is to move the main `for(;;)` / `while(1)` loop into Lua and have LuaJIT repeatedly FFI-call C, because that's the direction the JIT can actually optimize.
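To make that a bit more concrete, here's a rough sketch of what "loop in Lua, FFI-call into C" could look like. This is just my illustration, not something from the reference: `run` is a made-up name, and libc's `abs()` is only a stand-in for whatever per-iteration function your C code would expose.

```lua
-- Needs LuaJIT (plain Lua has no built-in "ffi" module).
local ffi = require("ffi")

-- Declare the C function to call. abs() from libc is an assumption used
-- as a placeholder for your own per-iteration C function.
ffi.cdef[[
int abs(int x);
]]

-- Hypothetical driver: the hot loop lives on the Lua side, so the JIT
-- sees both the loop and the FFI call, instead of C calling back into Lua.
local function run(n)
  local total = 0
  for i = 1, n do
    total = total + ffi.C.abs(-i)
  end
  return total
end

print(run(10))
```

In a real program you'd presumably `ffi.load` your own library and call its step/update function inside the loop rather than `abs()`, but the shape would be the same.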