Assume RAX points to the root node and nodes just contain two child pointers and everything is aligned and whatever.
:invert
cmp rax, 0
jnz swap:
ret
:swap
push rax
mov rax, [rax]
call invert
mov rbx,[rsp]
xchg rax, [rbx+8]
mov [rbx], rax
call invert
pop rax
ret
The last time I used an assembler was before x86-64 was invented, I am not even sure I ever used one in protected mode. But that seems a totally reasonable whiteboard interview question. Written in notepad, might not assemble. Might even be totally incorrect and I am posting it so that the internet generates the warning and error messages.
EDIT: After reading the article now, that seems rather inefficient to me, to use local variables on the stack for everything. And why is the function returning a node if it is mutating the tree in place?
The variables on the stack are the most efficient after registers, so you are right that a local variable should be kept into a register if possible, otherwise in the stack, and only then in other places (e.g. if it is too large for the stack).
However, when writing in assembly one must pay attention that at least RBX, RBP and R12 through R15 must be preserved by any function (on Windows also RDI and RSI must be preserved).
So in your code you should not use RBX, but a volatile register, e.g. RDX or RCX. If you would insist on using RBX, it would have to be saved and restored.
However, when writing in assembly one must pay attention that at least RBX, RBP and R12 through R15 must be preserved by any function
Only if you're calling external code that assumes that. The power of Asm largely comes from not needing to follow arbitrary conventions in your own code. The boundaries where you interface to external code are the only constraints.
The top of the stack should generally be already present in the cache, so stack memory would be faster than heap memory where objects aren’t necessary as close together or accessed as frequently.
I am not familiar with any other reason besides the fact that stack-pointer arithmetics are done through a dedicated HW, called stack-engine, which sits in the CPU frontend. This effectively means that stack-pointer uOps are going to be get executed quicker because they do not have to go all the way through the CPU backend processing. This also saves some of the bandwidth on the CPU backend side allowing other uOps to be processed sooner.
I know none of the calling conventions in any detail anymore and just used the registers in alphabetical order. Totally expected that this would violate something.
EDIT: After reading the article now, that seems rather inefficient to me, to use local variables on the stack for everything. And why is the function returning a node if it is mutating the tree in place?