In the AMD64 ABI, any return value too large to fit in registers is returned through what is effectively a hidden output pointer argument. The caller reserves space for the return value and passes a pointer to that space to the function, which then fills it in. Both sides are easy to optimize, and no unnecessary copies are made: the function can construct the return value directly at the given address, and the caller can pass a pointer straight to a variable when the return value is used to initialise that variable.
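To make the mechanism concrete, here is a rough sketch in C of what the compiler does behind the scenes. The names `struct Big`, `make_big`, `make_big_lowered` and `check_equivalent` are made up for illustration, and the "lowered" function is only an approximation of the generated code, written under the assumption of the System V flavour of the AMD64 ABI.

```c
#include <assert.h>

/* Hypothetical 32-byte struct: too large to come back in registers. */
struct Big {
    long a, b, c, d;
};

/* What you write: an ordinary return by value. */
struct Big make_big(long seed) {
    struct Big r = { seed, seed + 1, seed + 2, seed + 3 };
    return r;
}

/* Roughly what the compiler generates: the hidden pointer made explicit.
 * On System V AMD64 that pointer is passed as the first argument (RDI)
 * and the same address is handed back in RAX. */
struct Big *make_big_lowered(struct Big *out, long seed) {
    out->a = seed;
    out->b = seed + 1;
    out->c = seed + 2;
    out->d = seed + 3;
    return out;
}

/* Usage: both forms leave the same value in the caller's variable. */
void check_equivalent(void) {
    struct Big x = make_big(10);  /* compiler may pass &x as the hidden pointer */
    struct Big y;
    make_big_lowered(&y, 10);     /* the same thing, spelled out by hand */
    assert(x.a == y.a && x.b == y.b && x.c == y.c && x.d == y.d);
}
```

In `check_equivalent`, the initialisation of `x` is the case where the caller can hand the function the address of `x` itself, so the value is built in place and never copied.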