Would segmented caches be a solution? What I mean is, allow the operating system to flag each memory page as belonging to a certain tenant id, and not allow speculation across tenant boundaries? This doesn't seem like a complicated addition, given that we already have multi-level page tables, caches and MMU accelerators for virtual environments.
Then again, if the solution was simple, I'm sure it would have already been implemented.
You want to protect against multiprocessor attacks through MESI. At which point I just have no idea of what to do purely in hardware.
The best system design change IMO would be to tag at programming language level (to systematize the introduction of the needed barriers, without needing too many of them, thus minimizing the overhead -- which is a must otherwise you can as well remove OOO entirely), but I suspect that's not going to happen for most systems. Or even any of them.
Then again, if the solution was simple, I'm sure it would have already been implemented.