
Manually managing the cache would defeat the purpose of a cache. Computing what to fill the cache with, and then filling it, would likely take more time than simply handling the cache misses.

Even specifying the caching strategy (whether it be associativity, eviction strategy, or something else) seems very unlikely to be beneficial to me. Caches are fast because they are hard-wired and close to the registers. For them to be useful, they need to be able to decide in the blink of an eye whether a value is already present or not. Doing that in software would be absurd. That is why the CPU vendor tries to find a strategy that performs well in a broad range of scenarios.

I have no idea what GPUs have to do with this, but co-processors are nothing new. What you are describing is a NUMA system with all the background stuff pinned to one node and your actual workload to the other (or both).

EDIT: Also, 99.9% of the time, the CPU is better at caching than a human could hope to be. You don't know the input data in advance, so how would you know what caching strategy is optimal? From what information do you draw conclusions as to how the cache should behave for optimal performance? There's a reason programmers don't manage registers manually (the compiler is almost always better at it), so what gives you the impression that managing the cache manually would have a positive outcome?



While what you are saying is true for caches, I am talking about a CPU architecture where the near-die memory is not a cache at all. If I can request that my process get direct access to what used to be the L2 cache, with, let's say, a capacity of 4 MB, I could then place my 2 MB array that needs sorting into it and sort it without ever leaving L2. I then relinquish control over this fast hardware buffer so the next process can use it.
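Roughly the programming model I have in mind, as a minimal sketch. The scratchpad_acquire/scratchpad_release calls are a hypothetical OS interface (stubbed out with malloc here so the sketch compiles), not anything that exists today:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical OS interface: map the near-die SRAM (formerly L2) into
     * this process's address space, or fail if another process holds it.
     * Stubbed with malloc/free purely so the sketch is self-contained. */
    void *scratchpad_acquire(size_t bytes) { return malloc(bytes); }
    void  scratchpad_release(void *buf)    { free(buf); }

    static int cmp_u32(const void *a, const void *b)
    {
        uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
        return (x > y) - (x < y);
    }

    /* Sort a 2 MB array entirely inside the on-die buffer. */
    int sort_in_scratchpad(const uint32_t *src, uint32_t *dst, size_t n)
    {
        size_t    bytes = n * sizeof(uint32_t);
        uint32_t *spad  = scratchpad_acquire(bytes);  /* fast SRAM, not a cache */
        if (!spad)
            return -1;                                /* buffer busy: fall back to DRAM */

        for (size_t i = 0; i < n; i++)                /* copy in once */
            spad[i] = src[i];

        qsort(spad, n, sizeof(uint32_t), cmp_u32);    /* every access stays on-die */

        for (size_t i = 0; i < n; i++)                /* copy the result out once */
            dst[i] = spad[i];

        scratchpad_release(spad);                     /* next process can use the buffer */
        return 0;
    }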

There are of course a lot of issues with this model in a time-sharing multiprocess OS: how do you request access, how do you ensure clean allocation, etc.? Those are problems that would need to be resolved. This does work well on PICs: you get some small amount of RAM (something like 64-512 bytes), and then you can attach a separate RAM chip via I2C or SPI, managing it explicitly (see the sketch below). Perhaps if a CPU die were surrounded by a reasonable number of such buffers (something like 16 x 8 MB), we could write a completely different type of code.
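For the PIC case the external RAM really is managed explicitly by the program. A rough sketch of what that looks like, assuming placeholder spi_select/spi_deselect/spi_transfer helpers for whatever the chip's SPI driver provides; the command bytes follow the 23LC1024-style SPI SRAM family and are illustrative only:

    #include <stddef.h>
    #include <stdint.h>

    /* Placeholder board-support functions (assumed, not a real API). */
    void    spi_select(void);            /* pull the SRAM's CS line low       */
    void    spi_deselect(void);          /* release CS                        */
    uint8_t spi_transfer(uint8_t out);   /* clock one byte out, one byte in   */

    /* Typical SPI SRAM command bytes (23LC1024-style). */
    #define SRAM_CMD_WRITE 0x02
    #define SRAM_CMD_READ  0x03

    static void sram_write(uint32_t addr, const uint8_t *data, size_t len)
    {
        spi_select();
        spi_transfer(SRAM_CMD_WRITE);
        spi_transfer((addr >> 16) & 0xFF);   /* 24-bit address, MSB first */
        spi_transfer((addr >> 8) & 0xFF);
        spi_transfer(addr & 0xFF);
        for (size_t i = 0; i < len; i++)
            spi_transfer(data[i]);
        spi_deselect();
    }

    static void sram_read(uint32_t addr, uint8_t *data, size_t len)
    {
        spi_select();
        spi_transfer(SRAM_CMD_READ);
        spi_transfer((addr >> 16) & 0xFF);
        spi_transfer((addr >> 8) & 0xFF);
        spi_transfer(addr & 0xFF);
        for (size_t i = 0; i < len; i++)
            data[i] = spi_transfer(0x00);    /* clock dummy bytes to read */
        spi_deselect();
    }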



