I hope this new upper story of the tower is successful. I don't reckon I've needed anything weaker than the tower's ptr.addr() and ptr.with_addr(addr) in code I've actually shipped anywhere. Maybe people have examples of stuff (in Rust or otherwise) they've written where they're confident that would not be enough?
A possible non-Rust example: SpiderMonkey uses a trick we call NaN-boxing to represent JS values, which takes advantage of the fact that doubles have 2^52 different bit patterns for NaN to store 48-bit pointers with 4 bits leftover for a type tag. Provenance for the ptr->int->double->int->ptr loop might still be manageable using a "base of the GC heap" pointer and with_addr(), but there are also cases where built-in objects store arbitrary non-GC pointers in private slots (for example, a Map object has a raw pointer to its heap-allocated backing array). In that case, there isn't an obvious way to reestablish provenance (although we could hack it for e.g. Cheri by adding a layer of indirection through a GC allocation that stores the real pointer).
Iain and I previously discussed this a bit, so just to add context: WebKit is already ported to CHERI BSD, so there are definitely solutions to the kinds of hax that JIT VMs like to do. The details just become very platform specific (specific instructions/operations that are supported like pointer shifting or high-bit tagging), so it's hard to discuss/describe abstractly without a CHERI expert or ISA manual next to you.
The only example I can think of where I've seen it in released code is the "xor linked list" trick, where each node of a linked list stores "neighbours = prev xor next" (rather than separate pointers to the 'next' and 'prev' items in the list).
The idea is that this lets you iterate up and down the linked list: if you have prev, you can do "neighbours xor prev" to get next, and "neighbours xor next" to get prev.
However, while in that particular program it did create a notable speed improvement and memory saving, I think it's probably worth sacrificing for the greater good.
In general the primary reason for using doubly linked lists, despite their terrible cache behaviour, is that they are the only data structure that is very easy to write in C. So I'm not sure how much that is going to be an issue in Rust where it is much easier to use other data structures that perform significantly better in practice anyway.
That "terrible" cache behaviour is sometimes exactly what you want for concurrency problems, because if code on several physical CPUs or cores all tries to access the same region of memory, that's going to be very slow, while unrelated memory will get cached locally by each CPU/core.
That's not why Joe Junior's first C program has a linked list in it. But it might well be why Joe Senior's masterpiece Rust program has a linked list in it. On the other hand, depending on the algorithm being implemented, it may only be singly linked, and the XOR trick doesn't apply.
Rust's alloc (the library you get if you have an allocator, but not necessarily an entire OS environment) does provide a linked list if you want one, and it would be a reasonable choice in this case, although its documentation warns you that you probably want Vec if you're not sure which data structure you need.
This is true, but the concurrent work-stealing queues and such that are commonly used for such purposes aren't usually textbook C-style linked lists. They (at least in the case of Crossbeam; I don't have experience with others) allocate smaller contiguous blocks of items which are then linked together with pointers. This reduces overhead a lot more than the xor trick does while also improving cache behaviour. Also, as you mentioned, they're already only singly linked in the first place.
I think this would work well with the model as described in the article, since for every list operation you're assumed to be starting from an actual pointer to some list element, which you can use to endow the recomputed addresses with the right provenance component.
If each element of the linked list has a different provenance, then I think you are out of luck. I could allocate the whole list in a single giant block, but that feels like it defeats the purpose -- now I'm just allocating every object in my program in one giant allocation.
Yeah "it's actually xoring array indices" is the standard solution
Or you can go Deep on memory models and try to apply Ralf's Xor Provenance Hack and claim that, well, provenance is stored in bytes, but it doesn't have to be a pointer-sized range of bytes, so let me have some horrible way to express "the high half has provenance 1, the low half has provenance 2" and handwave magic problem solved.
This is of course horrible and also not at all a portable notion to CHERI which tracks provenance at the granularity of "aligned pointer-sized region of memory". But hey, if it helps you sleep at night.
I'm personally excited by CHERI, and feel it is part of (sorry if these words annoy some people) computing "growing up". It's interesting (to me at least) that, of the tricks I can think of (including general pointer compression tricks, like interpreters storing 63-bit integers in a word whose last bit is set to 1), xoring pointers is the only one where CHERI support shouldn't be (fairly) trivial.
I'm happy to sacrifice xoring pointers, the same way I wouldn't use some "cunning trick" to build my house with half as many nails, at the risk that any minor mistake installing any of those nails would lead to my house falling over.
CHERI does permit tricks like storing flags in the low bits of a pointer, at least to some extent. Quite a lot of low level C code (including some in the CheriBSD kernel) needs that to work.
There are definitely cases where you are converting integers into pointers ex nihilo. Memory-mapped I/O is one (write something special to memory location 0x04ff0030 and it turns on the light!).
My current Rust project would probably need to hit up the exposed_addr interface, but it's definitely in the twilight of "how to even memory model" since it involves trying to reason about the provenance of pointer values in the register array passed to you by the third argument of the signal handler.
Surely the fact that you're using explicitly volatile accesses should be a dead giveaway to the compiler that funny business is going on and that the optimizer shouldn't touch it with a ten-foot pole, right? The whole point of volatile is to inhibit optimizations, after all. Maybe that's not good enough for CHERI (but surely they have some way of doing MMIO), but it sounds good enough for Miri.
From a semantics perspective, volatile isn't really a solution so much as it is a brush-it-under-the-rug scenario. As you say, the point of volatile is to inhibit optimizations, but one of the issues of language semantics is that "optimization" isn't a meaningfully definable term, especially the further your target architecture gets from a Computer Architecture 101-model computer.
True, it may not be especially satisfying to the people writing memory models, but I'm under the impression that nobody writing a memory model has any idea how to rigorously deal with MMIO (maybe I'm wrong!). In the meantime while "optimization" is impossible to define, it is possible to define targeted classes of things that optimizers must not do in certain scenarios: volatile tells the optimizer that it can't remove this read/write under any circumstance, memory barriers tell the optimizer that it can't reorder across them under any circumstance, etc. If we accept that occupying the upper levels of the Tower Of Weakenings is already a wash in the MMIO case, then it seems like leveraging the existence of volatile should suffice to at least let us occupy the middle floors where things are fuzzy, but at least code isn't getting miscompiled.
They're specifically intended for MMIO. "Volatile operations are intended to act on I/O memory, and are guaranteed to not be elided or reordered by the compiler across other volatile operations."
Under the hood, this is issuing the raw load or store as you'd expect, and it shouldn't get touched by the compiler because the load/store was literally the whole point of the intrinsic rather than a necessary consequence of some higher level language feature like an assignment operation.
My understanding is that CHERI just tells you to eat your vegetables and properly maintains a chain of custody from like bootloader to kernel to subsystems to drivers, so that anyone messing with MMIO has to do it the Right Way and get permission from whatever's up the chain of command from them to access that range of memory, even if they're also part of the kernel.
This is super vague half-remembered conversations though.
A nice thing about this work is that adopting this strict model in real life codebases is going to bring to light cases where it isn't enough, and hopefully more scrutiny can be directed towards those cases.
This is my question as well. I don't write a lot of C or Rust code, so I'm curious when int -> ptr and ptr -> int conversions are actually needed in practice. The only scenarios I can think of, with my limited domain experience, are low-level things where you actually know the numeric addresses of certain important locations in memory. In which case, saying "yeah, that's a lower level in the weakenings tower" seems sensible.