The Tower of Weakenings: Memory Models for Everyone

jcranmer · on April 6, 2022

Programming language semantics turn out to be far, far thornier than most people (even people who work on language committees!) give them credit for. And the memory model turns out to be the absolute worst part of it all: the hardware plays lots of fun games on you, the OS plays lots of fun games on you, and the compiler itself is playing lots of fun games on you.

Multithreaded--really, multiprocessor--memory models are really, really confusing. Hardware memory models are generally described in terms of what reorderings are possible (e.g., loads may be ordered before subsequent stores), which is easy to understand from a hardware perspective but less easy to deal with from a language perspective. There turns out to be a wonderful theorem, though, that lets you hide all of the hardware ugliness: if you properly synchronize your code, then you can't see any of the reordering, and the compiler merely has to guarantee that the synchronization is properly handled. This is the data-race-free programming model, and it is the primary basis for all modern memory models. Stray away from this model (e.g., relaxed atomics) and you reenter the world of pain, for which no satisfactory solution has yet been found.

Discovering the pains of pointer provenance is even more recent, arguably only about a decade old. For a long time, pointer provenance has been handwaved on a vague data-dependency basis (you see this in LLVM's LangRef, for example). Unfortunately, it turns out that data dependencies are not generally preserved by compilers (this also doomed the release/consume aspect of the C/C++ memory model). An escaping/lifetime-based pointer model is closer to the actual one used by compilers, but there remains tricky things about things like memcpy [1].

It turns out there are two things that make pointer provenance painful: integer-to-pointer (where does the provenance come from?), and memory (since this usually ends up creating implicit integer-to-pointer casts in most putative models that you really want to avoid). The lesson we know from pointer provenance is that pointers are not integers, and cannot be losslessly roundtripped via integers, and Rust being newer and with a heavier emphasis on safety gives it a better chance of adopting a model here that more forcefully breaks this mistaken link.

[1] One consequence is that most formal pointer provenance models actually don't let you legally write memcpy in C code.

fbkr · on April 6, 2022

> Stray away from this model (e.g., relaxed atomics)

Relaxed atomics are still data-race free, did you mean non-atomic accesses?

jcranmer · on April 6, 2022

I should clarify slightly. There are multiple definitions of "data race" in play. In the sense of the C/C++ standard, where a data race is essentially used to describe behaviors that are undefined, relaxed atomics are free of data races. However, from the perspective of a data-race-free model where data races are things that observably violate sequential consistency, relaxed atomics are absolutely data races.

In fact, the raison d'être for relaxed atomics is to permit applications to create "benign data races" [in the second sense] that aren't undefined behavior. As it turns out, though, actually specifying what semantics such a "benign data race" has is complicated, especially when you get into the realm of avoiding things like out-of-thin-air behavior (or, to reference a paper that crossed my desk a month ago, undefined behavior executed only if out-of-thin-air happens).

bigcat123 · on April 6, 2022

Hmm, I am getting a sense that modern engineers shows a sign of losing touch to computer science theoretical foundation.

It's not funny game. It's part of abstraction hierarchy and physical computing model vs. programming language computing model.

It's only funny because one would presume that there ever was a time the hardware was not playing "funny" game.

And no, it always has a giant abstraction gap between hierarchy, and that's a fundamental part of modern electronic computing. One shall allow oneself to learn those and get used to how things are actually working.

tialaramex · on April 6, 2022

I hope this new upper story of the tower is successful. I don't reckon I've actually needed anything weaker than the tower's ptr.addr() and ptr.with_addr(addr) in code I've actually shipped anywhere. Maybe people have examples of stuff (in Rust or otherwise) they've written where they're confident that would not be enough?

IainIreland · on April 6, 2022

A possible non-Rust example: SpiderMonkey uses a trick we call NaN-boxing to represent JS values, which takes advantage of the fact that doubles have 2^52 different bit patterns for NaN to store 48-bit pointers with 4 bits leftover for a type tag. Provenance for the ptr->int->double->int->ptr loop might still be manageable using a "base of the GC heap" pointer and with_addr(), but there are also cases where built-in objects store arbitrary non-GC pointers in private slots (for example, a Map object has a raw pointer to its heap-allocated backing array). In that case, there isn't an obvious way to reestablish provenance (although we could hack it for e.g. Cheri by adding a layer of indirection through a GC allocation that stores the real pointer).

Gankra · on April 6, 2022

Iain and I previously discussed this a bit, so just to add context: WebKit is already ported to CHERI BSD, so there are definitely solutions to the kinds of hax that JIT VMs like to do. The details just become very platform specific (specific instructions/operations that are supported like pointer shifting or high-bit tagging), so it's hard to discuss/describe abstractly without a CHERI expert or ISA manual next to you.

CJefferson · on April 6, 2022

The only example I can think of I've seen released code is the "xor linked list" trick, where each node of a link list stores "neighbours = prev xor next" (rather than pointers to the 'next' and 'prev' thing in the list).

The idea is this lets you iterate up and down the linked list, as if you have prev, you can do "neighbours xor prev" to get next, and "neighbours xor next" to get prev.

However, while in that particular program that did create a notable speed improvement and memory saving, I think it's probably worth sacrificing to the greater good.

ATsch · on April 6, 2022

In general the primary reason for using doubly linked lists, despite their terrible cache behaviour, is that they are the only data structure that is very easy to write in C. So I'm not sure how much that is going to be an issue in Rust where it is much easier to use other data structures that perform significantly better in practice anyway.

tialaramex · on April 6, 2022

That "terrible" cache behaviour is what you actually wanted for some concurrency problems, because if code on several physical CPUs or cores all tries to access the same region of memory that's going to be very slow, while unrelated memory will get cached locally by each CPU/ core.

That's not why Joe Junior's first C program has a linked list in it. But it might well be why Joe Senior's masterpiece Rust program has a linked list in it. On the other hand, depending on the algorithm being implemented, it may only be singly linked, and the XOR trick doesn't apply.

Rust's alloc (the library you get if you have an allocator, but not necessarily the entire OS environment) does provide a linked list if you want one, and this would be a reasonable choice in this case whereas it warns you that you probably wanted Vec if you're not sure which data structure you need.

ATsch · on April 6, 2022

This is true, but the concurrent work stealing queues and such that are commonly used for such purposes aren't usually textbook C-style linked lists. They (at least in the case of Crossbeam, don't have experience with others) allocate smaller contiguous blocks of items which are then linked together with pointers. This reduces overhead a lot more than the xor trick does while also improving cache behaviour. Also as you mentioned they're already only singly linked in the first place.

zozbot234 · on April 6, 2022

I think this would work well with the model as described in the article, since for every list operation you're assumed to be starting from an actual pointer to some list element, which you can use to endow the recomputed addresses with the right provenance component.

CJefferson · on April 6, 2022

If each element of the linked list has a different provenance, then I think you are out of luck. I could allocate the whole list in a single giant block, but that feels like it defeats the purpose -- now I'm just allocating every object in my program in one giant allocation.

Gankra · on April 6, 2022

Yeah "it's actually xoring array indices" is the standard solution

Or you can go Deep on memory models and try to apply Ralf's Xor Provenance Hack and claim that, well, provenance is stored in bytes, but it doesn't have to be a pointer-sized range of bytes, so let me have some horrible way to express "the high half has provenance 1, the low half has provenance 2" and handwave magic problem solved.

This is of course horrible and also not at all a portable notion to CHERI which tracks provenance at the granularity of "aligned pointer-sized region of memory". But hey, if it helps you sleep at night.

CJefferson · on April 6, 2022

I'm personally excited by CHERI, and feel it is part of (sorry if these words annoy some people) computing "growing up". It's interesting (to me at least) that that is the only trick I can think of (along with general pointer compression tricks, like interpreters storing 63-bit integers when a pointers last bit is set to 1), where CHERI shouldn't be (fairly) trivial to support.

I'm happy to sacrific xoring pointers, the same way I wouldn't use some "cunning trick" to build my house with half as many nails, at the risk that any minor mistake installing any of those nails would lead to my house failing over.

markjdb · on April 6, 2022

CHERI does permit tricks like storing flags in the low bits of a pointer, at least to some extent. Quite a lot of low level C code (including some in the CheriBSD kernel) needs that to work.

jcranmer · on April 6, 2022

There are definitely cases where you are converting integers into pointers ex-nihilo. Memory-mapped I/O is one (write something special to memory location 0x04ff0030 and it turns on the light!).

My current Rust project would probably need to hit up the exposed_addr interface, but it's definitely in the twilight of "how to even memory model" since it involves trying to reason about the provenance of pointer values in the register array passed to you by the third argument of the signal handler.

kibwen · on April 6, 2022

Surely the fact that you're using explicitly volatile accesses should be a dead giveaway to the compiler that funny business is going on and that the optimizer shouldn't touch it with a ten-foot pole, right? The whole point of volatile is to inhibit optimizations, after all. Maybe that's not good enough for CHERI (but surely they have some way of doing MMIO), but it sounds good enough for Miri.

jcranmer · on April 6, 2022

From a semantics perspective, volatile isn't really a solution so much as it is a brush-it-under-the-rug scenario. As you say, the point of volatile is to inhibit optimizations, but one of the issues of language semantics is that "optimization" isn't a meaningfully definable term, especially the further your target architecture gets from a Computer Architecture 101-model computer.

kibwen · on April 6, 2022

True, it may not be especially satisfying to the people writing memory models, but I'm under the impression that nobody writing a memory model has any idea how to rigorously deal with MMIO (maybe I'm wrong!). In the meantime while "optimization" is impossible to define, it is possible to define targeted classes of things that optimizers must not do in certain scenarios: volatile tells the optimizer that it can't remove this read/write under any circumstance, memory barriers tell the optimizer that it can't reorder across them under any circumstance, etc. If we accept that occupying the upper levels of the Tower Of Weakenings is already a wash in the MMIO case, then it seems like leveraging the existence of volatile should suffice to at least let us occupy the middle floors where things are fuzzy, but at least code isn't getting miscompiled.

tialaramex · on April 7, 2022

In the case of Rust, "volatile" means specifically the intrinsics:

https://doc.rust-lang.org/core/ptr/fn.read_volatile.html

https://doc.rust-lang.org/core/ptr/fn.write_volatile.html

They're specifically intended for MMIO. "Volatile operations are intended to act on I/O memory, and are guaranteed to not be elided or reordered by the compiler across other volatile operations."

Under the hood, this is issuing the raw load or store as you'd expect, and it shouldn't get touched by the compiler because the load/ store was literally the whole point of the intrinsic rather than a necessary consequence of some higher level language feature like an assignment operation.

Gankra · on April 6, 2022

My understanding is that CHERI just tells you to eat your vegetables and properly maintains a chain of custody from like bootloader to kernel to subsystems to drivers, so that anyone messing with MMIO has to do it the Right Way and get permission from whatever's up the chain of command from them to access that range of memory, even if they're also part of the kernel.

This is super vague half-remembered conversations though.

GolDDranks · on April 6, 2022

A nice think about this work is that adopting this strict model in real life codebases is going to bring to light cases where it isn't enough, and hopefully more scrutiny can be directed towards those cases.

seo-speedwagon · on April 6, 2022

This is my question as well. I don't write a lot of C or Rust code, so I'm curious when int -> ptr and ptr -> int conversions are actually needed in practice. The only scenarios I can think of with my limited domain experience in are low-level things where you actually know the numeric addresses of certain important locations in memory. In which case, saying "yeah, that's a lower level in the weakenings tower" seems sensible.

mh7 · on April 6, 2022

A classic example is when you're writing a custom allocator and need to align the address.

Another is storing pointers in a hash table - you need to do arithmetic on the pointer bits to compute the hash.

Gankra · on April 6, 2022

I already designed several new convenience methods for making this kind of thing simple and the PR to rust-lang is already open :)

tooltower · on April 6, 2022

I found this article hard to read. I was trying to understand what the current state of affairs is, and how these new abstractions improve the matter. E.g. the consistency properties, the weakenings, and when they help.

Instead it read more like an exasperated rant, that links to rust toolchain improvements. It does have an account of how to use the new features, which was good. But I couldn't glean when and how that would help me as a user

kmeisthax · on April 6, 2022

There's a prior post that this one is building off of, but basically:

1. Pointers are not addresses; they are pairs of (allocation, address). This is known as pointers having provenance.

2. Treating a pointer as an int-sized address is merely a hardware optimization; and there is already hardware that deliberately does not do this (e.g. CHERI and maybe ARM PAC).

3. The compiler needs to know the allocation part of the pointer (the one that gets lost at runtime) in order to determine if a pointer write will change a local variable. If this association is lost then you get miscompiles.

4. Converting an integer back into a pointer (e.g. usize as ptr) does not establish pointer provenance, will break CHERI, and will miscompile on other architectures.

5. Rust made the mistake of allowing #4 in unsafe code. This is entirely unsound.

6. The proposed strict-provenance APIs allows doing something like #4, but sound, by letting the user stitch an address onto a pointer with a compatible allocation. This re-establishes the chain of provenance and avoids the miscompile.

Gankra · on April 6, 2022

Rust is no more unsound than C/C++/Swift here. It basically works but that's a really frustrating answer for anyone who cares about validation/sanitizers/optimization/docs.

It's just better if most code doesn't poke the dragon (llvm) and has trivially correct semantics instead of "yeah this has to work but um, don't ask me how or why".

zozbot234 · on April 6, 2022

Looks like the root mistake here is conflating `usize` (the integer type giving the maximum size of a data object, and the type of offsets into an allocation) and `uptr` (the size of a pointer's integer representation). In a segmented memory architecture, or one using 'object-oriented memory', the two need not be the same. You need pretty much the same sort of tweaks to be able to effectively target systems like the old 8086/80286, the 65816 (of Apple II GS and SNES fame) or the Intel iAPX 432.

kmeisthax · on April 6, 2022

Even on segmented memory, `uptr` doesn't carry provenance. You still either need `with_addr` to tell the compiler that this integer aliases an existing allocation, or `with_exposed_addr` to tell the compiler that you want your use-after-free bug to also have a CVE number.

xenadu02 · on April 7, 2022

It is more complicated than that. The issue is really that sometimes you legitimately need to access or manipulate the bits of a pointer, but once you convert a pointer to any integer type then convert it back to a pointer that allows such manipulation the compiler only has two choices:

  1. Assume the new pointer could be pointing at _anything_ and so disable all optimizations that might possibly be affected. So even trivial things like deducing that "x == y" can be eliminated because x is never modified is right out the window... after all that random integer could somehow have ended up pointing at the memory for x.
  2. Assume the new pointer doesn't point at anything the compiler knows about, causing it to make optimizations that break the program. The "x == y" that it eliminated because it hoisted x out of the loop and nothing modified x... turns out your magic pointer from nowhere was pointing at x and now your modifications are never read. Or maybe it decided to put x in a register so it has no memory address. Programmers get really angry when compilers do this.

Provenance is the compiler tracking that you converted ptr -> x, fiddled the bits, then converted x -> ptr2 and being more conservative around optimizations. The Rust change means by default you would only convert a pointer to an integer type right at the point you need to twiddle the bits and only for as long as you need to do so. The compiler then has a better understanding of what you are doing and doesn't need to assume pointers could be pointing at anything.

You have to remember the compiler is doing various kinds of inlining, code transformations, etc. Normal things we expect compilers to do. A bunch of small optimizations executed, sometimes iteratively. You can't point to any specific optimization and say "that's the one that breaks things" so you can't just magic this problem away.

Programmers hate compilers that make really slow code (#1), then get really angry when the compiler breaks their program (#2), then get upset about the fix slowing down their code (#1) and the cycle repeats. Often programmers assume compiler writers are just psychopaths who refuse to fix such "obvious" problems. They often pronounce exactly how things should work, demonstrating they're at "step 1" of baby's first memory model.

If fixing this problem with pointers were easy we'd have done it a long time ago.