Pointers Are Complicated, Or: What's in a Byte? (2018) (ralfj.de)
85 points by goranmoomin on Sept 5, 2020 | 50 comments


Also potentially relevant: Ralf, or I should say, Dr. Jung, recently completed a PhD, which I'm sure will have a lot more fascinating material for those interested in this paper. I'm hoping to find time myself to read it, but I seem to spend too much time on sites like Hacker News...

https://www.ralfj.de/blog/2020/09/03/phd.html


The article doesn't touch on this, but in C pointer types are also the only kind that can be invalidated without any apparent change to their value:

  void *x = malloc(...);
  free(x);
  if(x); // undefined behavior
Note that this isn't about dereferencing x after free, which is understandably not valid. Rather, the standard specifies that any use of the pointer's value itself is undefined after it has been passed to free, even though syntactically free could not have altered that value.

This special behavior is also specifically applied to FILE* pointers after fclose() has been called on them.

If there is some historical reason or architecture that could explain this part of the specification, I would be interested to hear the rationale; this has been present in mostly the same wording since C89.


My gut instinct is as follows:

  void *x = malloc(...);
  void *y = malloc(...);
  assert(x != y); // standard guarantees this [1]
Yet it's fairly reasonable that:

  void *x = malloc(...);
  free(x);
  void *y = malloc(...); // malloc reused x's allocation here.
So, in effect, guaranteeing that the results of two mallocs can never alias each other, while allowing the implementation to reuse freed memory, requires semantically adjusting the value of a pointer to a unique, unaddressable value.
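
A hypothetical sketch of that tension (the reuse is illustrative, not guaranteed):

  void *x = malloc(16);
  free(x);                /* x's value is now indeterminate */
  void *y = malloc(16);   /* may legally reuse x's old storage */
  /* If using x were still allowed here, x == y could evaluate to
     true, visibly contradicting the "distinct allocations compare
     unequal" guarantee. Making any use of x undefined after free
     dissolves the conflict. */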

[1] I think, but I'm not sure which versions of C/C++ added this guarantee


This would be kind of hilarious if true.

It seems like they could have just said: malloc won't give you a pointer that overlaps with the storage of any live malloc'd object. Such a malloc is implementable without too much trouble. But instead, they gave a stronger guarantee--that all malloc'd pointers would be "unique". It would be unboundedly burdensome on the implementation to meet this property, so what do they do? Update the standard to offer the achievable guarantee? No! They add a new rule, ensuring that it's impossible to observe that the stronger guarantee is not met without doing something "illegal". Instead of getting their act together, they have elected to punish whistleblowers.


On second thought, the choice they've made isn't as sadistic as it sounds. I was thinking of the standard as a contract between the language implementor and the programmer, but actually it is a contract between the language implementor, the programmer, and arbitrarily many other programmers. The stance they have chosen mandates a social convention, that the names of the dead will never be spoken. If everyone builds their APIs with this covenant in mind, it makes it possible to use pointers as unique ids (for whatever that's worth). C has never had much in the way of widely-followed social conventions, so practically speaking, the only way to ensure everyone knows they can depend on other programmers behaving this way is for the compiler to flagellate anyone who steps out of line.


I feel like you should not need to carve out extra language in the standard to explain this. It's very clear that following *x is undefined after free(x). It's also clear from any reasonable understanding of malloc that x's numeric value might collide with a later allocation coincidentally. Why should that make "if (x)" undefined?

It's true that the result of "if (x == y)" would depend on coincidences lining up and you should not rely on either one. Calling any evaluation of x "undefined" seems much more extreme than that though.


At first glance, it does seem like an unnecessarily gratuitous instance of undefined behavior. However... what could you actually meaningfully do with a pointer to freed memory anyway? You clearly can't dereference it. The only pointers you can compare it to are other pointers to the now-freed object (cross-object pointer comparisons are UB in C) and NULL. But if you were going to compare it to NULL, it's presumably to guard against a dereference, so you'd end up in UB anyway if you didn't overwrite it with NULL in the first place, at which point it's not being compared with anything anymore.


I can think of one use.

Let's say you have two pointers that are sometimes unique and sometimes aliases. Maybe they mean semantically different things but they happen to be the same for some cases, and different in others. They always are on the heap. You want to clean them up when your function exits, freeing them both, or once if they are not unique.

    free(p);
    if (p != q)
       free(q);
Believe it or not I have written something like this, although with integer file descriptors being closed rather than heap buffers freed. eg. Maybe some fds are passed for both reading or writing, or sometimes you have a unique fd for each, but it all needs to hit close(2).
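
For illustration, a sketch of that fd pattern (names hypothetical):

    #include <unistd.h>

    /* rfd and wfd may be the same descriptor (e.g. a socket) or two
       different ones (e.g. a pipe pair); both must end up closed. */
    void cleanup(int rfd, int wfd) {
        close(rfd);
        if (wfd != rfd)   /* fine for ints; the pointer version is the UB case */
            close(wfd);
    }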

To exist within the standard I guess you need to do the comparison first:

    if (p != q)
       free(p);
    free(q);
Edit: ah, but you just said "cross-object pointer comparisons are UB". I can't see a good reason for that either, but I do suppose it might make some architecture's non-linear pointer representation work better.


For some reason, I thought equality and inequality pointer comparisons required pointers to be in the same object. It's actually only the relational operators that are undefined if not in the same object. (Although I believe most compilers will treat them as unspecified instead of undefined).
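
A quick sketch of the distinction:

  int a, b;
  int eq = (&a == &b);   /* equality: well-defined, simply false here */
  int lt = (&a < &b);    /* relational: undefined across distinct objects */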


Malloc can return NULL so there is no guarantee of unique return values.


NULL != NULL, so at least no two values will be equal, right?


I think that's another one of those "in theory, there is no difference between theory and practice; in practice, there is", sort of things.

One must keep in mind that the standard does not exist in a void, despite what the smartass language-lawyers unfortunately think/want. The standard committee simply decided not to standardise things that they understood could vary between implementations. In fact, the standard has always said that "behaving during translation or program execution in a documented manner characteristic of the environment" is a possible outcome of UB.


In the dark ages, that would have allowed a single-pass compiler to reuse registers more aggressively.


I suspect the explanation is mundane: I'd expect it's just to allow a compiler to actually null out the pointer if it is able and willing to (for debugging or whatever) without affecting the correctness of the program.


There might be an architecture out there where malloc() creates a new memory segment, free() invalidates it, and merely loading an invalid pointer into a register causes a hardware trap. It may not even be possible to load a pointer into an integer register to test it against NULL.

I'm not aware of any architecture that does this; however, I think this is exactly how the 80286 and later behave in protected mode, if malloc() allocated fresh segments.


Can you cite chapter and verse in the standard for this?

It certainly does not make sense for current implementations, and I find it difficult to imagine a pointer representation where it does make sense. Perhaps if reading the address itself involved some indirection.


In C99:

Annex J.2 "The value of a pointer to an object whose lifetime has ended is used (6.2.4)."

6.2.4 "The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime."

The only thing I can think of is that it allows the compiler to recycle the storage occupied by the pointer itself for something else after free() is called, but I don't see much value(!) in that either, given that if it could do variable lifetime analysis, it would be able to do it for pointers too.


In addition to that, Annex J.2 (Undefined Behavior) and Annex L.3 (Analyzability) both specifically include: "The value of a pointer that refers to space deallocated by a call to the free or realloc function is used (7.22.3)."


I read the article and I am not convinced by the author's arguments.

He is talking about edge cases. I've been using C and C++ for over 20 years now, for both low level and high level stuff, and I never needed to know about those edge cases.

And I think you don't want to know about them; you don't need to know about them to write good code.

Nitpicking about edge-case tradeoffs made by compiler designers and language standards decades ago is like hitting below the belt, in my opinion.


The point isn’t that anyone would want to write code like this. It’s to define rules that establish what the language implementation is allowed to assume about pointers when performing optimisations, and consequently, what operations code is allowed to perform without running into UB. The best way to study such rules is with edge cases like this.

It’s like a trolley problem in a way. The situation may be unlikely, but better have an answer for it ready instead of replying ‘how dare you ask such questions’.


It might not matter to your average programmer, but the author, ralfj, is writing from the point of view of RustBelt, a project to establish a formal foundation and soundness proofs for Rust's type system and memory model. He recently defended his PhD thesis on the topic [1].

[1] https://people.mpi-sws.org/~jung/thesis.html


Can you clarify what you disagree with in the article? The author wasn't saying programmers need to understand all the complexities of pointers to write good code, nor was he saying anything about the quality of those languages.

Also, it is incredibly defensive to accuse the author of "hitting below the belt" when his observation wasn't aimed at any specific language.


In my opinion, this is a way to try to push the adoption of Rust by casting an unpleasant light on the current contenders: C/C++.

Another way to say it: this is propaganda.


That doesn’t make sense, given:

> Both parts of this statement are false, at least in languages with unsafe features like Rust or C:

The two languages share these features, which is the whole reason this is being written in the first place.


Unsafe as a language feature is something that was invented by the Rust design team. And I don't think there is much value to this concept.

This is like calling code "ugly" and saying that's a feature.

Pointers are not unsafe.


In addition to all the things mentioned above, there's also another kind of byte: structure padding. Nobody really knows how to deal with it, and it has extremely strange rules–it's kind of uninitialized in some cases, but you can give it a value sometimes…and sometimes you can't, or not in any way that sticks. It's truly strange stuff, and its exact nature is still being worked out.


Here's the author of this article (Ralf Jung) writing about padding bytes in the context of Rust (but I suppose it mostly applies to C and C++, since Rust inherited their memory model)

https://stackoverflow.com/questions/61114026/does-stdptrwrit...

Also, he wrote a follow up article to deal just with uninitialized memory

https://www.ralfj.de/blog/2019/07/14/uninit.html

edit: the "other kind of byte" that is the padding byte is just uninitialized memory - so perhaps not another kind?


I believe I disagree with that model (well, the one in the issue linked from the Stack Overflow post), where memcpy needs to copy all the bytes it is told to–I think it is only required to copy bytes that are observable. I agree with the blog post for the most part, but it fails to bring up the example of whether you can "freeze" something that is uninitialized by reading its value and using it in an expression that can only have one value, such as

  int x; // uninitialized
  int y = x; // initialized? "frozen"?
  int z = y ^ y; // zero?
I do not think this has been worked out yet–although I never get invited to these discussions, so perhaps I missed recent developments in this area. The other missing part is trap representations–I think Rust only has unspecified values, but not indeterminate ones.

And finally, padding is different from uninitialized memory because it is quite difficult to actually initialize it–the standard lists some ways to zero it out (brace initializer lists) but it seems like pretty much any operation will move it back to being uninitialized. So that's different from normal uninitialized memory, like static variables, since those have values that you can actually give them, and those values stick around rather than staying in this weird indeterminate state.


I don't know about the memcpy part from Stack Overflow, but about the second part:

It's UB to read from uninitialized data, and the thing about UB is that the compiler doesn't need to be especially consistent about it. It doesn't need to select the same y for y ^ y; it might transform this into 0 ^ 0 or 10 ^ 5, or select any other arbitrary value for each instance of y in this expression. So the value of z can't be relied upon. Even int z = 0*y; can't be relied upon to produce z equal to 0!

The compiler might even delete the whole code after int y = x; because after this line the code is UB. That's why the compiler might remove null-checks (and all code associated with it) done after you dereference a pointer, because since it's UB to follow a null pointer (the only way the program could be possibly correct is if the pointer were not null!), all further checks become dead code.
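
A hypothetical sketch of that null-check reasoning:

    int read_flag(int *p) {
        int v = *p;        /* after this, the compiler assumes p != NULL */
        if (p == NULL)     /* ...so this check is provably dead... */
            return -1;     /* ...and this branch may be deleted entirely */
        return v;
    }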

Also, the UB story in Rust is still being developed, but regarding memory representation it's expected to be the same as C. That's because LLVM is quite happy to do such atrocious but spec-compliant optimizations.

Also, now that LLVM supports freeze as an operation (see links below), Rust might surface freeze to the programmer as a primitive low level operation, in order to control the propagation of uninitialized-ness and avoid doing UB when reading variables. It would be something like

    let y = mem::freeze(x);
https://internals.rust-lang.org/t/taming-undefined-behavior-...

https://www.reddit.com/r/rust/comments/gnx7y0/update_to_llvm...


Pack your structs to not have padding? The probable reason it sometimes works is purely how the compiler decides to do the copy. Anything that relies on implicit padding is horribly broken software.


I think you might be misunderstanding my comment, I’m not complaining about struct padding causing issues in my code, I’m saying the standard is wishy-washy about it. This can be problematic when you are doing things like memcpying a structure, or trying to zero it out before sending it somewhere else. (And, FWIW, there is no standard way to disable struct padding, and doing so will likely generate slower code anyways.)


How is the standard not explicit? Copying a struct doesn't need to copy the padding at all (or can, if that's faster); memcpy is an intrinsic, so the compiler can copy whatever it deems appropriate with respect to padding.


Because there are certain cases in C11 where padding can be legally observed because it is defined to be set to zero–brace initialization, for example. It is unclear what happens in this case when you do anything with the memory other than read it (for example, storing to a structure member, based on the current reading of the standard, may make padding indeterminate again–plus, is "memcpy" a "read"?). See here for some of the questions raised: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1793.pdf
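
A sketch of the kind of question N1793 raises (the struct layout is assumed, not guaranteed):

    #include <string.h>

    struct S { char c; int i; };    /* padding likely after c */

    void example(void) {
        struct S s = { 0 };         /* brace init: padding zeroed (C11) */
        s.c = 'x';                  /* does this member store make the
                                       padding indeterminate again? */
        unsigned char buf[sizeof s];
        memcpy(buf, &s, sizeof s);  /* and what do the padding bytes
                                       in buf contain now? */
    }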


Is padding specified by the spec? Because if it's not, then explicit padding won't help you either, since you don't know that your padding aligns with the compiler's. Rather, AFAIK you need to write some compile-time checks that what you think the compiler is going to do is what it actually does, fail compilation if it doesn't, and then add whatever compiler/platform/option-specific code you need to make things match, along with more tests.
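
For instance, a C11-style sketch of such a compile-time check (the struct and expected values are hypothetical):

    #include <assert.h>   /* static_assert */
    #include <stddef.h>   /* offsetof */
    #include <stdint.h>

    struct wire { uint32_t a; uint16_t b; uint16_t crc; };

    /* Fail the build if the compiler's layout differs from the wire format. */
    static_assert(sizeof(struct wire) == 8, "unexpected struct size");
    static_assert(offsetof(struct wire, crc) == 6, "unexpected offset of crc");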


You would match by adding explicit padding bytes, which would be somewhat architecture-specific (32-bit vs 64-bit). A much better alternative to making life hard, though, is to just manually pack the struct.


It always bothered me that C, and similar languages which are termed “low level”, aren’t actually compiling to anything like the actual low-level hardware anymore. The match used to be quite close in the 1980s, but nowadays the “machine code” which is “run” by a CPU is actually a kind of virtual byte code language with an interpreter implemented in microcode inside the CPU. But this byte code has flaws: for instance, this virtual machine byte code language has little or no sense of memory caches (of any kind), out-of-order execution, branch prediction, etc. Both the compiler and the CPU know (or could know) about these things, but the compiler has no way to communicate this to the CPU other than using this ’80s-style byte code which does not have these concepts. It’s like talking about advanced mathematics using nothing but words in “The Cat in the Hat” – a rather narrow communication pipe.

I’d always imagined that as parallelism rose, a new model of virtual machine would rise with it, with its own “assembly” (i.e. low level language closely tied to the machine), which would in turn be the target of a compiler of a new parallel-by-default high level language. Alas, this has not happened.


People have had the same ideas you've had, and it's gotten as far as commercial products, but the general experience has been that "sufficiently smart compiler" tends to be very lackluster in practice. Any such effort needs to be extremely cognizant of what hardware is fundamentally good and bad at, and what compilers are fundamentally good and bad at.

Let's start with branch prediction. Modern hardware branch predictors are pretty phenomenal. They can predict a basic loop with 100% accuracy (i.e., they can not only predict that the backedge is normally taken, but can predict that the loop is about to stop). In the past decade, we've seen improvements in indirect branch predictors. The classic "branch taken/branch not taken" hint is fundamentally coarser and less precise than the hardware branch predictor: it cannot take any advantage of dynamic information. Replacing the hardware branch predictor with a compiler's static branch predictor is only going to make the situation worse.
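
For reference, that coarse static hint looks something like this (GCC/Clang's __builtin_expect; a sketch, not a replacement for the dynamic predictor):

    /* Static hint: fixed at compile time, cannot adapt to run-time behavior.
       The compiler uses it for code layout, e.g. moving the cold path
       out of line. */
    #define unlikely(x) __builtin_expect(!!(x), 0)

    int sum_checked(const int *buf, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            if (unlikely(buf[i] < 0))  /* assumed-rare error path */
                return -1;
            sum += buf[i];
        }
        return sum;
    }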

Now you might be thinking of somehow encoding branch predictor state in the ISA itself. This runs into the issue of encoding microarchitecture in your hardware ISA. If you ever need to change the microarchitecture, you face the dilemma of having to make a new, incompatible ISA, or emulating old microarchitectural details that no longer work well. MIPS got burned by this with delay slots. In general, the rule of thumb hardware designers use is "never expose your microarchitecture in your ISA."

A related topic is the issue of memory size. Memory is relatively expensive compared to ALUs: we can stamp out more ALUs than we know how to fill. We have hit the limits of physics in the sizing of our register files and L1 caches (electrons move at finite speed!), and the technologies we use to make them work quickly are also prohibitively power-hungry. This leads to the memory hierarchy of progressively larger, but cheaper and more efficient memories. The rule of thumb for performance engineers is that you count cycles only if you fit in L1 cache; otherwise, you're counting cache misses, since those cache misses will dwarf any cycle-count savings you're likely to scrounge up.
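
A standard illustration of that rule of thumb (sizes arbitrary): both functions below do identical arithmetic, but the second walks memory with a large stride and, for big N, is dominated by cache misses rather than cycle counts:

    #define N 1024
    static double a[N][N];

    double sum_rows(void) {     /* sequential access: streams through cache */
        double s = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    double sum_cols(void) {     /* strided access: roughly one miss per load */
        double s = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }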


I frequently see the word "anymore" used in comments like this, but the aliasing rules and the rules on undefined behavior date from the C89 standard, and they weren't invented in 1989. By the early 80s, compiler writers were already pointing out that strictly treating pointers as addresses and forbidding the compiler from making any assumptions about whether addresses could clash forced very slow code, meaning that any fast code had to be written in Fortran (which actually has tougher aliasing rules than C: given two arrays passed into a function, the compiler is entitled to assume that they do not alias).
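
C99's restrict lets the programmer opt into that Fortran-style guarantee per parameter; a minimal sketch:

    /* Without restrict, the compiler must assume dst and src can overlap,
       forcing conservative code. With restrict, no-aliasing is promised,
       much like Fortran's default rule for array arguments. */
    void scale(double *restrict dst, const double *restrict src, int n) {
        for (int i = 0; i < n; i++)
            dst[i] = 2.0 * src[i];
    }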


It's important to note that DOS, one of the popular platforms in the 80s, didn't have a linear address space. NEAR, FAR and HUGE pointers could be constructed such that two equal values would point to different locations in memory and two distinct values would point to the same object.
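
A sketch of the real-mode arithmetic behind that (segment * 16 + offset):

    #include <stdint.h>

    /* 8086 real mode: linear address = segment * 16 + offset. */
    uint32_t linear(uint16_t seg, uint16_t off) {
        return (uint32_t)seg * 16u + off;
    }
    /* linear(0x0010, 0x0000) == linear(0x0000, 0x0100) == 0x100:
       two distinct far pointers, one memory location. */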


So you just want a VLIW architecture? It turns out it's better to have that abstract bottleneck ISA in practice, so CPU hardware can iterate on microarchitectures without breaking software compatibility. Intel learned this the hard way with Itanium.


What you're describing is actually why it's labeled a "high-level" language. As noted by Brian Kernighan: https://www.youtube.com/watch?v=de2Hsvxaf8M

Now that there are languages that are higher level, people want to imagine that C is a low level language, when it actually isn't.


I have some late-80s books that refer to C as a middle-level language, actually.


I find that a more useful label.


This exact view is why https://futhark-lang.org/ is exciting to me; it's currently designed mainly for GPUs, but a lower-level CPU bytecode could be a new backend for it if CPU developers ever decide to expose it.


Designing a high-level language first is a mistake, I think. It risks being too complex without adequate grounding in real hardware. I think this was also APL’s mistake – by being too “mathematical”, so to speak, it’s too hard to wrap one’s head around. I think it must first come from the CPU exposing these concepts, followed by increasingly better compilers taking advantage of them, and as we get more experience about what concepts we like, we can design higher-level languages around these new concepts.

Chip design languages like VHDL might some day rise up from being slightly too close to the hardware to fill this role, but I certainly don’t know enough about any of these topics to have a qualified opinion.


Not every CPU is x64


> out-of-order execution

This broke me. I loved C++, I love pointers and shared memory, synchronization etc but optimisations finally creeped me out enough to switch to Java.


I don't get the motivation here. It's all invisible to you; how is it weird enough to make you not want to use it?


Then let me tell you that AOT/JIT optimizations might also come to bite you in Java.

Not in the language itself, as the semantics are better defined, but in the overall performance behaviour, especially across different vendors and their respective implementations.




