Also potentially relevant: Ralf, or I should say, Dr. Jung, recently completed a PhD, which I'm sure will have a lot more fascinating material for those interested in this paper. I'm hoping to find time myself to read it, but I seem to spend too much time on sites like Hacker News...
Note that this isn't about dereferencing x after free, which is understandably not valid. Rather, the standard specifies that any use of the pointer's value itself is undefined once it has been passed as an argument to free, even though syntactically free could not have altered that value.
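For illustration, a minimal sketch of the sort of thing this rules out (nothing is ever dereferenced, yet the marked lines are formally UB under this reading):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int *p = malloc(sizeof *p);
        free(p);
        /* p's value is now indeterminate; no dereference happens below. */
        if (p)                        /* UB: uses the freed pointer's value */
            puts("p looks non-null");
        int *q = malloc(sizeof *q);   /* may happen to reuse the same address */
        if (p == q)                   /* UB: compares against the stale value */
            puts("the address was reused");
        free(q);
        return 0;
    }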
This special behavior is also specifically applied to FILE* pointers after fclose() has been called on them.
If there is some historical reason or architecture that could explain this part of the specification, I would be interested to hear the rationale; it has been present in mostly the same wording since C89.
So, in effect, guaranteeing that the results of two mallocs can never alias each other, while allowing the implementation to reuse freed memory, requires semantically adjusting the value of a pointer to a unique, unaddressable value.
[1] I think so, but I'm not sure which versions of C/C++ added this guarantee.
It seems like they could have just said: malloc won't give you a pointer that overlaps with the storage of any live malloc'd object. Such a malloc is implementable without too much trouble. But instead, they gave a stronger guarantee--that all malloc'd pointers would be "unique". It would be unboundedly burdensome on the implementation to meet this property, so what do they do? Update the standard to offer the achievable guarantee? No! They add a new rule, ensuring that it's impossible to observe that the stronger guarantee is not met without doing something "illegal". Instead of getting their act together, they have elected to punish whistleblowers.
On second thought, the choice they've made isn't as sadistic as it sounds. I was thinking of the standard as a contract between the language implementor and the programmer, but actually it is a contract between the language implementor, the programmer, and arbitrarily many other programmers. The stance they have chosen mandates a social convention, that the names of the dead will never be spoken. If everyone builds their APIs with this covenant in mind, it makes it possible to use pointers as unique ids (for whatever that's worth). C has never had much in the way of widely-followed social conventions, so practically speaking, the only way to ensure everyone knows they can depend on other programmers behaving this way is for the compiler to flagellate anyone who steps out of line.
I feel like you should not need to carve out extra language in the standard to explain this. It's very clear that following *x is undefined after free(x). It's also clear from any reasonable understanding of malloc that x's numeric value might collide with a later allocation coincidentally. Why should that make "if (x)" undefined?
It's true that the result of "if (x == y)" would depend on coincidences lining up and you should not rely on either one. Calling any evaluation of x "undefined" seems much more extreme than that though.
At first glance, it does seem like an unnecessarily gratuitous instance of undefined behavior. However... what could you actually meaningfully do with a pointer to freed memory anyway? You clearly can't dereference it. The only pointers you can compare it to are other pointers to the now-freed object (cross-object pointer comparisons are UB in C) and NULL. But if you were going to compare it to NULL, it's presumably to guard against a dereference, so you'd end up with UB anyway if you didn't overwrite it with NULL in the first place, at which point it's not being compared with anything anymore.
Let's say you have two pointers that are sometimes unique and sometimes aliases. Maybe they mean semantically different things but they happen to be the same for some cases, and different in others. They always are on the heap. You want to clean them up when your function exits, freeing them both, or once if they are not unique.
    free(p);
    if (p != q)
        free(q);
Believe it or not, I have written something like this, although with integer file descriptors being closed rather than heap buffers being freed. E.g. maybe one fd is passed for both reading and writing, or sometimes you have a unique fd for each, but it all needs to hit close(2).
To exist within the standard I guess you need to do the comparison first:
    if (p != q)
        free(p);
    free(q);
Edit: ah, but you just said "cross-object pointer comparisons are UB". I can't see a good reason for that either, but I do suppose it might make some architecture's non-linear pointer representation work better.
For some reason, I thought equality and inequality pointer comparisons required pointers to be in the same object. It's actually only the relational operators that are undefined if not in the same object. (Although I believe most compilers will treat them as unspecified instead of undefined).
I think that's another one of those "in theory, there is no difference between theory and practice; in practice, there is", sort of things.
One must keep in mind that the standard does not exist in a void, despite what the smartass language-lawyers unfortunately think/want. The standard committee simply chose not to standardise things that they understood could vary between implementations. In fact, the standard has always said that "behaving during translation or program execution in a documented manner characteristic of the environment" is a possible outcome of UB.
I suspect the explanation is mundane: I'd expect it's just to allow a compiler to actually null out the pointer if it is able and willing to (for debugging or whatever) without affecting the correctness of the program.
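At source level the effect would look something like this hypothetical debugging wrapper (the standard's rule is what makes it a valid substitution, since no conforming program may inspect the stale value afterwards):

    #include <stdlib.h>

    /* Hypothetical wrapper: free the object and null out the variable that
       held the pointer, so any later (illegal) inspection sees NULL instead
       of the stale address. */
    #define FREE(p) do { free(p); (p) = NULL; } while (0)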
There might be an architecture out there, where malloc() creates a new memory segment, free() invalidates it, and merely loading an invalid pointer into a register causes a hardware trap. It may not even be possible to load a pointer into an integer register to test it against NULL.
I'm not aware of any implementation that actually does this; however, I think this is exactly how the 80286 and later behave in protected mode, if malloc() allocated fresh segments.
Can you cite chapter and verse in the standard for this?
It certainly does not make sense for current implementations, and I find it difficult to imagine a pointer representation where it does make sense. Perhaps if reading the address itself involved some indirection.
Annex J.2 "The value of a pointer to an object whose lifetime has ended is used (6.2.4)."
6.2.4 "The value of a pointer becomes indeterminate when
the object it points to reaches the end of its lifetime."
The only thing I can think of is that it allows the compiler to recycle the storage occupied by the pointer itself for something else after free() is called, but I don't see much value(!) in that either, given that if it could do variable lifetime analysis, it would be able to do it for pointers too.
In addition to that, Annex J.2 (Undefined Behavior) and Annex L.3 (Analyzability Requirements) both specifically include:
"The value of a pointer that refers to space deallocated by a call to the free or realloc function is used (7.22.3)."
I read the article and I am not convinced by the author's arguments.
He is talking about edge cases.
I've been using C and C++ for over 20 years now, for both low level and high level stuff, and I never needed to know about those edge cases.
And I think you don't want to know about them; you don't need to know about them to write good code.
Nitpicking about edge-case tradeoffs made by compiler designers and language standards decades ago is like hitting below the belt, in my opinion.
The point isn’t that anyone would want to write code like this. It’s to define rules that establish what the language implementation is allowed to assume about pointers when performing optimisations, and consequently, what operations code is allowed to perform without running into UB. The best way to study such rules is with edge cases like this.
It’s like a trolley problem in a way. The situation may be unlikely, but better have an answer for it ready instead of replying ‘how dare you ask such questions’.
It might not matter to your average programmer, but the author, ralfj, is writing from the point of view of RustBelt, a project to establish a formal foundation and soundness proofs for Rust's type system and memory model. He recently defended his PhD thesis on the topic [1].
Can you clarify what you disagree with in the article? The author wasn't saying programmers need to understand all the complexities of pointers to write good code, nor was he saying anything about the quality of those languages.
Also, it is incredibly defensive to accuse the author of "hitting below the belt" when his observation wasn't aimed at any specific language.
In addition to all the things mentioned above, there's also another kind of byte: structure padding. Nobody really knows how to deal with it, and it has extremely strange rules–it's kind of uninitialized in some cases, but you can give it a value sometimes…and sometimes you can't, or not in any way that sticks. It's truly strange stuff, and its exact nature is still being worked out.
Here's the author of this article (Ralf Jung) writing about padding bytes in the context of Rust (but I suppose it mostly applies to C and C++, since Rust inherited their memory model)
I believe I disagree with that model (well, the one in the issue linked from the Stack Overflow post), where memcpy needs to copy all the bytes it is told to–I think it is only required to copy bytes that are observable. I agree with the blog post for the most part, but it fails to bring up the example of whether you can "freeze" something that is uninitialized by reading its value and using it in an expression that can only have one value, such as
    int x;         // uninitialized
    int y = x;     // initialized? "frozen"?
    int z = y ^ y; // zero?
I do not think this has been worked out yet–although I never get invited to these discussions, so perhaps I missed recent developments in this area. The other missing part is trap representations–I think Rust only has unspecified, but not indeterminate, values. And finally, padding is different from uninitialized memory because it is quite difficult to actually initialize it–the standard lists some ways to zero it out (brace initializer lists) but it seems like pretty much any operation will move it back to being uninitialized. So that's different from normal uninitialized memory, like that of static variables, since those have values that you can actually give them, and they stick around rather than staying in this weird indeterminate state.
I don't know about the memcpy part from Stack Overflow, but about the second part:
It's UB to read from uninitialized data, and the thing about UB is that the compiler doesn't need to be especially consistent about it. It doesn't need to select the same y for y ^ y; it might transform this into 0 ^ 0 or 10 ^ 5, or select any other arbitrary value for each instance of y in this expression. So the value of z can't be relied upon. Even int z = 0*y; can't be relied upon to produce z equal to 0!
The compiler might even delete all the code after int y = x;, because after this line the program's behavior is undefined. That's why the compiler might remove null checks (and all code associated with them) done after you dereference a pointer: since it's UB to follow a null pointer (the only way the program could possibly be correct is if the pointer were not null!), all further checks become dead code.
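As a sketch (hypothetical function, but this is the classic pattern):

    struct widget { int field; };

    int read_field(struct widget *w) {
        int v = w->field;  /* dereference: from here on the compiler may assume w != NULL */
        if (w == NULL)     /* dead code under that assumption... */
            return -1;     /* ...so the whole check can be deleted */
        return v;
    }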
Also, the UB story in Rust is still being developed, but regarding memory representation it's expected to be the same as C's. That's because LLVM is quite happy to do such atrocious but spec-compliant optimizations.
Also, now that LLVM supports freeze as an operation (see links below), Rust might surface freeze to the programmer as a primitive low level operation, in order to control the propagation of uninitialized-ness and avoid doing UB when reading variables. It would be something like
Pack your structs to not have padding? The probable reason it sometimes works is purely how the compiler decides to do the copy. Anything that relies on implicit padding is horribly broken software.
I think you might be misunderstanding my comment, I’m not complaining about struct padding causing issues in my code, I’m saying the standard is wishy-washy about it. This can be problematic when you are doing things like memcpying a structure, or trying to zero it out before sending it somewhere else. (And, FWIW, there is no standard way to disable struct padding, and doing so will likely generate slower code anyways.)
How is the standard not explicit? Copying a struct doesn't need to copy the padding at all, or can if that's faster; memcpy is an intrinsic, so the compiler can copy whatever it deems appropriate with respect to padding.
Because there are certain cases in C11 where padding can be legally observed, since it is defined to be set to zero–brace initialization, for example. It is unclear what happens in this case when you do anything with the memory other than read it (for example, storing to a structure member may, based on the current reading of the standard, make padding indeterminate again–plus, is "memcpy" a "read"?). See here for some of the questions raised: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1793.pdf
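A compressed sketch of the situations in question (hypothetical struct; a typical ABI with 3 padding bytes after 'c' is assumed):

    #include <string.h>

    struct s { char c; int i; };   /* typically 3 padding bytes after 'c' */

    void demo(void) {
        struct s a = {0};          /* brace initialization: padding is zeroed on this reading of C11 */
        a.c = 'x';                 /* storing to a member may make the padding indeterminate again */
        struct s b;
        memcpy(&b, &a, sizeof a);  /* does this "read" the padding, and does b's padding get those values? */
    }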
Is padding specified by the spec? Because if it's not, then adding padding yourself won't help you either, since you don't know that your padding aligns with the compiler's. Rather, AFAIK, you need to write some compile-time checks that what you think the compiler is going to do is what it actually does, fail compilation if it doesn't, and then add whatever compiler/platform/option-specific code is needed to get your layout to match, along with more tests.
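For instance, something like this (hypothetical struct; the expected offsets assume a typical ABI where uint32_t is 4-byte aligned):

    #include <stddef.h>
    #include <stdint.h>

    struct msg {
        uint8_t  tag;    /* expect 3 padding bytes after this on the assumed ABI */
        uint32_t value;
    };

    /* Fail the build if the compiler's layout is not what the rest of the code expects. */
    _Static_assert(offsetof(struct msg, value) == 4, "unexpected offset of 'value'");
    _Static_assert(sizeof(struct msg) == 8, "unexpected size / padding");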
You would match by adding explicit padding bytes, which would be somewhat architecture-specific (32-bit vs 64-bit). A much better alternative to making life hard, though, is to just manually pack the struct.
It always bothered me that C, and similar languages which are termed “low level”, aren’t actually compiling to anything like the actual low-level hardware anymore. The match used to be quite close in the 1980s, but nowadays the “machine code” which is “run” by a CPU is actually a kind of virtual byte code language with an interpreter implemented in microcode inside the CPU. But this byte code has flaws: for instance, this virtual machine byte code language has little or no sense of memory caches (of any kind), out-of-order execution, branch prediction, etc. Both the compiler and the CPU know (or could know) about these things, but the compiler has no way to communicate this to the CPU other than using this ’80s style byte code which does not have these concepts. It’s like talking about advanced mathematics using nothing but words in “The Cat in the Hat” – a rather narrow communication pipe.
I’d always imagined that as parallelism rose, a new model of virtual machine would rise with it, with its own “assembly” (i.e. low level language closely tied to the machine), which would in turn be the target of a compiler of a new parallel-by-default high level language. Alas, this has not happened.
People have had the same ideas you've had, and it's gotten as far as commercial products, but the general experience has been that "sufficiently smart compiler" tends to be very lackluster in practice. Any such effort needs to be extremely cognizant of what hardware is fundamentally good and bad at, and what compilers are fundamentally good and bad at.
Let's start with branch prediction. Modern hardware branch predictors are pretty phenomenal. They can predict a basic loop with 100% accuracy (i.e., they can not only predict that the backedge is normally taken, but can predict that the loop is about to stop). In the past decade, we've seen improvements in indirect branch predictors. The classic "branch taken/branch not taken" hint is fundamentally coarser and less precise than the hardware branch predictor: it cannot take any advantage of dynamic information. Replacing the hardware branch predictor with a compiler's static branch predictor is only going to make the situation worse.
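(For reference, a static hint looks roughly like this: GCC/Clang's __builtin_expect, in a made-up function. It only influences code layout at compile time and cannot adapt to runtime behaviour the way the hardware predictor does.)

    int checked_div(int a, int b) {
        if (__builtin_expect(b == 0, 0))   /* "unlikely" hint */
            return 0;                      /* cold path, laid out out of line */
        return a / b;
    }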
Now you might be thinking of somehow encoding the branch predictor state in the ISA. This runs into the issue of exposing microarchitecture in your ISA. If you ever need to change the microarchitecture, you face the dilemma of having to make a new, incompatible ISA, or emulating old microarchitectural details that no longer work well. MIPS got burned by this with delay slots. In general, the rule of thumb hardware designers use is "never expose your microarchitecture in your ISA."
A related topic is the issue of memory size. Memory is relatively expensive compared to ALUs: we can stamp out more ALUs than we know how to fill. We have hit the limit of physics in sizing of our register files and L1 caches (electrons move at finite speed!), and the technologies we use to make them work quickly are also prohibitively power-hungry. This leads to the memory hierarchy of progressively larger, but cheaper and more efficient memories. The rule of thumb for performance engineers is that you count cycles only if you fit in L1 cache; otherwise, you're counting cache misses, since those cache misses will dwarf any cycle count savings you're likely to scrounge up.
I frequently see the word "anymore" used in comments like this, but the aliasing rules and the rules on undefined behavior date from the C89 standard, and they weren't invented in 1989. By the early 80s compiler writers were already pointing out that strictly treating pointers as addresses, and forbidding the compiler from making any assumptions about whether addresses could clash, forced very slow code, meaning that any fast code had to be written in Fortran (which actually has tougher aliasing rules than C: given two arrays passed into a function, the compiler is entitled to assume that they do not alias).
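The classic illustration (just a sketch, not from any particular codebase):

    /* The compiler must assume a[i] might alias *b, so *b has to be reloaded on
       every iteration. Declaring b as 'const int *restrict b' (or writing the
       equivalent Fortran, where arguments are assumed not to alias) lets the
       load be hoisted out of the loop. */
    void add_to_all(int *a, const int *b, int n) {
        for (int i = 0; i < n; i++)
            a[i] += *b;
    }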
Important to note that DOS, one of the popular platforms in the 80s, didn't have a linear address space.
NEAR, FAR and HUGE pointers could be constructed such that two equal values would point to different locations in memory and two distinct values would point to the same object.
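Concretely, real-mode segment:offset arithmetic computes physical = segment * 16 + offset, so two distinct far-pointer values can name the same byte (this little sketch just does the arithmetic):

    #include <stdio.h>

    int main(void) {
        unsigned long a = 0x1234UL * 16 + 0x0010;  /* 0x1234:0x0010 */
        unsigned long b = 0x1235UL * 16 + 0x0000;  /* 0x1235:0x0000 */
        printf("%#lx %#lx %s\n", a, b, a == b ? "(same byte)" : "(different)");
        return 0;
    }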
So you just want a VLIW architecture? It turns out it's better in practice to have that abstract bottleneck of an ISA, so that CPU hardware can iterate on microarchitectures without breaking software compatibility. Intel learned this the hard way with Itanium.
This exact view is why https://futhark-lang.org/ is exciting to me; it's currently designed mainly for GPUs, but a lower-level CPU bytecode could be a new backend for it if CPU developers ever decide to expose it.
Designing a high-level language first is a mistake, I think. It risks being too complex without adequate grounding in real hardware. I think this was also APL’s mistake – by being too “mathematical”, so to speak, it’s too hard to wrap one’s head around. I think it must first come from the CPU exposing these concepts, followed by increasingly better compilers taking advantage of them, and as we get more experience about what concepts we like, we can design higher-level languages around these new concepts.
Chip design languages like VHDL might some day rise up from being slightly too close to the hardware to fill this role, but I certainly don’t know enough about any of these topics to have a qualified opinion.
Then let me tell you that AOT/JIT optimizations might also come to byte you in Java.
Not in the language itself, as the semantics are better defined, but in the overall performance behaviour, especially across different vendors and their respective implementations.
https://www.ralfj.de/blog/2020/09/03/phd.html