
But that's the thing: dereferencing an invalid pointer is undefined behaviour, which means the compiler is allowed to assume it never happens; a C program executing undefined behaviour is _invalid C_. Thus, any time you dereference a pointer you also implicitly promise the compiler that this pointer will _never_ be an invalid pointer. Same with signed arithmetic: you are telling the compiler that your arithmetic is guaranteed to never overflow.
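
For example, here's a minimal sketch of what that promise buys the optimizer (actual behavior varies by compiler and flags, but recent GCC and Clang at -O2 have been observed to do this fold):

    int check(int x) {
        /* Signed overflow is UB, so the compiler may assume x + 1
           never wraps and fold this test to a constant 1 (true). */
        return x + 1 > x;
    }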

Whether this is a good or bad thing is of course a legitimate (and good!) question, but for writing C today that's how the language is specced, and undefined behaviour is something the programmer needs to take care to avoid, just like a bunch of other things that C leaves to the programmer, such as remembering to clean up allocated resources when they're no longer needed.



> But that's the thing: dereferencing an invalid pointer is undefined behaviour, which means the compiler is allowed to assume it never happens

It would be more helpful for the compiler to assume that it does not know what will happen. This is how C actually worked for many years.

> a C program executing undefined behaviour is _invalid C_.

That was not formerly the case, and it is not always helpful to redefine C in this way. Sometimes you really are not trying to write portable code, and you really do want the behavior you know that the target machine will give you, even if the C spec doesn't require it.


> It would be more helpful for the compiler to assume that it does not know what will happen. This is how C actually worked for many years.

If we don't know what will happen, that is Undefined Behaviour.

The contradiction you have within yourself is that you know what you want to happen, but that's not what the specification says. If you want specific behaviour you need to specify what it is - not mumble and make a vague wave of the hand about "behavior you know that the target machine will give you" when you've no promise of any such thing. That would come at a cost, and of course you don't want to pay that cost, but that means you can't have what it buys.


That is certainly one perspective that one can have. The point here is that the language and its usage precede the specification, and a pedantic, narrow-minded adherence to a certain interpretation of a document which was actually a post-hoc rationalization of existing practice has made the language less useful for certain applications.


The C standard could easily make dereferencing a null pointer implementation-defined behavior.

And even more critical: signed integer overflow should be implementation-defined, with each implementation doing something sane (different from assuming it doesn't happen). This would have spared us many security vulnerabilities and unnecessary program crashes.


If your program is crashing because of an overflow you’re lucky because it’s saving you from a security vulnerability.


> If we don't know what will happen that is Undefined Behaviour.

Implementation-defined behaviour is a thing. Not knowing what will happen is not an accurate description of undefined behaviour. What the compiler does is assume that undefined behaviour doesn’t happen. When it does happen, it results in a contradiction, and logically every sentence is a consequence of a contradiction (see e.g. “Bertrand Russell is the pope”). That produces all those infamous bugs. Because just like every sentence is a consequence of a contradiction, every program state can be a result of UB. This is untenable.


> What the compiler does is assume that undefined behaviour doesn’t happen

That is an incorrect assumption, as it clearly does.

It is also incorrect given the standard text.


> That is an incorrect assumption, as it clearly does.

That’s my entire point. Compiler is free to make incorrect assumptions.

> It is also incorrect given the standard text.

According to C11 standard, section 3.4.3, the standard imposes no requirements on undefined behaviour.


A compiler that makes incorrect assumptions is a bad compiler.

In fact, I remember reading in the rationale of the original spec that the C standard was expressly designed to be a minimal spec, and that just being compliant with the spec was insufficient for the resulting compiler to be fit for purpose.

And of course the original spec did specify a range of acceptable behaviors, and that language is, in fact, still in the standard. It was just made non-binding. However, it is still there, and pretending it is not seems disingenuous at best.


> A compiler that makes incorrect assumptions is a bad compiler.

I agree, but that includes GCC and Clang. ¯\_(ツ)_/¯


Yep. The problem when you rely on free software is that you are not a customer.


That's OK, you can compile your program with -O0 if that's the behavior you want from your compiler.


Unfortunately, `-O0` doesn't actually disable all optimizations. It probably disables any that would affect this though.


There are many optimizations that a compiler can perform without relying on the optimization level to determine how to pervert your program that day. If different optimization levels produce different results, that is a bad thing, something to be avoided, not encouraged.

If it is really necessary to generate random code when some anomalous situation is encountered, that should be a special option to enable dangerous non-deterministic if-you-made-a-mistake-we-will-delete-parts-of-your-program type behavior. I wouldn't consider that an optimization though, more like disabling all your compiler's safety features.


Which optimizations?


Loop unrolling for loops that have a static or range bounded number of iterations is a good example. Others include constant expression evaluation, dead code elimination, common subexpression elimination, and static function inlining.


If you fold float expressions at compile time, you will get different results at runtime if the program has changed the FPU control word.

People complain about dead code elimination all the time when we have these discussions.

Inlining breaks code that tries to read the return address off the stack frame or that makes assumptions about stack layout.

Loop unrolling might change the order of stores and loads, which is visible behaviour if any of them traps.

I assure you that for each optimization, no matter how trivial, it will break someone's code.
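
To make the first point concrete, a sketch (whether the two results actually differ depends on the compiler, flags, and target; some compilers ignore the FENV_ACCESS pragma entirely):

    #include <fenv.h>
    #include <stdio.h>

    int main(void) {
        #pragma STDC FENV_ACCESS ON   /* may be ignored by some compilers */
        fesetround(FE_DOWNWARD);      /* change the rounding mode at run time */
        volatile double a = 1.0, b = 3.0;
        double folded = 1.0 / 3.0;    /* a candidate for compile-time folding,
                                         done under round-to-nearest */
        double runtime = a / b;       /* computed at run time under FE_DOWNWARD */
        printf("%d\n", folded == runtime);  /* may print 0 */
        return 0;
    }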


> It would be more helpful for the compiler to assume that it does not know what will happen. This is how C actually worked for many years.

Can't you get that behaviour with -O0 or similar?


Looking at the GCC docs, it seems it isn't possible to have zero optimizations at any point, even at the lowest optimization levels. To quote the docs: "Most optimizations are completely disabled at -O0". So it seems you can't assume you can force correct behavior just by turning off optimization passes.

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html


Part of the difficulty here is working out which transformations are specifically "optimisations". Some compiler passes are required for correct (or indeed any) code generation -- for example, in the compiler I was employed to work with, the instruction selector was a key pass for generating optimal code, but we only had one: if you "turned off optimisations" then we'd run the same instruction selector, merely on less-optimal input. So we'd disable all the passes that weren't required for correctness or completeness, but we'd not write deliberately non-optimal equivalents for the passes that were required.

Beyond that, you've got a contradiction in your statement -- you can't "force correct behaviour" from a compiler at any point. The compiler always tries to generate correct behaviour according to what the code actually says. If you lie to the compiler, it'll try its best to believe you.

C compilers are intended to accept every correct C program. But they can only do this by also accepting a wide range of incorrect C programs -- if we can prove that the program can't be correct then we can reject it, otherwise we have to trust the programmer. Contrast this with Rust, where the intent is to reject every incorrect Rust program. Again, not every program can be clearly judged correct or incorrect, but in this case we'll err on the side of not trusting the programmer. Of course, "unsafe" in Rust and various warnings that can be enabled in C mean you can tell the Rust compiler to trust the programmer and tell the C compiler to disallow a subset of possibly-correct but unprovable programs, but the general intent still stands.

So if you want to write in a language that's like C but with "correct behaviour" then ultimately you'll have to procure yourself a compiler to do that. Because the authors of the various C compilers try very hard to have correct behaviour, and just because you want to be able to get away with lying to their compilers doesn't magically make them wrong.


Always missing in this argument is the logic of how to go from "undefined" to "can never happen". If the spec did not want it to happen, it would have said "cannot happen" or "illegal". But no, it is undefined by the spec. The spec knows that it can and will happen; it just did not want to pin down the behavior of the compiler. So the compiler optimization team saying "we can assume this will never happen" is a blind, almost maliciously compliant viewpoint.


From the C standard §3.4.3:

    undefined behavior
    
    behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
    for which this International Standard imposes no requirements
    NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable
    results, to behaving during translation or program execution in a documented manner characteristic of the
    environment (with or without the issuance of a diagnostic message), to terminating a translation or
    execution (with the issuance of a diagnostic message).
    EXAMPLE An example of undefined behavior is the behavior on integer overflow

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

The important wording here is "this International Standard imposes no requirements"

i.e. an implementation is allowed to do literally anything in the case of undefined behaviour. It's not quite that the compiler writers are saying "this can never happen", it's more along the lines of "if this does happen, we can do anything at all, including acting as if the conditions were such that UB could not have happened."

So if you multiply two signed ints that the compiler knows are positive, the compiler can assume that the result can't overflow. Because, if it does overflow, the compiler can emit code that does absolutely anything in that case - including acting as if it didn't overflow. Therefore, it can elide checks for a negative result, because either there was no overflow in which case the check is redundant, or there was but the code is allowed to do anything at all - including not performing the check for a negative result.
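
The same reasoning applies to loops. A sketch (this particular transformation has been observed in practice, but nothing guarantees it):

    /* Because i *= 2 overflowing is UB, the compiler may assume i
       stays positive forever, treat i > 0 as always true, and turn
       this into an infinite loop or a precomputed trip count. */
    int count = 0;
    for (int i = 1; i > 0; i *= 2)
        count++;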


I agree with this logic. But note there is one caveat: observable behavior that has already happened when the condition for UB is encountered cannot then be affected. The C++ committee later clarified that UB can also affect previous observable behavior. It is unclear whether this applies to C, as the C committee never added this clarification, but compiler writers often apply this interpretation to C as well. In my opinion the C++ committee made a mistake here, as it makes UB more dangerous.


When do you consider that "the condition for UB" would be encountered, for any particular behaviour that lacks a definition? The optimiser is -- in general -- allowed to re-order operations and restructure code if it doesn't change the behaviour of the program. In doing this, it needs to trust the programmer that the program is valid. Otherwise it can't even re-order two signed additions, lest the first one overflow.

You might want a language that restricts this, but C is not that language. Or you might want a language that defines all its behaviours, but C is not that language either. Compiler writers put a lot of effort into making their compilers do exactly what the programmer tells them to do.

My personal take is that the correct response to the difficulties of ensuring your C program doesn't exhibit undefined behaviour is probably to avoid writing new code in C. But if you do still need to write C for whatever reason (which I do, occasionally) then it's only sensible to take as much care as the language design expects programmers to take: the compiler trusts the programmer to only attempt operations with defined results.


The condition is always stated in the C standard. "if ... the behavior is undefined". An optimizer is not in general allowed to re-order operations. It is allowed to do this only if it can prove that there is no change in observable behavior.


Indeed, but "no change in observable behaviour" -- along with every other suggestion of correctness from a C compiler -- is only guaranteed in the presence of a well-defined program.

Honestly, I think we'd all be better served by pushing the concept of "undefined behaviour" a bit further into the background. C has defined behaviours, and the standard helpfully makes explicit which behaviours fall outside the definitions. If you want a defined output then your program had better have a defined behaviour when presented with your input.

I'm not suggesting this is ideal -- far from it, I avoid writing new C code. But it's what C does. If you want to avoid needing to make sure that your program only attempts defined operations, switch to a language that doesn't impose that requirement.


This assumes that my computer isn't allowed to be a time machine. I don't see that in the spec anywhere.


Every technical text needs to be read using some common sense. Once you give this away, you can justify everything.


Sure, but the common sense I (and I think I can safely say the compiler writers) are applying is "when the spec says 'the program might do anything', then there is no meaningful difference to the user whether or not we guarantee that everything up to that point was executed correctly". Who cares whether we transferred money from account A to account B when the program is then going to transfer 5 times as much from account B to account A and gift our competitor half of our money while it's at it.

I'm not sure if I agree with your interpretation of the spec, but even if that's the technically correct interpretation, arguing that things went wrong because the compiler miscompiled the program and that it didn't do the things it was supposed to before it was allowed to do literally anything... just isn't an interesting argument. Things went wrong because your program was wrong.


The spec says there are no restrictions on the behavior. But going on to say that "behavior" then includes impossible things like time travel or magic, instead of something an actual machine could possibly do, seems far-fetched to me.


Regarding the second point: sure, the program went wrong because it was wrong. But the damage it can do when something goes wrong is much higher when UB can affect previous behavior. Being able to prove partial correctness of a program is a useful feature (e.g. when a transaction completed correctly, you can be sure that an error in the logging function afterwards does not undo it).


> Always missing in this argument is the logic of how to go from "undefined" to "can never happen".

The reasoning is something like "if we assume UB doesn't happen, but it does happen, the resulting behavior is unpredictable. This is allowed by the standard, though, because UB allows for any behavior, including that produced by assuming UB doesn't happen."

In other words, major implementations treat UB as preconditions. Violating those preconditions gets you Interesting Results (TM), but that's allowed by the standard because "unpredictable results" really means unpredictable results.

For example, null pointer dereference is UB. If an implementation assumes null pointers can never be dereferenced, it can better optimize some code. If it turns out a null pointer is dereferenced, the argument is that whatever happens then is still permitted by the standard as the standard does not define any program semantics for programs containing UB.
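
A sketch of that classic pattern (compilers have been observed to do this; none of them is required to):

    int deref_then_check(int *p) {
        int v = *p;        /* UB if p is null, so the compiler may infer
                              p != NULL from this point onward... */
        if (p == NULL)     /* ...and delete this branch as dead code */
            return -1;
        return v;
    }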


I agree that this does not follow from the wording in the standard, and I am relatively sure that this was originally not implied. But this viewpoint is repeated quite often nowadays. I think this is because prominent compiler developers promoted this point of view and used it to blame the user ("Because you have UB in your program, it is completely invalid. It is now OK that the compiler breaks it, and it is your fault alone.") The other response to your post is correct, though, but that explanation would not allow UB to affect prior observable behavior.


Ex falso quodlibet


...is taken as true but is a really lousy principle for modelling informal reasoning.


Compilers and compiler writers don't rely on informal reasoning when deciding whether an optimization is valid.


This can be modelled formally: just drop the ex falso quodlibet axiom and its equivalents.


> dereferencing an invalid pointer is undefined behaviour, which means the compiler is allowed to assume it never happens;

Sure, but it didn't have to be like this. They could have said it is unspecified without allowing the compiler to assume it doesn't happen.

Would C have been better if the spec was different?


That's not what's happening. Because it's unspecified, transforms that are safe in the absence of that behavior are safe to apply, since they preserve the semantics.

Even something like register allocation requires knowledge of what pointers point to.


> is undefined behaviour, which means the compiler is allowed to assume it never happens

No it's not. Or let me rephrase that. The standard says the following:

Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

Which of these is "the compiler is allowed to assume it does not happen"?


> Which of these is "the compiler is allowed to assume it does not happen"?

I agree with where you're coming from, but "ignoring the situation completely with unpredictable results" sounds like it pretty much fits the bill. Pretending something doesn't happen sounds a lot like ignoring it completely to me.

What the standard obviously does exclude is nonsense like intentionally reformatting your drives.


> Pretending something doesn't happen sounds a lot like ignoring it completely to me.

Quite the opposite. Assuming it doesn't happen (not "pretending"), is very much not ignoring the situation, at least if you then act on that assumption that it does not happen.

Ignoring it just lets it happen when it does, so if the program specifies an out-of-bounds access, the compiler generates code for an out-of-bounds access, ignoring the fact that it is an out of bounds access.


> Assuming it doesn't happen (not "pretending"), is very much not ignoring the situation, at least if you then act on that assumption that it does not happen.

I'm not sure how assuming UB doesn't happen is distinguishable from a choosing to ignore the situation every time one comes up. You get the same result either way.

For example, a compiler can assume that null pointers are never dereferenced, or every time a null pointer is/may be dereferenced it can just "ignore the situation" with the dereference. I'm not seeing a functional difference here.

This is, of course, subject to the minor problem that "situation" is arguably underspecified. Compiler writers appear to interpret it as something akin to "code path" (so "ignoring the situation" means "ignoring code paths invoking UB"), while UB-goes-too-far proponents appear to interpret it more broadly, more like "the fact that UB can/will happen" (so "ignoring the situation" means "ignore the fact UB will/may happen").

> Ignoring it just lets it happen when it does, so if the program specifies an out-of-bounds access, the compiler generates code for an out-of-bounds access, ignoring the fact that it is an out of bounds access.

Why wouldn't this fall under "behaving during translation or program execution in a documented manner characteristic of the environment" instead?


I think it's quite clear how "assuming UB doesn't happen is distinguishable from a choosing to ignore the situation."

    strcpy(P, filename);
    free(P);                /* P is now dangling */
    if (P[0] == '.') {      /* use after free: undefined behaviour */
        // hidden file
        // do something
    }
Obviously you shouldn't use the above code. However, for illustrative purposes, ignoring the situation [of undefined behavior] probably still results in doing something for hidden files. What people are complaining about is compilers finding that there's UB by static analysis and optimizing out the conditional entirely because they assume dereferencing the pointer to freed memory "doesn't happen."

There are good arguments for both sides, in my opinion. But let's not pretend they're the same thing. Deleting logic because it provably would result in UB is not the same as ignoring the UB.


I'm still not seeing the distinction between "ignoring the situation" and "assuming no UB". Under compiler writers' interpretation, I think there would be two situations in your snippet:

1. This code path is executed. UB will be invoked.

2. This code path is not executed. No UB occurs.

If the compiler "ignores the situation" with UB, code path 1 is ignored (i.e., dropped from consideration). This probably results in the removal of the snippet as dead code.

If the compiler assumes no UB occurs, code path 1 is eliminated as cannot-happen, and the compiler probably deletes the snippet as dead code.

Same result either way (with the obvious caveat that this is one possible interpretation of "ignoring the situation").

> ignoring the situation [of undefined behavior] probably still results in doing something for hidden files.

The problem is that this assumes a very specific definition of "ignoring the situation" which, while understandable, isn't the only interpretation permitted by the Standard.

In addition, there's the fact that such an interpretation would arguably fall under "behaving during translation or program execution in a documented manner characteristic of the environment" instead.

> Deleting logic because it provably would result in UB is not the same as ignoring the UB.

True, but "ignoring the UB" isn't what the Standard says. It says "ignoring the situation", and that's the problem - people can't agree on what "ignoring the situation" is supposed to mean. Compiler writers appear to take it to mean "ignore UB-invoking code paths", UB-goes-too-far proponents take it to mean "ignore the presence of UB".


> If the compiler "ignores the situation" with UB, code path 1 is ignored (i.e., dropped from consideration). This probably results in the removal of the snippet as dead code.

Ignoring something != killing something. There is no dialect of English in which ignoring something is compatible with eliminating it from existence.

EDIT: Note: I am not personally saying compiler writers are wrong. The body text says it imposes no requirements, so at the very least it's a reasonable interpretation to say the footnote text isn't binding and/or that "possible" behaviors are examples, not a full enumeration. But! On the narrow question of "ignoring" the behavior altogether... Hunting for UB via static analysis and then changing your output based on whether you find it simply is not what the word "ignoring" means.


The compiler is not removing or killing invalid code; you can still find it in the source file.

What it is doing, in the extreme case, is ignoring it and not generating asm statements for it. Then again, how could it? Code that will trigger UB has no meaning, so the compiler wouldn't know what code to generate.

Of course a compiler could assign meaning to some instances of UB.

For example, I'm pretty sure that in GCC dereferencing a null pointer is not UB, but it is expected to trap (because POSIX) and execution is not expected to continue except via abnormal edges (exceptions or longjmp). This means that any code that can be proven to be reachable only through a null pointer dereference is effectively dead code, so in practice it can still introduce bugs if it didn't trap.


At least as far as one is unable to distinguish between the two, sure it is. A compiler that emits code as if UB-containing code paths are not there is essentially performing the dictionary definition of ignoring something, but it's functionally indistinguishable from a compiler that deletes UB-containing code paths.


No.

What you are describing is ignoring the code that has the undefined behaviour due to it having undefined behaviour.

That is not ignoring the undefined behaviour, it is the opposite.


Maybe it's not "ignoring the undefined behaviour", but why isn't it ignoring the situation? The situation is that this code (path) invokes UB. Ignoring "the situation" seems to allow simply not considering that path. Maybe that results in that path not being emitted.

Again, it all comes down to interpreting "the situation". Compiler writers construe it broadly; "Ignore the presence of UB" (i.e., construing "the situation" narrowly) is another possible interpretation, but I don't think it's the one and only definitive one.

In addition, why isn't "ignore the presence of UB" covered by "behaving during translation or program execution in a documented manner characteristic of the environment"? See a null pointer dereference? Just do the "characteristic thing" during translation and emit the dereference. Maybe implementations will need to add documentation somewhere, but that's not exactly the challenging part.


Because it is not ignoring the undefined behaviour. Simple as that.

What you are confusing is "being agnostic about something happening or not happening" and "assuming it cannot happen".

And sorry, the "situation" is pretty precisely scoped by "Permissible undefined behavior...". So what can be ignored is this instance of UB, not the fact that UB exists.

Otherwise, if you're going to arbitrarily expand the scope of what the situation is, then how about "the fact that a C spec exists"? That's a situation, after all, and it is the situation you are in.

Or maybe just ignore parts of the spec, like the ones that define what is UB and what is not UB.

Then everything becomes a trigger, and if I can expand scope like that, then I have a standards-compliant C-compiler for you:

    int main() { int a = *(int *)-1; }
(I am pretty sure you can make a smaller one)

I doubt anyone would accept this broadening of the scope.

Once again, ignoring something is not generally the same as assuming it doesn't exist. They could be the same if you assume it doesn't exist and then do nothing differently. However, if you use your assumption that it doesn't exist and act differently based on that assumption than if it did exist, then you are not ignoring the situation.

And the latter is clearly what is happening with today's optimising C compilers. They act very differently in the presence of UB than they would otherwise, for example not translating code that they would have translated had the UB not been there, or had they actually ignored the UB as they should have.

> just do the "characteristic thing" during translation and emit the [null pointer] dereference

These things overlap slightly, but I doubt that "just emitting the dereference" qualifies as a documented exception to normal processing of a pointer dereference due to UB. It is exactly the same thing it does when the pointer dereference is UB, so it is just ignoring the UB.

Another misinterpretation that seems to be common is to interpret "the environment" in "characteristic of the environment" to include the (optimising) compiler itself.


> Because it is not ignoring the undefined behaviour.

But it is ignoring the situation? At least, given the broader interpretation of "the situation". I understand there's a narrow interpretation as well.

> And sorry, the "situation" is pretty precisely scoped by "Permissible undefined behavior...".

Maybe? I think I understand the argument. Will need to think on it some more...

> So what can be ignored is this instance of UB, not the fact that UB exists.

I think I haven't been clear enough on this - I had been using "ignore the fact that UB exists" to essentially mean "ignore this instance of UB" - i.e., carry on as if there was no UB. I had been using "ignore code paths with UB" for the broader modern-compiler-style interpretation.

> Otherwise, if you're going to arbitrarily expand the scope of what the situation is, then how about "the fact that a C spec exist?". That's a situation, after all, and it is the situation you are in.

Sure, it's a situation, but I don't think anyone is exactly advocating for an arbitrary expansion of the scope of a situation. "The fact that a C spec exists" is a situation, but it doesn't even pretend to have anything to do with the Standard's permissible UB.

> Once again, ignoring something is not generally the same as assuming it doesn't exist. They could be the same if you assume it doesn't exist and then do nothing differently. However, if you use your assumption that it doesn't exist and act differently based on that assumption than if it did exist, then you are not ignoring the situation.

I'd agree that proceeding without considering UB at all would count as ignoring something.

However, I'd argue that that's not the only way to read "ignore" - dropping something from consideration, to me, certainly sounds like ignoring something. You had to choose to do so, but that doesn't make it not ignoring it. That also depends on framing, though - back to how broadly "the situation" should be read.

> I doubt that "just emitting the dereference" qualifies as a documented exception to normal processing of a pointer dereference due to UB. It is exactly the same thing it does when the pointer dereference is UB, so it is just ignoring the UB.

Sorry, I don't quite understand what you're trying to say with the first sentence - where did the concept of an exception to normal processing come from? The idea was that emitting a dereference is the characteristic translation behavior, so that phrase in the Standard would cover "ignoring the UB" and doing what may otherwise be expected.

> Another misinterpretation that seems to be common is to interpret "the environment" in "characteristic of the environment" to include the (optimising) compiler itself.

I had interpreted "the environment" as including semantics; i.e., the translation environment includes these rules for translation/program semantics, so a characteristic behavior could be "normal" semantics. This interpretation doesn't need to include the compiler since the characteristic behavior is derived from the environment, not the compiler.

Looking more closely at the Standard, though, I'm not too confident in this interpretation. Perhaps "behaving during [] program execution in a documented manner characteristic of the environment" could work, though it's admittedly not what I originally had in mind, and I'm still not sure it works.

----

I do have to admit, though, that after the discussions I've had with you I'm less confident about my understanding of this. It'd be nice to talk to an actual major compiler dev or some C89 committee members about this. Feel like I had run across such a thing at some point, but I don't remember where or when.


To respond to your edit:

> Hunting for UB via static analysis and then changing your output based on whether you find it simply is not what the word "ignoring" means.

Why not? A static analysis pass can flag a code path as containing UB, and future passes can then ignore that path. Sure sounds like "ignoring" to me.


Ignoring Y because it has X is the opposite of ignoring X.

It is fundamentally impossible to both ignore X and make decisions based on X at the same time.

Something cannot be both ignored and a key criterion.

Your argument is akin to saying that a company hiring process that "ignores race" means eliminating applicants based on their race. It is untrue. It is the opposite of true. And I think you know that, so please stop trolling.


> It is fundamentally impossible to both ignore X and make decisions based on X at the same time.

Of course you can - the former can be the action you take as a result of the decision. Choosing to not consider something is distinct from refusing to make a decision based on that something, but both are "ignoring" that something.

Again, this boils down to how "ignoring the situation" is interpreted. I can ignore situations with UB and proceed as if those situations aren't present, or I can ignore situations with UB and proceed as if the UB weren't present. The Standard's wording does not rule out one or the other.

In addition, why would the "ignore the existence of UB" not fall under "behaving during translation or program execution in a documented manner characteristic of the environment"? That seems to match "pretend the UB were not present" much more closely.

> Your argument is akin to saying that a company hiring process that "ignores race" means eliminating applicants based on their race.

No, it means that the hiring process makes decisions without considering what race-based effects that may have. If that happens to result in weird race-based outcomes, then that's what happens.


The paragraph you quote is a Note, which is non-normative.

The normative text that specifies the semantics of undefined behavior is:

behavior [...] for which this International Standard imposes no requirements.


1. It still exists. Pretending it doesn’t exist is disingenuous at best.

2. It used to be normative.


> [whatever] is undefined behaviour, which means the compiler is allowed to assume it never happens;

That interpretation is the root of the problem. Compiler authors use it to implement outright user-hostile behavior in the name of elusive performance.

Why would you spend resources looking for zero days when you can have a few LLVM contributors plant them in every program "as an optimization"?


So why not use a different compiler?


Would you write it?

You can use the same compiler with a language whose design committee isn't deliberately user hostile, like Rust (where UB-like behaviors in safe code are considered soundness bugs).

https://runrust.miraheze.org/wiki/Undefined_Behavior


Rust makes guarantees for safe code that undefined behavior would violate. C(++) has no such mode and as such a comparison cannot be made.


> you are telling the compiler that your arithmetic is guaranteed to never overflow.

Any time you are dealing with data from real, physical sensors or third-party APIs, this is an impossible guarantee to give - they could literally break.


Of course it is possible. One must validate the input before performing arithmetic on signed integers.


The compiler is allowed to assume that the data will never overflow, so it is also allowed to get rid of overflow checks. Now imagine a complex validation routine whose checks themselves would invoke undefined behavior for exactly the values they are meant to reject; a sufficiently smart compiler is allowed to simply remove your validation code, leaving you with no validation whatsoever.
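
A minimal sketch of such a check that the compiler may delete:

    int add_checked(int a) {
        /* Intended as an overflow check, but computing a + 100 already
           overflows (UB) in exactly the case being tested for, so the
           compiler may fold the condition to false and drop the branch. */
        if (a + 100 < a)
            return -1;
        return a + 100;
    }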


It can get rid of checks that test whether the results of a previous operation have overflowed. It can't eliminate checks that test whether a subsequent operation will overflow and abort because that would be changing semantics.

C is an absolute minefield of undefined behavior, but let's be accurate about the things it does wrong.
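
A sketch of the distinction (assuming y > 0 for brevity; a full check also needs the negative case):

    #include <limits.h>

    int add_or_fail(int x, int y) {
        /* Not removable: this test avoids the overflow entirely, so
           no UB ever occurs and the semantics must be preserved. */
        if (x > INT_MAX - y)
            return -1;

        /* Removable: this test only fires after x + y has already
           overflowed, which is UB, so it may be folded away. */
        if (x + y < x)
            return -2;

        return x + y;
    }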


There are several ways to do overflow checks safely (i.e. without undefined behavior), though the ergonomics are not always ideal.

C23 somewhat improves the situation with <stdckdint.h>, a standardized version of GCC’s __builtin_add_overflow and friends. That has ergonomics issues too due to its verbosity, but at least it’s hard to screw up.
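
A sketch of the C23 interface (needs a C23-capable toolchain, e.g. recent GCC or Clang):

    #include <stdckdint.h>
    #include <stdio.h>

    int main(void) {
        int r;
        /* ckd_mul returns true if the mathematical result did not fit */
        if (ckd_mul(&r, 50000, 50000))
            puts("overflow");
        else
            printf("%d\n", r);
        return 0;
    }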


The compiler is allowed to assume that if x and y are signed ints:

   if (x < 0 || y < 0) return -1;
   return x * y;
Will not overflow. And if you try to check for overflow with:

   if (x < 0 || y < 0) return -1;
   if (x * y < 0) return -2;
   return x*y;
Then yes, the compiler is within spec to remove your check because the only situation in which you could hit that check would be after signed integer overflow, which it is allowed to assume won't happen.

One way to implement this check in GCC where the compiler will respect it would be:

   if (x < 0 || y < 0) return -1;
   int z;
   if (__builtin_smul_overflow(x, y, &z)) return -2;
   return z;


That doesn’t sound like sound logic to me. The compiler assumes that, at the point where you have the arithmetic, it won’t overflow. This hinges upon all the former state of the program. If it can prove that the given integers won’t overflow (e.g. due to a previous, redundant check), then it can indeed remove a conditional, but the compiler can’t change the observable behavior of the program.


It can't (correctly) remove checks that would have prevented the undefined behavior, because that is changing the semantics of a program that (as written) does not trigger undefined behavior.



