
> Which of these is "the compiler is allowed to assume it does not happen"?

I agree with where you're coming from, but "ignoring the situation completely with unpredictable results" sounds like it pretty much fits the bill. Pretending something doesn't happen sounds a lot like ignoring it completely to me.

What the standard obviously does exclude is nonsense like intentionally reformatting your drives.



> Pretending something doesn't happen sounds a lot like ignoring it completely to me.

Quite the opposite. Assuming it doesn't happen (not "pretending"), is very much not ignoring the situation, at least if you then act on that assumption that it does not happen.

Ignoring it just lets it happen when it does, so if the program specifies an out-of-bounds access, the compiler generates code for an out-of-bounds access, ignoring the fact that it is an out of bounds access.
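
A rough sketch of the difference, using a well-known off-by-one loop. The names `table`, `exists_in_table_buggy` and `exists_in_table` are made up for illustration, and what any particular compiler does with the buggy version is an assumption, not something I've verified here:

```c
#include <stdbool.h>

int table[4] = {1, 2, 3, 4};

/* Buggy on purpose: `i <= 4` lets the loop reach table[4], one past the end. */
bool exists_in_table_buggy(int v) {
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) {
            return true; /* found before any out-of-bounds read happens */
        }
    }
    /* Only reachable after reading table[4], which is UB.
     * "Ignoring": emit the read of table[4] anyway and see what happens.
     * "Assuming it does not happen": this line is unreachable, so the
     * whole function may be reduced to `return true`. */
    return false;
}

/* Corrected bound: neither reading involves UB, so both agree. */
bool exists_in_table(int v) {
    for (int i = 0; i < 4; i++) {
        if (table[i] == v) {
            return true;
        }
    }
    return false;
}
```

Calling the buggy version with a value that is in the table is well defined under either reading; the two readings only diverge for a value that is absent.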


> Assuming it doesn't happen (not "pretending"), is very much not ignoring the situation, at least if you then act on that assumption that it does not happen.

I'm not sure how assuming UB doesn't happen is distinguishable from choosing to ignore the situation every time one comes up. You get the same result either way.

For example, a compiler can assume that null pointers are never dereferenced, or every time a null pointer is/may be dereferenced it can just "ignore the situation" with the dereference. I'm not seeing a functional difference here.
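
To make the null pointer case concrete, here is a minimal sketch (the function names are hypothetical):

```c
#include <stddef.h>

/* Buggy on purpose: dereferences p before checking it. */
int read_or_default_buggy(const int *p) {
    int value = *p;   /* UB if p is NULL */
    if (p == NULL) {  /* only true on a path that already dereferenced NULL */
        return -1;
    }
    return value;
}

/* Check first: defined for every argument, including NULL. */
int read_or_default(const int *p) {
    if (p == NULL) {
        return -1;
    }
    return *p;
}
```

For a NULL argument the first function has UB whichever way you read the Standard; the open question is only what the compiler may do with the `p == NULL` branch.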

This is, of course, subject to the minor problem that "situation" is arguably underspecified. Compiler writers appear to interpret it as something akin to "code path" (so "ignoring the situation" means "ignoring code paths invoking UB"), while UB-goes-too-far proponents appear to interpret it more broadly, more like "the fact that UB can/will happen" (so "ignoring the situation" means "ignore the fact UB will/may happen").

> Ignoring it just lets it happen when it does, so if the program specifies an out-of-bounds access, the compiler generates code for an out-of-bounds access, ignoring the fact that it is an out of bounds access.

Why wouldn't this fall under "behaving during translation or program execution in a documented manner characteristic of the environment" instead?


I think it's quite clear how "assuming UB doesn't happen is distinguishable from choosing to ignore the situation."

    strcpy(P, filename);
    free(P);
    if (P[0] == '.') {
        // hidden file
        // do something
    }
Obviously you shouldn't use the above code. However, for illustrative purposes, ignoring the situation [of undefined behavior] probably still results in doing something for hidden files. What people are complaining about is compilers finding that there's UB by static analysis and optimizing out the conditional entirely because they assume dereferencing the pointer to freed memory "doesn't happen."

There are good arguments for both sides, in my opinion. But let's not pretend they're the same thing. Deleting logic because it provably would result in UB is not the same as ignoring the UB.
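
For contrast, a well-defined ordering of the same idea might look like the sketch below. `is_hidden_file` is a hypothetical helper, and the allocation is my assumption, since the snippet above never shows where P comes from:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies `filename`, inspects the copy, then frees it. */
bool is_hidden_file(const char *filename) {
    char *p = malloc(strlen(filename) + 1);
    if (p == NULL) {
        return false; /* allocation failed; treated as "not hidden" in this sketch */
    }
    strcpy(p, filename);
    bool hidden = (p[0] == '.'); /* read while p still points to live memory */
    free(p);                     /* p must not be read after this point */
    return hidden;
}
```

With the read moved before the `free`, there is no UB for either interpretation to disagree about.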


I'm still not seeing the distinction between "ignoring the situation" and "assuming no UB". Under compiler writers' interpretation, I think there would be two situations in your snippet:

1. This code path is executed. UB will be invoked.

2. This code path is not executed. No UB occurs.

If the compiler "ignores the situation" with UB, code path 1 is ignored (i.e., dropped from consideration). This probably results in the removal of the snippet as dead code.

If the compiler assumes no UB occurs, code path 1 is eliminated as cannot-happen, and the compiler probably deletes the snippet as dead code.

Same result either way (with the obvious caveat that this is one possible interpretation of "ignoring the situation").

> ignoring the situation [of undefined behavior] probably still results in doing something for hidden files.

The problem is that this assumes a very specific definition of "ignoring the situation" which, while understandable, isn't the only interpretation permitted by the Standard.

In addition, there's the fact that such an interpretation would arguably fall under "behaving during translation or program execution in a documented manner characteristic of the environment" instead.

> Deleting logic because it provably would result in UB is not the same as ignoring the UB.

True, but "ignoring the UB" isn't what the Standard says. It says "ignoring the situation", and that's the problem - people can't agree on what "ignoring the situation" is supposed to mean. Compiler writers appear to take it to mean "ignore UB-invoking code paths", UB-goes-too-far proponents take it to mean "ignore the presence of UB".


> If the compiler "ignores the situation" with UB, code path 1 is ignored (i.e., dropped from consideration). This probably results in the removal of the snippet as dead code.

Ignoring something != killing something. There is no dialect of English in which ignoring something is compatible with eliminating it from existence.

EDIT: Note: I am not personally saying compiler writers are wrong. The body text says it imposes no requirements, so at the very least it's a reasonable interpretation to say the footnote text isn't binding and/or that "possible" behaviors are examples, not a full enumeration. But! On the narrow question of "ignoring" the behavior altogether... Hunting for UB via static analysis and then changing your output based on whether you find it simply is not what the word "ignoring" means.


The compiler is not removing or killing invalid code, you can still find it in the source file.

What it is doing, in the extreme case, is ignoring it and not generating asm statements for it. Then again, how could it? Code that will trigger UB has no meaning, so the compiler wouldn't know what code to generate.

Of course a compiler could assign meaning to some instances of UB.

For example, I'm pretty sure that in GCC dereferencing a null pointer is not UB, but it is expected to trap (because POSIX) and execution is expected not to continue except via abnormal edges (exceptions or longjmp). This means that any code that can be proven to be reachable only through a null pointer dereference is effectively dead code, so in practice it can still introduce bugs if the dereference didn't trap.


At least as far as one is unable to distinguish between the two, sure it is. A compiler that emits code as if UB-containing code paths are not there is essentially performing the dictionary definition of ignoring something, but it's functionally indistinguishable from a compiler that deletes UB-containing code paths.


No.

What you are describing is ignoring the code that has the undefined behaviour due to it having undefined behaviour.

That is not ignoring the undefined behaviour, it is the opposite.


Maybe it's not "ignoring the undefined behaviour", but why isn't it ignoring the situation? The situation is that this code (path) invokes UB. Ignoring "the situation" seems to allow simply not considering that path. Maybe that results in that path not being emitted.

Again, it all comes down to interpreting "the situation". Compiler writers construe it broadly; "Ignore the presence of UB" (i.e., construing "the situation" narrowly) is another possible interpretation, but I don't think it's the one and only definitive one.

In addition, why isn't "ignore the presence of UB" covered by "behaving during translation or program execution in a documented manner characteristic of the environment"? See a null pointer dereference? Just do the "characteristic thing" during translation and emit the dereference. Maybe implementations will need to add documentation somewhere, but that's not exactly the challenging part.


Because it is not ignoring the undefined behaviour. Simple as that.

What you are confusing is "being agnostic about something happening or not happening" and "assuming it cannot happen".

And sorry, the "situation" is pretty precisely scoped by "Permissible undefined behavior...". So what can be ignored is this instance of UB, not the fact that UB exists.

Otherwise, if you're going to arbitrarily expand the scope of what the situation is, then how about "the fact that a C spec exists"? That's a situation, after all, and it is the situation you are in.

Or maybe just ignore parts of the spec, like the ones that define what is UB and what is not UB.

Then everything becomes a trigger, and if I can expand scope like that, then I have a standards-compliant C-compiler for you:

    int main() { int a = *(int *)-1; }
(I am pretty sure you can make a smaller one)

I doubt anyone would accept this broadening of the scope.

Once again, ignoring something is not generally the same as assuming it doesn't exist. They could be the same if you assume it doesn't exist and then do nothing differently. However, if you use your assumption that it doesn't exist and act differently based on that assumption than if it did exist, then you are not ignoring the situation.

And the latter is clearly what is happening with today's optimising C compilers. They act very differently in the presence of UB than they would if the UB were not there, for example not translating code that they would have translated had the UB not been there, had it not been UB, or had they actually ignored the UB as they should have.
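
A small illustration of "acting on the assumption". This is my own sketch: signed overflow isn't an example anyone raised in this thread, the function names are made up, and whether a given compiler actually folds the comparison is an assumption:

```c
#include <limits.h>
#include <stdbool.h>

/* Buggy on purpose: x + 1 overflows when x == INT_MAX, which is UB for signed int.
 * A compiler that merely ignores the overflow emits the add and the compare.
 * A compiler that assumes overflow never happens may fold this to `return false`. */
bool next_would_overflow_buggy(int x) {
    return x + 1 < x;
}

/* Defined for every x: compares against the limit instead of overflowing. */
bool next_would_overflow(int x) {
    return x == INT_MAX;
}
```

The buggy version is only well defined for inputs below INT_MAX, which is exactly where it returns false under both readings.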

> just do the "characteristic thing" during translation and emit the [null pointer] dereference

These things overlap slightly, but I doubt that "just emitting the dereference" qualifies as a documented exception to normal processing of a pointer dereference due to UB. It is exactly the same thing it does when the pointer dereference is UB, so it is just ignoring the UB.

Another misinterpretation that seems to be common is to interpret "the environment" in "characteristic of the environment" to include the (optimising) compiler itself.


> Because it is not ignoring the undefined behaviour.

But it is ignoring the situation? At least, given the broader interpretation of "the situation". I understand there's a narrow interpretation as well.

> And sorry, the "situation" is pretty precisely scoped by "Permissible undefined behavior...".

Maybe? I think I understand the argument. Will need to think on it some more...

> So what can be ignored is this instance of UB, not the fact that UB exists.

I think I haven't been clear enough on this - I had been using "ignore the fact that UB exists" to essentially mean "ignore this instance of UB" - i.e., carry on as if there was no UB. I had been using "ignore code paths with UB" for the broader modern-compiler-style interpretation.

> Otherwise, if you're going to arbitrarily expand the scope of what the situation is, then how about "the fact that a C spec exists"? That's a situation, after all, and it is the situation you are in.

Sure, it's a situation, but I don't think anyone is exactly advocating for an arbitrary expansion of the scope of a situation. "The fact that a C spec exists" is a situation, but it doesn't even pretend to have anything to do with the Standard's permissible UB.

> Once again, ignoring something is not generally the same as assuming it doesn't exist. They could be the same if you assume it doesn't exist and then do nothing differently. However, if you use your assumption that it doesn't exist and act differently based on that assumption than if it did exist, then you are not ignoring the situation.

I'd agree that proceeding without considering UB at all would count as ignoring something.

However, I'd argue that that's not the only way to read "ignore" - dropping something from consideration, to me, certainly sounds like ignoring something. You had to choose to do so, but that doesn't make it not ignoring it. That also depends on framing, though - back to how broadly "the situation" should be read.

> I doubt that "just emitting the dereference" qualifies as a documented exception to normal processing of a pointer dereference due to UB. It is exactly the same thing it does when the pointer dereference is UB, so it is just ignoring the UB.

Sorry, I don't quite understand what you're trying to say with the first sentence - where did the concept of an exception to normal processing come from? The idea was that emitting a dereference is the characteristic translation behavior, so that phrase in the Standard would cover "ignoring the UB" and doing what may otherwise be expected.

> Another misinterpretation that seems to be common is to interpret "the environment" in "characteristic of the environment" to include the (optimising) compiler itself.

I had interpreted "the environment" as including semantics; i.e., the translation environment includes these rules for translation/program semantics, so a characteristic behavior could be "normal" semantics. This interpretation doesn't need to include the compiler since the characteristic behavior is derived from the environment, not the compiler.

Looking more closely at the Standard, though, I'm not too confident in this interpretation. Perhaps "behaving during [] program execution in a documented manner characteristic of the environment" could work, though it's admittedly not what I originally had in mind, and I'm still not sure it works.

----

I do have to admit, though, that after the discussions I've had with you I'm less confident about my understanding of this. It'd be nice to talk to an actual major compiler dev or some C89 committee members about this. Feel like I had run across such a thing at some point, but I don't remember where or when.


To respond to your edit:

> Hunting for UB via static analysis and then changing your output based on whether you find it simply is not what the word "ignoring" means.

Why not? A static analysis pass can flag a code path as containing UB, and future passes can then ignore that path. Sure sounds like "ignoring" to me.


Ignoring Y because it has X is the opposite of ignoring X.

It is fundamentally impossible to both ignore X and make decisions based on X at the same time.

Something cannot be both ignored and a key criterion.

Your argument is akin to saying that a company hiring process that "ignores race" means eliminating applicants based on their race. It is untrue. It is the opposite of true. And I think you know that, so please stop trolling.


> It is fundamentally impossible to both ignore X and make decisions based on X at the same time.

Of course you can - the former can be the action you take as a result of the decision. Choosing to not consider something is distinct from refusing to make a decision based on that something, but both are "ignoring" that something.

Again, this boils down to how "ignoring the situation" is interpreted. I can ignore situations with UB and proceed as if those situations aren't present, or I can ignore situations with UB and proceed as if the UB weren't present. The Standard's wording does not rule out one or the other.

In addition, why would "ignore the existence of UB" not fall under "behaving during translation or program execution in a documented manner characteristic of the environment"? That seems to match "pretend the UB were not present" much more closely.

> Your argument is akin to saying that a company hiring process that "ignores race" means eliminating applicants based on their race.

No, it means that the hiring process makes decisions without considering what race-based effects that may have. If that happens to result in weird race-based outcomes, then that's what happens.



