Skip to content

[clang] Function with empty loop has no return, has a duplicate address #60588

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
SvizelPritula opened this issue Feb 7, 2023 · 9 comments
Closed
Labels
duplicate Resolved as duplicate llvm:optimizations question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!

Comments

@SvizelPritula
Copy link

SvizelPritula commented Feb 7, 2023

Consider the following code sample:

#include <stdlib.h>
#include <stdio.h>

void a()
{
    for (unsigned int i = 0; i >= 0; i++)
        ;
}

void b()
{
    puts("b");
}

Function a contains an infinite loop without a constant controlling expression and without side-effects. The C11 and C17 standards allow implementations to assume that such a loop terminates. This code also happens to be valid C++ code. In C++11 and later, side-effect free infinite loops may also be assumed to terminate, as any thread can be assumed to eventually terminate or perform some kind of side-effect. The b function obviously has no strange or undefined behaviour.

When the function a is compiled with either clang or clang++ with at least -O1 for x86-64, the result will be an instructionless function:

a:                                      # @a
b:                                      # @b
        lea     rdi, [rip + .L.str]
        jmp     puts@PLT                        # TAILCALL
.L.str:
        .asciz  "b"

This means calling a will have the same result as calling b. It's unclear whether this is allowed by either standard, as both just say the loop "may be assumed to terminate", not specifically that an infinite loop has undefined behaviour. One could say this only allows a compiler to remove a loop it cannot prove to terminate, not to assume a loop it can prove not to terminate will never be called. (Edit: It's well understood to be undefined behaviour.)

Either way, this has another effect: a and b now have the same address.

We can append a small main function to the above snippet:

typedef void (*fptr)();

fptr identity(fptr f)
{
    volatile fptr f2 = f;
    return f2;
}

int main()
{
    printf("%d %d\n", a == b, identity(a) == identity(b));
}

The identity function just returns its argument in a way clang cannot inline. When we compile (with -O1 or higher) and run the full program, we get 0 1, which means a and b are both identical and different. This seems to be a violation of the spec, since:

a) different functions shouldn't be equal, and
b) two different values shouldn't suddenly become equal after passing them through an identity function.

This was tried with clang and clang++ version 14.0.0-1ubuntu1, and it can be observed to happen in version 15.0.0 through the Compiler Explorer.

@EugeneZelenko EugeneZelenko added question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! llvm:optimizations and removed new issue labels Feb 7, 2023
@shafik
Copy link
Collaborator

shafik commented Feb 7, 2023

This is undefined behavior, note 1 right under the forward progress paragraph says:

[Note 1: This is intended to allow compiler transformations such as removal of empty loops, even when termination cannot be proven. — end note]

We can also read N1528 which justifies why this is considered undefined behavior.

@SvizelPritula
Copy link
Author

That's fair, I was unsure if it's undefined behaviour. Either way, two functions shouldn't have the same address, whether or not calling them is undefined behaviour.

@shafik
Copy link
Collaborator

shafik commented Feb 7, 2023

Once you have undefined behavior then you no longer have any expectations as to the results. The optimizations are going to assume well-formed programs and won't have the expected results if that is not the case.

@SvizelPritula
Copy link
Author

But the provided code snippet doesn't contain undefined behaviour. Undefined behaviour would only occur if the program actually encountered an infinite loop, which it doesn't, sice a is never called.

@shafik
Copy link
Collaborator

shafik commented Feb 7, 2023

It looks like this is a duplicate of #48943 (comment) even though the undefined behavior is different.

@dwblaikie do you still plan on adding a ud2 for unreachable blocks? This sounds like a step in the right direction.

@shafik shafik closed this as completed Feb 7, 2023
@shafik shafik added the duplicate Resolved as duplicate label Feb 7, 2023
@SvizelPritula
Copy link
Author

SvizelPritula commented Feb 7, 2023

You're right, that's a very similar issue. The root issue is the same, but as I pointed out, this technically causes a violation of the specification, while that issue deals mostly with ergonomics. Should I move my comments to that issue?

@EugeneZelenko EugeneZelenko closed this as not planned Won't fix, can't repro, duplicate, stale Feb 7, 2023
@SvizelPritula
Copy link
Author

I hope this won't be considered spam, but i submitted a clarified version as #60596 that removes mentions of infinite loops and focuses on the strange comparison.

@dwblaikie
Copy link
Collaborator

@dwblaikie do you still plan on adding a ud2 for unreachable blocks? This sounds like a step in the right direction.

Never did figure out a nice way to do it - probably should just settle for the OK ways to do it. I think there's already a couple of different "OK" tools that LLVM has that are turned on on some targets - there's TrapUnreachable, which is probably too pessimistic (traps on every unreachable? But maybe only on those that survive to codegen - so maybe not so bad?)

& then there's some other feature I think we enable on Windows that puts a trap or nop on zero length functions, I think?

I hope this won't be considered spam, but i submitted a clarified version as #60596 that removes mentions of infinite loops and focuses on the strange comparison.

Yeah, no worries - don't mind if it's this bug or the new one you filed.

@kparzysz-quic
Copy link

Undefined behaviour would only occur if the program actually encountered an infinite loop, which it doesn't, sice a is never called.

That's actually not true. The problematic code doesn't need to be executed for UB to take effect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate Resolved as duplicate llvm:optimizations question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!
Projects
None yet
Development

No branches or pull requests

5 participants