clang++ cuda: Lambda capture fails to initialize memory when variable only used in #ifdef __CUDA_ARCH__ #193

Closed
rgreenblatt opened this issue Mar 20, 2020 · 3 comments

rgreenblatt commented Mar 20, 2020

#include <cstdio>

template <typename F> __global__ void call(F f) { f(); }

int main(int argc, char *argv[]) {
  int a = 0;

  call<<<1, 1>>>([=] __device__() {
    // Uncomment line below for code to work
    // (void)a;

#ifdef __CUDA_ARCH__
    printf("%d\n", a);
#endif
  });

  cudaDeviceSynchronize();

  return 0;
}

Build with `clang++ -std=c++17 --cuda-gpu-arch=sm_75 -L/usr/local/cuda/lib64 -lcudart test.cu`, changing options as needed. The issue also occurs with C++11/14. The value printed is garbage instead of 0. Using the variable outside the #ifdef or capturing it explicitly makes the issue go away. As far as I can tell, the issue occurs regardless of optimization settings.

My testing is on trunk, but I think this also occurs with clang 9/10.

@zygoloid
Collaborator

zygoloid commented Mar 20, 2020

On the host side, the lambda does not capture a, so the lambda has a different representation between host and device. This is a bug in your code, just as if you had

struct Lambda {
#ifdef __CUDA_ARCH__
  int a;
#endif
};

and shared the type Lambda between host and device.

Workaround: either don't use #ifdef __CUDA_ARCH__ (at all, it can lead to lots of problems), or in this specific case, don't use implicit lambda captures ([=]) and use an explicit [a] capture instead.

@rgreenblatt
Author

Thanks for the help. Didn't really think this through.

@Endilll Endilll changed the title clang++ cuda: Lambda capture fails to initialize memory when variable only used in ifdef __CUDA_ARCH__ clang++ cuda: Lambda capture fails to initialize memory when variable only used in #ifdef __CUDA_ARCH__ Jan 20, 2024
@llvmbot
Member

llvmbot commented Jan 20, 2024

@llvm/issue-subscribers-c-11

Author: Ryan Greenblatt (rgreenblatt)

