Closed
Labels
enhancement, good first issue, help wanted
Description
Alongside CUDA Toolkit 12.8, NVIDIA recently published a blog post about optimizing compile times with nvcc: [Optimizing Compile Times for CUDA C++](https://developer.nvidia.com/blog/optimizing-compile-times-for-cuda-c).
We are hitting a similar problem: we use CuTe as our backend, and its heavy use of templates leads to significant compilation overhead. We might be able to reduce our compile times by adopting their approach, that is, tracing where the compilation time goes and optimizing the hotspots.
To reproduce the compilation time:
import shlex
import subprocess
import time

# The nvcc invocation to time, unchanged from the original command.
cmd = "nvcc -std=c++17 -w -Xcudafe --diag_suppress=177 --compiler-options '-fPIC' -lineinfo --shared /tmp/tmp5hql14m2.cu -lcuda -gencode arch=compute_89,code=sm_89 -I/root/tilelang/tilelang/../src -I/root/tilelang/tilelang/../3rdparty/cutlass/include -diag-suppress=20013 -o /tmp/tmp5hql14m2.so"
start = time.perf_counter()
subprocess.run(shlex.split(cmd), check=True)  # raises if compilation fails, so a failed build is not timed
end = time.perf_counter()
print(f"Time taken: {end - start:.2f} seconds")
We need volunteers who have a stable machine with CUDA 12.8 installed to help :)
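For anyone picking this up, here is a minimal sketch of how a compile-time trace could be summarized once it exists. It assumes the trace was produced with the `--fdevice-time-trace <file>` option described in the blog post and that the file follows the Chrome trace event format (a `traceEvents` list of events with `name`, `ph`, and `dur` in microseconds). The helper name `top_trace_events` and the event names in the sample are hypothetical and only illustrate the idea.

```python
import json
from collections import defaultdict


def top_trace_events(trace, n=10):
    """Return the n event names with the largest total duration, in milliseconds.

    Assumes Chrome trace event format: either a dict with a "traceEvents"
    list or a bare list of events. Only complete events (ph == "X") carry
    a duration, so everything else is skipped.
    """
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for ev in events:
        if ev.get("ph") == "X" and "dur" in ev:
            totals[ev.get("name", "<unnamed>")] += ev["dur"] / 1000.0  # us -> ms
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def load_trace(path):
    """Load a trace JSON file from disk."""
    with open(path) as f:
        return json.load(f)


# Tiny hand-written trace; the event names are made up for illustration.
sample = {
    "traceEvents": [
        {"name": "InstantiateFunction", "ph": "X", "ts": 0, "dur": 4000},
        {"name": "Frontend", "ph": "X", "ts": 0, "dur": 10000},
        {"name": "InstantiateFunction", "ph": "X", "ts": 5000, "dur": 2000},
        {"name": "process_name", "ph": "M"},  # metadata event, ignored
    ]
}

for name, ms in top_trace_events(sample):
    print(f"{ms:8.1f} ms  {name}")
```

With a real trace, `top_trace_events(load_trace("trace.json"))` would give a first ranking of where the time goes. Nested events are counted under each of their own names, so the totals are a rough guide rather than an exact breakdown.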