[Enhancement] Save compilation time from cute templates #272

@LeiWang1999

For CUDA Toolkit 12.8, NVIDIA recently published a blog post about optimizing compile times with nvcc: [Optimizing Compile Times for CUDA C++](https://developer.nvidia.com/blog/optimizing-compile-times-for-cuda-c).

We are encountering a similar issue: we use cute as our backend, and its heavy use of templates can lead to significant compilation overhead. We might be able to reduce our compilation times by adopting their approach, i.e. profiling the compilation with a trace and optimizing the hotspots it reveals.
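As a rough sketch of what trace-driven analysis could look like: as far as I understand, the blog post describes an nvcc option, `-fdevice-time-trace`, that writes a JSON trace viewable in Chrome-trace-style tools. Assuming that file follows the Chrome trace event layout (a top-level `traceEvents` list whose complete events have `ph == "X"` and a `dur` in microseconds), the script below totals the time spent per event name. The helper names `load_trace` and `summarize_trace`, and the sample events, are hypothetical and not taken from a real nvcc trace.

```python
import json
from collections import defaultdict


def load_trace(path: str) -> dict:
    """Load a trace JSON file from disk."""
    with open(path) as f:
        return json.load(f)


def summarize_trace(trace: dict, top: int = 5) -> list[tuple[str, float]]:
    """Return the `top` event names by total duration, in milliseconds.

    Assumes Chrome trace event format. Nested events are each counted in
    full, so totals for parent and child events overlap.
    """
    totals: dict[str, float] = defaultdict(float)
    for event in trace.get("traceEvents", []):
        if event.get("ph") == "X":  # "X" marks a complete (timed) event
            totals[event.get("name", "?")] += event.get("dur", 0) / 1000.0
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical sample data, only to show the expected input shape.
sample = {
    "traceEvents": [
        {"name": "InstantiateClass", "ph": "X", "ts": 0, "dur": 4000},
        {"name": "InstantiateFunction", "ph": "X", "ts": 10, "dur": 1500},
        {"name": "InstantiateClass", "ph": "X", "ts": 5000, "dur": 2000},
        {"name": "process_name", "ph": "M"},  # metadata event, ignored
    ]
}

for name, ms in summarize_trace(sample):
    print(f"{name}: {ms:.1f} ms")
```

For a real trace, `summarize_trace(load_trace("trace.json"))` would give a first hint of which template instantiations dominate; the file name here is a placeholder.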

To reproduce the compilation time:

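import subprocess
import time

# nvcc invocation for the generated kernel; the paths are specific to the original machine.
cmd = "nvcc -std=c++17 -w -Xcudafe --diag_suppress=177 --compiler-options '-fPIC' -lineinfo --shared /tmp/tmp5hql14m2.cu -lcuda -gencode arch=compute_89,code=sm_89 -I/root/tilelang/tilelang/../src -I/root/tilelang/tilelang/../3rdparty/cutlass/include -diag-suppress=20013 -o /tmp/tmp5hql14m2.so"

start = time.perf_counter()
# check=True raises CalledProcessError if nvcc fails, so a failed build is not timed silently.
subprocess.run(cmd, shell=True, check=True)
end = time.perf_counter()
print(f"Time taken: {end - start:.2f} seconds")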
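Single runs can be noisy, so a variant that repeats the build and reports the minimum and median may give numbers that are easier to compare across machines. This is a minimal sketch: `time_command` and its `repeats` parameter are hypothetical names, and the placeholder command only starts a Python interpreter so the script runs anywhere. Substitute the nvcc invocation above, split into an argument list.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd: list[str], repeats: int = 3) -> dict[str, float]:
    """Run `cmd` `repeats` times and return min/median wall time in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True aborts on a failed run instead of recording its time.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return {"min": min(durations), "median": statistics.median(durations)}


if __name__ == "__main__":
    # Placeholder command; replace with the nvcc argument list to measure a build.
    stats = time_command([sys.executable, "-c", "pass"])
    print(f"min: {stats['min']:.3f} s, median: {stats['median']:.3f} s")
```

The minimum is usually the least disturbed by background load, while the median shows what a typical run costs.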

We need some volunteers who have a stable machine with CUDA 12.8 installed to help :)
