Open
Description
Hi folks, thanks for the great work.
With #135 merged, vLLM could benefit from the torch.compile backend, given its compiler-native integration with PagedAttention kernels.
Is there an easy way to see the latest/nightly MBU (model bandwidth utilization) for torch.compile on, say, H100 with Llama 3 70B?
I'm also interested in cold-start compile time.
cc @msaroufim
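For anyone landing here who wants to compute MBU from their own runs, here is a minimal sketch. It assumes the commonly used definition, achieved memory bandwidth divided by peak memory bandwidth, where achieved bandwidth is approximated as (parameter bytes + KV cache bytes) moved per generated token times decode tokens per second. The function name and all numbers below are illustrative placeholders, not measurements and not an API from this repo.

```python
def model_bandwidth_utilization(
    param_bytes: float,
    kv_cache_bytes: float,
    tokens_per_second: float,
    peak_bandwidth_bytes_per_s: float,
) -> float:
    """Estimate MBU for batch-size-1 decoding.

    Assumption: every generated token reads all weights plus the KV cache
    once, so achieved bandwidth ~= bytes_per_token * tokens_per_second.
    """
    if peak_bandwidth_bytes_per_s <= 0:
        raise ValueError("peak bandwidth must be positive")
    achieved = (param_bytes + kv_cache_bytes) * tokens_per_second
    return achieved / peak_bandwidth_bytes_per_s


# Placeholder inputs (hypothetical, replace with your own measurements):
# a 70B-parameter model in bf16 is roughly 70e9 * 2 bytes of weights.
example_mbu = model_bandwidth_utilization(
    param_bytes=70e9 * 2,
    kv_cache_bytes=0.0,               # ignored here for simplicity
    tokens_per_second=10.0,           # hypothetical decode throughput
    peak_bandwidth_bytes_per_s=4e12,  # hypothetical aggregate peak bandwidth
)
print(f"MBU: {example_mbu:.1%}")
```

When the model is sharded across several GPUs, the peak figure should be the aggregate bandwidth of all devices holding the weights.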
Activity
supriyar commented on May 10, 2024
@anijain2305 do we have any benchmark numbers for the cold start compile time?
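In the absence of published numbers, cold-start cost can be approximated locally by timing the first call to a compiled callable against later calls. The sketch below is a generic helper, not an existing benchmark in this repo; `time_cold_and_warm` and `LazySetup` are hypothetical names, and `LazySetup` merely stands in for a compiled model so the snippet runs without a GPU.

```python
import time


def time_cold_and_warm(fn, *args, warm_iters=3):
    """Return (first_call_seconds, mean_later_call_seconds) for fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    cold = time.perf_counter() - start

    warm = []
    for _ in range(warm_iters):
        start = time.perf_counter()
        fn(*args)
        warm.append(time.perf_counter() - start)
    return cold, sum(warm) / len(warm)


class LazySetup:
    """Stand-in for a compiled model: pays a one-time cost on first call."""

    def __init__(self, setup_seconds=0.05):
        self.setup_seconds = setup_seconds
        self.ready = False

    def __call__(self, x):
        if not self.ready:
            time.sleep(self.setup_seconds)  # simulates compilation
            self.ready = True
        return x * 2


cold_s, warm_s = time_cold_and_warm(LazySetup(), 3)
print(f"cold: {cold_s:.4f}s, warm: {warm_s:.6f}s, overhead: {cold_s - warm_s:.4f}s")
```

With a real model, `fn` would be the result of `torch.compile(model)`, and on CUDA you would call `torch.cuda.synchronize()` before each clock read so that asynchronous kernel launches are included in the timing. A true cold start also requires a fresh process and an empty compiler cache, which this helper does not control.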
msaroufim commented on May 11, 2024
Related pytorch/pytorch#125958
remove redundancy & remove int4 linear test from ET tests (pytorch#237)
Gguf cleanup (pytorch#230)