[CUBLAS] Enable offloading of R.matmul + R.dequantize #16896

ibsidorenko · 2024-04-17T13:20:15Z

This commit enables offloading of R.matmul + R.dequantize to the cuBLAS codegen. The dequantization scale is passed to the runtime function and used as the alpha parameter. If there is no dequantization, alpha == 1.0.
This can also be used to fuse the output scale into the matmul in the FP8 case.
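
To illustrate why the scale can be folded into alpha, here is a minimal pure-Python sketch. It is not the code from this PR. It assumes a scalar (per-tensor) scale and a zero point of 0, and the helper names `matmul`, `dequantize`, and `gemm_with_alpha` are hypothetical. cuBLAS GEMM computes `C = alpha * (A @ B) + beta * C`; the sketch fixes beta at 0.

```python
def matmul(a, b):
    """Plain matrix product of two row-major lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dequantize(acc, scale):
    """Assumed per-tensor dequantization with zero point 0: scale every element."""
    return [[scale * v for v in row] for row in acc]


def gemm_with_alpha(a, b, alpha=1.0):
    """GEMM in the form C = alpha * (A @ B), with beta fixed at 0."""
    return [
        [alpha * sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
scale = 0.5

# Unfused: matmul followed by a separate dequantize step.
unfused = dequantize(matmul(a, b), scale)

# Fused: the dequantization scale is passed as alpha.
fused = gemm_with_alpha(a, b, alpha=scale)

# Without dequantization, alpha stays at 1.0 and the result is the plain product.
plain = gemm_with_alpha(a, b)
```

Under these assumptions `fused` and `unfused` hold the same values, so the separate dequantize step is redundant once the scale is supplied as alpha.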

cc @vinx13 @csullivan @JosephTheOctonaut @masahi @elvin-n

ibsidorenko marked this pull request as draft April 17, 2024 13:20

[CUBLAS] Enable offloading of R.matmul + R.dequantize

40d6d90

ibsidorenko force-pushed the cublas-matmul-dequantize branch from df3b24f to 40d6d90 May 3, 2024 08:23

ibsidorenko marked this pull request as ready for review May 3, 2024 11:50

github-actions bot requested review from csullivan, masahi and vinx13 May 3, 2024 13:12

masahi approved these changes May 3, 2024

masahi merged commit effa5d7 into apache:main May 3, 2024

ibsidorenko deleted the cublas-matmul-dequantize branch May 6, 2024 13:27

ysh329 mentioned this pull request Jul 20, 2024

[Release] v0.17.0 Release Candidate Notes #17178

Closed

kurisu6912 mentioned this pull request Sep 5, 2025

kurisu add assume attr patch 1 tile-ai/tvm#8

Closed
