Conversation

ibsidorenko
Contributor

@ibsidorenko ibsidorenko commented Apr 17, 2024

This commit enables offloading of R.matmul + R.dequantize to the cuBLAS codegen. The dequantization scale is passed to the runtime function and used as the alpha parameter; if there is no dequantization, alpha == 1.0.
It can also be used to fuse the output scale into the matmul in the FP8 case.
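The fusion described above rests on a simple algebraic identity: applying a per-tensor dequantization scale after a matmul is the same as scaling the matmul result, which cuBLAS GEMM already supports via its alpha parameter (C = alpha * A @ B). A minimal NumPy sketch of that equivalence (not the TVM/cuBLAS API; the scale value is hypothetical):

```python
import numpy as np

# With a per-tensor dequantization scale s:
#   dequantize(matmul(A, B)) == s * (A @ B)
# so the scale can be folded into the GEMM alpha parameter.
rng = np.random.default_rng(0)
A = rng.integers(-8, 8, size=(4, 16)).astype(np.float32)
B = rng.integers(-8, 8, size=(16, 4)).astype(np.float32)
scale = 0.05  # hypothetical dequantization scale

# Unfused: matmul followed by a separate dequantize (elementwise scaling).
reference = scale * (A @ B)

# Fused: fold the scale into alpha; no separate dequantize op remains.
alpha = scale  # alpha == 1.0 when there is no dequantization
fused = alpha * (A @ B)

assert np.allclose(reference, fused)
```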

cc @vinx13 @csullivan @JosephTheOctonaut @masahi @elvin-n

@ibsidorenko ibsidorenko marked this pull request as draft April 17, 2024 13:20
@ibsidorenko ibsidorenko force-pushed the cublas-matmul-dequantize branch from df3b24f to 40d6d90 May 3, 2024 08:23
@ibsidorenko ibsidorenko marked this pull request as ready for review May 3, 2024 11:50
@github-actions github-actions bot requested review from csullivan, masahi and vinx13 May 3, 2024 13:12
@masahi masahi merged commit effa5d7 into apache:main May 3, 2024
@ibsidorenko ibsidorenko deleted the cublas-matmul-dequantize branch May 6, 2024 13:27