[Dtype] Low-precision Blackwell Datatype Support #18027

Kathryn-cat · 2025-05-31T00:32:26Z

This PR focuses on supporting FP4/FP8 data types introduced in Blackwell architectures (sm_100).

TVM's NDArray stores sub-byte data types in a packed format, so two FP4 values are stored in one byte. The size calculation in the array allocator is modified accordingly.
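The size rule can be sketched as follows. This is a minimal illustration, not TVM's actual allocator code: the helper name `PackedByteSize` is hypothetical, and it assumes elements are packed back to back with the total rounded up to a whole byte.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a TVM API): number of bytes needed to store
// `num_elems` values of `bits` bits each when they are packed back to back.
inline std::size_t PackedByteSize(std::size_t num_elems, std::size_t bits) {
  assert(bits > 0);
  // Round the total bit count up to a whole number of bytes.
  return (num_elems * bits + 7) / 8;
}
```

Under this assumption, sixteen FP4 values take `PackedByteSize(16, 4)`, which is 8 bytes, sixteen FP8 values take 16 bytes, and an odd count such as three FP4 values rounds up to 2 bytes.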

Subtype arithmetic

The type __nv_fp4_e2m1 from <cuda_fp4.h> is a tag type and does not support sub-byte pointer arithmetic. Accordingly, the compiler does not support index operations directly on an array declared as __nv_fp4_e2m1. If index operations such as arr[0] + arr[1] are needed, the user should declare the array with a vector type such as __nv_fp4x2_e2m1.

For example, suppose a user creates an array A of type __nv_fp4_e2m1 with the following values and passes it to the kernel shown after them:

[-1 2 0.5 -6 -6 -2 2 3 4 1 -3 4 -2 2...]

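#include <cuda_fp16.h>
#include <cuda_fp4.h>
// Element-wise add on FP4 buffers; each thread indexes A, B and C by threadIdx.x.
extern "C" __global__ void __launch_bounds__(32) add_kernel(__nv_fp4_e2m1* __restrict__ A, __nv_fp4_e2m1* __restrict__ B, __nv_fp4_e2m1* __restrict__ C) {
  C[((int)threadIdx.x)] = (__nv_fp4_e2m1)(((half)A[((int)threadIdx.x)]) + ((half)B[((int)threadIdx.x)]));
}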

Printing the values of A[0], A[1], ... shows:

A[0]: 2.000000
A[1]: -6.000000
A[2]: -2.000000
A[3]: 3.000000

This is because __nv_fp4_e2m1 is only a tag type. Advancing a pointer of this type moves it one byte at a time, so each index reads the upper 4 bits of the corresponding byte in the packed memory buffer and skips the value in the lower 4 bits. As a result, we should avoid indexing __nv_fp4_e2m1 directly for arithmetic operations.
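The effect can be reproduced on the host with a small model of the packed buffer. This is a sketch under stated assumptions, not code from the PR: all function names are hypothetical, element 2k is assumed to sit in the low nibble of byte k and element 2k+1 in the high nibble, and `ReadByteStride` models the behaviour described above (index i selects byte i and decodes its upper 4 bits).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Magnitudes representable in E2M1 (2 exponent bits, 1 mantissa bit, bias 1),
// indexed by the low three bits of the 4-bit code. Bit 3 is the sign.
static const float kE2M1Magnitude[8] = {0.0f, 0.5f, 1.0f, 1.5f,
                                        2.0f, 3.0f, 4.0f, 6.0f};

// Decode a 4-bit E2M1 code into a float.
inline float DecodeE2M1(std::uint8_t code) {
  float mag = kE2M1Magnitude[code & 0x7];
  return (code & 0x8) ? -mag : mag;
}

// Encode a value that is exactly representable in E2M1 (no rounding here).
inline std::uint8_t EncodeE2M1(float v) {
  float mag = v < 0.0f ? -v : v;
  for (std::uint8_t i = 0; i < 8; ++i) {
    if (kE2M1Magnitude[i] == mag) {
      return static_cast<std::uint8_t>((v < 0.0f ? 0x8 : 0x0) | i);
    }
  }
  assert(false && "value is not exactly representable in E2M1");
  return 0;
}

// Assumed layout: element 2k in the low nibble, element 2k+1 in the high nibble.
inline std::vector<std::uint8_t> PackFp4(const std::vector<float>& values) {
  std::vector<std::uint8_t> packed((values.size() + 1) / 2, 0);
  for (std::size_t i = 0; i < values.size(); ++i) {
    std::uint8_t code = EncodeE2M1(values[i]);
    packed[i / 2] |= (i % 2 == 0) ? code : static_cast<std::uint8_t>(code << 4);
  }
  return packed;
}

// Models byte-stride indexing: index i selects byte i and decodes its upper 4 bits.
inline float ReadByteStride(const std::vector<std::uint8_t>& packed, std::size_t i) {
  return DecodeE2M1(static_cast<std::uint8_t>(packed[i] >> 4));
}
```

Packing the first eight values from the example, [-1 2 0.5 -6 -6 -2 2 3], and calling `ReadByteStride` with indices 0 to 3 gives 2, -6, -2 and 3 under these assumptions, which matches the printed output: every even-indexed element is skipped.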

If the user passes in an __nv_fp4_e2m1 NDArray and performs indexing, we could convert it to __nv_fp4x2_e2m1 and recalculate the indices where possible, but this requires more careful handling in the lowering process.
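The index recalculation mentioned here could look roughly like the following host-side sketch. It is not part of this PR and not TVM's lowering logic; the names `Fp4x2Index`, `RemapToFp4x2` and `ExtractLane` are hypothetical, and the low-nibble-first lane order is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Position of a scalar FP4 element inside a buffer of packed pairs.
struct Fp4x2Index {
  std::size_t pair;  // which packed pair (one byte) holds the element
  unsigned lane;     // 0 or 1: which half of that pair
};

// Map a scalar element index to (pair, lane).
inline Fp4x2Index RemapToFp4x2(std::size_t i) {
  return {i / 2, static_cast<unsigned>(i % 2)};
}

// Extract the raw 4-bit code of one lane from a packed byte.
// Assumption: lane 0 is the low nibble, lane 1 is the high nibble.
inline std::uint8_t ExtractLane(std::uint8_t packed_pair, unsigned lane) {
  assert(lane < 2);
  return static_cast<std::uint8_t>(lane == 0 ? (packed_pair & 0xF)
                                             : (packed_pair >> 4));
}
```

With this mapping, scalar index 5 refers to lane 1 of pair 2, so a lowering pass would need to load the pair and then select the lane instead of indexing the scalar type directly.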

Thus, the original corresponding test case in test_target_codegen_cuda_fp4.py is removed.

Kathryn-cat · 2025-05-31T01:24:29Z

Thanks @DerrickYLJ for coauthoring it! Would you like to add yourself as coauthor in the PR?

include/tvm/runtime/data_type.h

src/runtime/device_api.cc

tqchen · 2025-06-02T12:29:57Z

cc @Hzfengsy

Hzfengsy · 2025-06-02T12:41:01Z

Thanks for reminding me, I will take a close look tomorrow. also cc @LeiWang1999

Co-authored-by: DerrickYLJ <[email protected]>

tqchen · 2025-06-02T23:22:21Z

@tvm-bot rerun

Kathryn-cat · 2025-06-02T23:26:23Z

@tvm-bot rerun

MasterJH5574

LGTM.

This PR updates the datatype names in our NCCL integration to align with the latest changes in apache#18027. These changes were missed in the previous PR.

FredJia-intellif · 2025-07-24T03:17:06Z

src/tir/transforms/dtype_conversion.h

-          // E5M2 format, consistent with IEEE-754
+        case DataType::kFloat8_e4m3fnuz:
+          // UE4M3 format, not consistent with IEEE-754
+          return FloatConfig(4, 3, 7, InftyStyle::kNone, NaNStyle::kAllOnes);


@Kathryn-cat, hi, why are the FloatConfig values for e4m3, e4m3b11fnuz, e4m3fn and e4m3fnuz exactly the same?
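For context on why the question matters, here is a minimal sketch of how the bias and NaN encoding change the largest finite value of a 4-exponent-bit, 3-mantissa-bit format. It does not use TVM's FloatConfig: `SmallFloatLayout` and `MaxFinite` are hypothetical, the formats are assumed to have no infinities, and the biases used in the note after the block (7 for e4m3fn, 8 for e4m3fnuz, 11 for e4m3b11fnuz) are the commonly used definitions, for example in the ml_dtypes package, not values taken from this PR.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical description of a small float format (not TVM's FloatConfig).
struct SmallFloatLayout {
  int exponent_bits;
  int mantissa_bits;
  int bias;
  // true: the code with all exponent and mantissa bits set is NaN ("fn" style),
  // false: that code is an ordinary finite value ("fnuz" style).
  bool nan_is_all_ones;
};

// Largest finite value, assuming the format has no infinities so the top
// exponent field is available for finite numbers.
inline double MaxFinite(const SmallFloatLayout& f) {
  assert(f.exponent_bits > 0 && f.mantissa_bits > 0);
  int max_exponent_field = (1 << f.exponent_bits) - 1;
  int max_mantissa_field = (1 << f.mantissa_bits) - 1;
  if (f.nan_is_all_ones) {
    --max_mantissa_field;  // the all-ones mantissa at the top exponent is NaN
  }
  double significand =
      1.0 + static_cast<double>(max_mantissa_field) / (1 << f.mantissa_bits);
  return std::ldexp(significand, max_exponent_field - f.bias);
}
```

Under these assumptions, `MaxFinite({4, 3, 7, true})` is 448, `MaxFinite({4, 3, 8, false})` is 240, and `MaxFinite({4, 3, 11, false})` is 30, so identical configurations for these variants would give the same range for formats that are usually defined with different ranges.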

Kathryn-cat force-pushed the blackwell-dtype branch from c9d49a3 to c17241a Compare May 31, 2025 01:20

Kathryn-cat force-pushed the blackwell-dtype branch from 9f0210f to 98596ff Compare May 31, 2025 18:52

MasterJH5574 reviewed Jun 1, 2025

include/tvm/runtime/data_type.h (outdated, resolved)

MasterJH5574 reviewed Jun 1, 2025

src/runtime/device_api.cc (outdated, resolved)

Kathryn-cat force-pushed the blackwell-dtype branch from 2890d8e to 52365da Compare June 2, 2025 18:21

addressed comments

1c36432

Co-authored-by: DerrickYLJ <[email protected]>

Kathryn-cat force-pushed the blackwell-dtype branch from 52365da to 1c36432 Compare June 2, 2025 18:26

Kathryn-cat and others added 2 commits June 2, 2025 15:09

change NV naming; prevent CUDA codegen on unsupported dtypes

8d7c83c

Co-authored-by: DerrickYLJ <[email protected]>

fix ci error

481b6c8

Kathryn-cat force-pushed the blackwell-dtype branch from fef9930 to 481b6c8 Compare June 2, 2025 21:07

lint

87ecbaa

MasterJH5574 approved these changes Jun 3, 2025

spectrometerHBH approved these changes Jun 3, 2025

spectrometerHBH merged commit 2dce84f into apache:main Jun 3, 2025
10 checks passed

MasterJH5574 mentioned this pull request Jun 6, 2025

[Fix] Update datatype names for NCCL integration #18045

Closed

ysh329 mentioned this pull request Jul 16, 2025

[Release] v0.21.0 Release Candidate Notes #18150

Closed

FredJia-intellif reviewed Jul 24, 2025

ShiboXing pushed a commit to ShiboXing/tvm that referenced this pull request Aug 10, 2025

[Dtype] Low-precision Blackwell Datatype Support (apache#18027)

5ed7ff8
