
Conversation

yewentao256 (Member) commented Aug 26, 2025

Purpose

Fix the following narrowing-conversion warnings emitted while compiling csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::_256, cute::_128>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 272

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

(The same nvcc warning repeats for the remaining nine instantiations — TileShape_MN = (256,64), (256,32), (256,16), (128,256) with cluster (2,1,1), (128,256), (128,128), (128,64), (128,32), and (128,16) — detected at lines 276 through 308.)

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<256>, cute::C<128> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:272:30:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
  213 |     MainloopArguments mainloop_arguments{
      |                                                                                                   ^         
(GCC emits the matching -Wnarrowing warning for ‘group_size’ at w4a8_mm_entry.cu:213:99 for each of the remaining nine instantiations, required from lines 276 through 308.)
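All of these warnings trace back to a single brace initialization at w4a8_mm_entry.cu:213, where the int64_t parameter group_size is placed into an int struct field. Brace initialization makes an implicit narrowing conversion ill-formed, which is why both nvcc (#2361-D) and GCC (-Wnarrowing) flag it. A minimal sketch of the pattern and its fix — the struct and function names here are illustrative, not the actual CUTLASS types:

```cpp
#include <cstdint>

// Illustrative stand-in for the kernel's mainloop argument struct; the real
// CUTLASS MainloopArguments has different members.
struct MainloopArgs {
  const void* s_ptr;
  int64_t stride_s;
  int group_size;  // int field receiving an int64_t value
};

MainloopArgs make_args(const void* s_ptr, int64_t stride_s,
                       int64_t group_size) {
  // return MainloopArgs{s_ptr, stride_s, group_size};  // warns: int64_t -> int
  // Brace-init forbids implicit narrowing, so the cast must be explicit:
  return MainloopArgs{s_ptr, stride_s, static_cast<int>(group_size)};
}
```

The explicit static_cast tells the compiler the narrowing is intentional, which silences the diagnostic at every template instantiation.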

Test

The build now completes without the warnings:

[2/3] Install the project...
-- Install configuration: "Release"
-- Up-to-date: /home/wentao/vllm/vllm/cumem_allocator.abi3.so
-- Installing: /home/wentao/vllm/vllm/_C.abi3.so
-- Set non-toolchain portion of runtime path of "/home/wentao/vllm/vllm/_C.abi3.so" to ""
...
-- Up-to-date: /home/wentao/vllm/vllm/vllm_flash_attn/flash_attn_interface.py
-- Up-to-date: /home/wentao/vllm/vllm/vllm_flash_attn/__init__.py

Signed-off-by: yewentao256 <[email protected]>
gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request addresses a compilation warning in w4a8_mm_entry.cu caused by a narrowing conversion from int64_t to int for the group_size parameter. The solution implements a runtime check to validate that group_size is within the representable range of an int before casting it. This change is correct, safe, and effectively resolves the compiler warning. The updated code is clean and I have no further suggestions for improvement.
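The diff itself is not shown in this thread; a sketch of the range-check-then-cast pattern the review describes might look like the following (the helper name checked_narrow_to_int is hypothetical):

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper: validate that an int64_t fits in int before
// narrowing, instead of relying on an implicit conversion.
inline int checked_narrow_to_int(int64_t value, const std::string& name) {
  if (value < std::numeric_limits<int>::min() ||
      value > std::numeric_limits<int>::max()) {
    throw std::invalid_argument(name + " does not fit in int");
  }
  return static_cast<int>(value);
}
```

In the kernel entry point, group_size would then be narrowed via this kind of guarded cast before being stored into the argument struct's int field.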

yewentao256 (Member, Author) commented:

@mgoin CC

mgoin (Member) left a comment

LGTM, thanks

@mgoin mgoin enabled auto-merge (squash) August 29, 2025 18:57
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 29, 2025
DarkLight1337 (Member) commented:

Please merge from main to fix CI

@vllm-bot vllm-bot merged commit c4ed78b into vllm-project:main Sep 3, 2025
68 of 71 checks passed
mayuyuace pushed a commit to mayuyuace/vllm that referenced this pull request Sep 3, 2025
845473182 pushed a commit to 845473182/vllm that referenced this pull request Sep 3, 2025
* 'main' of https://github.com/845473182/vllm: (457 commits)
  [BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models (vllm-project#24132)
  [Misc] Add check for dual_chunk_attention (vllm-project#24070)
  [Doc]: fix typos in Python comments (vllm-project#24115)
  [Doc]: fix typos in Python comments (vllm-project#24093)
  [Compile] Fix Compile Warning for `w4a8_mm_entry.cu` (vllm-project#23660)
  fix some typos (vllm-project#24071)
  [V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing (vllm-project#23656)
  Upgrade xgrammar to 0.1.23 (vllm-project#22988)
  Update release pipeline post PyTorch 2.8.0 update (vllm-project#24073)
  [XPU] Fix the bug of LoRA logits on the XPU platform (vllm-project#24081)
  [CI/Build] Disable SiluMul NVFP4 quant fusion tests (vllm-project#24121)
  [Bug] R1 Accuracy: Fix `routed_scaling_factor` Double Mul Issue (vllm-project#24119)
  [AMD][Kernel][Bugfix] Cast offsets tensor bn to tl.int64 to avoid GPU segfault (vllm-project#23692)
  [CI] Enable all hf transformers baselines in test_hybrid (vllm-project#23936)
  [Log] Only Print Profiler Results on Rank 0 (vllm-project#23370)
  Fix weights loading for Apertus (vllm-project#24100)
  [Metrics] Deprecate TPOT in favor of ITL (vllm-project#24110)
  [Bugfix] Fix packed_factor missing attribute error (vllm-project#23902)
  Run ruff format on a few files. (vllm-project#24075)
  [Bugfix] Fix transform_config parsing in Compressed Tensors (vllm-project#23945)
  ...
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025


5 participants