[Bugfix] Fix Stream usage in CPU model runner and OneDNN kernel check #25046
Conversation
Signed-off-by: jiang1.li <[email protected]>
Code Review

This pull request introduces fixes for CPU-only execution. It correctly stubs out `torch.cuda.Stream` to prevent crashes in the CPU model runner and disables the unsupported Dual-Batch Overlap (DBO) feature on CPU. It also attempts to fix an issue with `torch.compile` by relaxing a contiguity check in the OneDNN matrix multiplication kernel. However, this change in the OneDNN kernel is likely to cause memory corruption, as it allows a non-contiguous output tensor without passing its memory layout (strides) to the underlying implementation. This is a critical issue that must be addressed.
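For context, here is a minimal sketch of the kind of CPU-side stub the summary describes; the `_CpuStreamStub` class and the module-patching approach are illustrative assumptions, not the PR's actual implementation:

```python
import torch

# A minimal sketch (not the PR's actual code): a no-op stand-in so code
# paths that construct or synchronize streams do not crash on a build
# without CUDA. The class name and patching strategy are assumptions.
class _CpuStreamStub:
    def __init__(self, *args, **kwargs):
        pass

    def synchronize(self):
        pass  # nothing to synchronize on CPU

    def wait_stream(self, stream):
        pass  # ordering is trivial with a single CPU "stream"

    def query(self):
        return True  # all "submitted" work is already complete


if not torch.cuda.is_available():
    # Replace torch.cuda.Stream only when CUDA is genuinely absent.
    torch.cuda.Stream = _CpuStreamStub
```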
The check under review in the OneDNN matmul kernel:

```diff
  TORCH_CHECK(a.dim() == 2);
  TORCH_CHECK(a.stride(-1) == 1);
- TORCH_CHECK(c.is_contiguous());
+ TORCH_CHECK(c.stride(-1) == 1);
```
Relaxing the check from `c.is_contiguous()` to `c.stride(-1) == 1` without providing the full tensor strides to the underlying OneDNN kernel is dangerous and will likely lead to memory corruption.

When `c` is not contiguous (e.g., it is a view of a larger tensor, which can be the case with `torch.compile`'s tensor reuse), its rows are not packed together in memory. The `MatMulPrimitiveHandler` receives `c.data_ptr()` but does not appear to receive the stride for `c`'s first dimension (unlike for tensor `a`, where `a.stride(0)` is passed via `exec_args`).

Without the stride information, the kernel will write output rows assuming a contiguous layout, overwriting memory that does not belong to `c`. This can cause silent data corruption and difficult-to-debug crashes.
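To illustrate why the relaxed check is insufficient, here is a small self-contained demonstration that a tensor can satisfy `stride(-1) == 1` while still not being contiguous:

```python
import torch

# A tensor can pass the relaxed check (innermost stride == 1) while
# still being non-contiguous: here, consecutive rows of `c` are 16
# elements apart in the underlying buffer, not 8.
buf = torch.arange(32, dtype=torch.float32).reshape(4, 8)
c = buf[::2]                  # every-other-row view, shape (2, 8)
assert c.stride(-1) == 1      # the relaxed TORCH_CHECK would pass
assert not c.is_contiguous()  # but the original check would not
print(c.stride())             # (16, 1)
```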
To fix this correctly, you must either:

- Pass the strides of `c` to `MatMulPrimitiveHandler` and ensure the OneDNN primitive is configured to use them. This would likely involve adding `c.stride(0)` to `MatMulPrimitiveHandler::ExecArgs`.
- If modifying the handler is not feasible, enforce contiguity instead: rather than relaxing the check, create a temporary contiguous tensor for the output and copy it back to `c` if `c` was not originally contiguous (see the sketch below).

Given the potential for silent memory corruption, this is a critical issue.
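A minimal sketch of the second option, with a hypothetical `kernel` callable standing in for the OneDNN entry point (assumed to write its output as if it were packed):

```python
import torch

def mm_with_contiguous_output(a, b, c, kernel):
    """Run `kernel(a, b, out)` safely when `kernel` assumes a packed output.

    `kernel` is a hypothetical stand-in for the OneDNN entry point.
    """
    # Allocate a packed scratch buffer only when `c` is strided.
    out = c if c.is_contiguous() else torch.empty(c.shape, dtype=c.dtype)
    kernel(a, b, out)
    if out is not c:
        c.copy_(out)  # scatter the packed result back into the strided view
    return c

# Example usage, with torch.mm standing in for the OneDNN kernel:
a = torch.randn(2, 3)
b = torch.randn(3, 8)
c = torch.empty(4, 8)[::2]  # non-contiguous output view
mm_with_contiguous_output(a, b, c, lambda a, b, out: torch.mm(a, b, out=out))
```

The trade-off is an extra allocation and copy on the non-contiguous path, but it preserves correctness without touching the handler's ABI.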
Purpose

- Stub `torch.cuda.Stream` to avoid breaking the CPU backend

Test Plan

CI tests

Test Result

Essential Elements of an Effective PR Description Checklist

- Update `supported_models.md` and `examples` for a new model.