
Conversation

@danbev (Member) commented Aug 31, 2025

This commit adds support for the TRANSPOSE and RESHAPE operations in the
ggml webgpu backend.

Co-authored-by: Diego Devesa <[email protected]>

This commit disables flash attention in the webgpu test.

The motivation for this is that it seems like flash attention might not
be supported for webgpu when using llvmpipe (not 100% sure though, as it
works for me locally, but I'm running a different version of mesa). This
is a snippet from the log:
```console
2025-08-31T10:49:20.2265119Z 27: /home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-backend.cpp:789: pre-allocated tensor (cache_v_l0 (view) (permuted) (transposed)) in a buffer (WebGPU) that cannot run the operation (TRANSPOSE)
2025-08-31T10:49:20.2266911Z 27: /home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-backend.cpp:789: pre-allocated tensor (cache_v_l0 (view) (permuted) (transposed)) in a buffer (WebGPU) that cannot run the operation (TRANSPOSE)
2025-08-31T10:49:20.2268797Z 27: ␛[34m0.01.085.256␛[0m ␛[35mW llama_context: layer 0 is assigned to device WebGPU but the Flash Attention tensor is assigned to device CPU (usually due to missing support)
2025-08-31T10:49:20.2269971Z 27: ␛[0m␛[34m0.01.085.262␛[0m ␛[35mW llama_context: Flash Attention was auto, set to disabled
2025-08-31T10:49:20.2271542Z 27: ␛[0m␛[34m0.01.085.302␛[0m ␛[35mW llama_context: layer 0 is assigned to device WebGPU but the Flash Attention tensor is assigned to device CPU (usually due to missing support)
2025-08-31T10:49:20.2272942Z 27: ␛[0m␛[34m0.01.085.303␛[0m ␛[35mW llama_context: Flash Attention was auto, set to disabled
2025-08-31T10:49:20.2274119Z 27: ␛[0m␛[34m0.01.085.334␛[0m ␛[35mW llama_context: layer 0 is assigned to device WebGPU but the Flash Attention tensor is assigned to device CPU (usually due to missing support)
2025-08-31T10:49:20.2275271Z 27: ␛[0m␛[34m0.01.085.335␛[0m ␛[35mW llama_context: Flash Attention was auto, set to disabled
2025-08-31T10:49:20.2276529Z 27: ␛[0m␛[34m0.01.085.470␛[0m ␛[35mW llama_context: layer 0 is assigned to device WebGPU but the Flash Attention tensor is assigned to device CPU (usually due to missing support)
2025-08-31T10:49:20.2277776Z 27: ␛[0m␛[34m0.01.085.471␛[0m ␛[35mW llama_context: Flash Attention was auto, set to disabled
2025-08-31T10:49:20.2279308Z 27: ␛[0m/home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-backend.cpp:789: pre-allocated tensor (cache_v_l0 (view) (permuted) (transposed)) in a buffer (WebGPU) that cannot run the operation (TRANSPOSE)
2025-08-31T10:49:20.2281488Z 27: /home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-backend.cpp:789: pre-allocated tensor (cache_v_l0 (view) (permuted) (transposed)) in a buffer (WebGPU) that cannot run the operation (TRANSPOSE)
```
@github-actions github-actions bot added the devops (improvements to build systems and github actions) label Aug 31, 2025
danbev added 2 commits August 31, 2025 16:32
Just want to see if this allows the CI webgpu tests to pass.
@slaren (Member) commented Aug 31, 2025

This is still a bug somewhere; it should not be hidden by disabling the test.

@github-actions github-actions bot added the testing (Everything test related) label Aug 31, 2025
@danbev (Member, Author) commented Aug 31, 2025

> This is still a bug somewhere; it should not be hidden by disabling the test.

My intention was not to disable the test, but rather to set flash attention to off for WebGPU. My reasoning was that the default was previously off (unless I'm mistaken), and that the recent change to enable flash attention by default might be what is causing this test to start failing. I was mostly curious whether this would allow the test to pass.

@slaren (Member) commented Aug 31, 2025

The intention is to enable flash attention only if the backend supports it. If doing that check causes the backend to crash, then that indicates a problem somewhere, and it should not be hidden.

Try this change to fix it instead:

```diff
diff --git a/ggml/src/ggml-webgpu/ggml-webgpu.cpp b/ggml/src/ggml-webgpu/ggml-webgpu.cpp
index 32f1e304e..4e3f152a7 100644
--- a/ggml/src/ggml-webgpu/ggml-webgpu.cpp
+++ b/ggml/src/ggml-webgpu/ggml-webgpu.cpp
@@ -1062,6 +1062,8 @@ static bool ggml_backend_webgpu_device_supports_op(ggml_backend_dev_t dev, const
         case GGML_OP_NONE:
         case GGML_OP_VIEW:
         case GGML_OP_PERMUTE:
+        case GGML_OP_TRANSPOSE:
+        case GGML_OP_RESHAPE:
             return true;
         case GGML_OP_CPY:
         case GGML_OP_SET_ROWS:
```
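For context, a small standalone sketch (not part of this PR's diff; it only uses the public ggml API) of why these two ops can simply be reported as supported: ggml_transpose and ggml_reshape_2d create views over the source tensor's data, so there is no kernel the backend would ever need to run for them.

```cpp
#include "ggml.h"
#include <stdio.h>

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * t  = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 8);
    struct ggml_tensor * tt = ggml_transpose(ctx, t);        // view: swapped dims/strides, same data
    struct ggml_tensor * tr = ggml_reshape_2d(ctx, t, 8, 4); // view: same buffer, new shape

    printf("transpose: %lld x %lld, reshape: %lld x %lld\n",
           (long long) tt->ne[0], (long long) tt->ne[1],
           (long long) tr->ne[0], (long long) tr->ne[1]);

    ggml_free(ctx);
    return 0;
}
```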

danbev and others added 2 commits August 31, 2025 17:55
This commit adds support for the TRANSPOSE and RESHAPE operations in the
ggml webgpu backend.

Co-authored-by: Diego Devesa <[email protected]>
@danbev danbev changed the title from "ci : disable flash attention for webgpu test" to "ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops" Aug 31, 2025
@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) label Aug 31, 2025
@danbev danbev marked this pull request as ready for review August 31, 2025 18:34
@CISC (Collaborator) commented Aug 31, 2025

It should also be added here (and return true; not that it matters yet, since the return value is not checked):

```cpp
        case GGML_OP_NONE:
        case GGML_OP_VIEW:
        case GGML_OP_PERMUTE:
            return false;
```
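For illustration, a minimal sketch (hypothetical; the real ggml_webgpu_encode_node takes additional backend state and handles many more cases) of what the no-op branch could look like once GGML_OP_TRANSPOSE and GGML_OP_RESHAPE are grouped with the other view-only cases and reported as handled:

```cpp
#include "ggml.h"

// Hypothetical helper mirroring only the no-op branch of ggml_webgpu_encode_node;
// the actual function in ggml-webgpu.cpp also encodes GPU commands for compute ops.
static bool webgpu_is_noop_node(const struct ggml_tensor * node) {
    switch (node->op) {
        // view-only ops: nothing needs to be encoded for the GPU
        case GGML_OP_NONE:
        case GGML_OP_VIEW:
        case GGML_OP_PERMUTE:
        case GGML_OP_TRANSPOSE:
        case GGML_OP_RESHAPE:
            return true;  // per the suggestion above: report the node as handled
        default:
            return false; // real compute ops are dispatched by the actual encoder
    }
}
```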

This commit adds GGML_OP_TRANSPOSE and GGML_OP_RESHAPE cases to the
ggml_webgpu_encode_node function in ggml-webgpu.cpp. The actual
operations are not implemented yet, and are left as TODOs.

Co-authored-by: Sigbjørn Skjæret <[email protected]>
…s [no ci]

Remove TODO comment about unimplemented operations.
…s [no ci]

Move GGML_OP_TRANSPOSE and GGML_OP_RESHAPE to the other no-op cases.
@danbev danbev merged commit 77dee9d into ggml-org:master Sep 1, 2025
1 check passed
@danbev danbev deleted the ci-webgpu-flash-attention-disable branch September 1, 2025 13:41
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025

* ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops

This commit adds support for the TRANSPOSE and RESHAPE operations in the
ggml webgpu backend.

Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>