Vulkan: add conv_transpose_2d operation #16022
…95/llama.cpp into add-ggml-vulkan-conv-transpose-2d
If this is based on the existing conv_2d shader, shouldn't it be possible to use the existing .comp file and only change the parts that are actually different with a preprocessor variable?
…m.comp for conv_transpose_2d operation
You are right: if each conv variant (transpose, gradient, channel-wise) gets its own shader with duplicated code, maintenance will be really tough. Also, there are still many ways left to optimize the conv kernel, and I have a few updates in my private repo that I might polish and submit in the near future.
@etasnadi Yeah, the PR has already been updated to do that. You could also do a review if you want, you know the shader better than me. I'll mostly make sure that the C++ code is fine and the shader passes on Intel, AMD and Nvidia.
Yes, I've just realized that it has been updated since. Now my problem is that the kernel is and will be too complicated (it was really complicated before too), so we might need to introduce some abstraction, I guess. Maybe @jeffbolznv has some ideas. What do you think about adding support for HLSL shaders? As far as I know, glslangValidator already has basic support, but in the meantime we could use dxc.
What's the advantage of HLSL over GLSL? I'm not familiar with it, and not a fan of Microsoft dependencies. It would probably make maintenance harder. If you want to look into a shader language with more modern features, wouldn't slang be more interesting and more open? Personally I'm hoping one of the projects looking into a C++-based compute shader syntax (similar to CUDA and ROCm) pans out. For now GLSL is good enough for me.
For example, it seems that it supports templates: https://devblogs.microsoft.com/directx/announcing-hlsl-2021/#template-functions-and-data-types - and templates alone would help a lot. I am not a fan of adding dependencies to projects governed by a single company either, but this kernel will be unmaintainable in the future at this abstraction level, and GLSL has limited features to deal with the problem. Slang is also a good idea; however, I do not know how widely it is adopted or whether it is mature enough. For example, I tried to use coopmats with slang without success ~1 year ago - I guess their compiler does not support all extensions automatically?
Now I see that they don't have coopmat support yet, but it's WIP: shader-slang/slang#7634. So I believe it is a good idea to add support for slang in the near future! Also, there are several NV-suffixed accounts contributing to the project, so I guess Nvidia has a bet on it.
@etasnadi I think it's Khronos, not Nvidia specifically, but yeah. Coopmat should already be there, see shader-slang/slang#7170 (comment), but let's not sidetrack this PR. If you want to look into it, go ahead. If discussion is needed, please open an issue about it.
HLSL doesn't support spec constants, which IMO is a deal breaker. It also only has coopmat1 level of support for use in Vulkan. slang supports coopmat2, spec constants, and generics, and there are cases where generics would be helpful.
…v_transpose_2d operation.
The code runs correctly on my hardware. Looks good, we just need to resolve the last few comments.
LGTM
* origin/master: (39 commits)
  ci : disable AMD workflows + update NVIDIA workflows (ggml-org#16200)
  ci : enable Vulkan workflow on Mac (ggml-org#16194)
  ggml-cpu: Respect cpumask settings (ggml-org#16164)
  ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (ggml-org#15928)
  zdnn: refactor codebase + add docs (ggml-org#16178)
  codeowners : add @danbev to model-conversion example [no ci] (ggml-org#16190)
  devops: add s390x containers (ggml-org#15915)
  ggml-cpu : fix typo in gemm comments [no ci] (ggml-org#16189)
  feat: Add conversion support in GraniteHybrid for non-hybrid (all attn) (ggml-org#16177)
  clang-tidy : disable warning about performance enum size (ggml-org#16127)
  ggml : implement set_rows with i32 index (ggml-org#16159)
  codeowners : update + cleanup (ggml-org#16174)
  common : enable `--offline` mode without curl support (ggml-org#16137)
  webui : fix handling incomplete chunks (ggml-org#16107)
  embedding : fix typos in README (ggml-org#16171)
  common : remove unused local variables (ggml-org#16140)
  ggml : extend ggml_can_fuse to work with non-sequential nodes (ggml-org#16123)
  ggml : add ggml_op_is_empty (ggml-org#16122)
  codeowners : update ownership for @ngxson and @allozuar (ggml-org#16128)
  Vulkan: add conv_transpose_2d operation (ggml-org#16022)
  ...
* Vulkan: add conv_transpose_2d operation
* Vulkan: fix typo in conv_transpose_2d shader(s0mp, s0L, s1mp, s1L)
* Vulkan: fix incorrect indentation in conv_transpose_2d shader
* Vulkan: add checking the push constants size limit and reuse conv2d_mm.comp for conv_transpose_2d operation
* Vulkan: revert the order of the index calculation and bound check in conv_2d shader
* Vulkan: explicitly check push constants limit in supports_op() for conv_transpose_2d operation.
* Vulkan: remove unnecessary lower bound checks for H/W_idx in the conv_2d shader.
This PR adds the conv_transpose_2d operation to the Vulkan backend. The code is based on the implementation of the existing conv_2d operation. The shader supports strides (s0, s1), paddings (p0, p1) and dilations (d0, d1). However, in ggml_vk_conv_transpose_2d() they are constrained to s1 = s0, p0 = p1 = 0, d0 = d1 = 1, because the existing GGML_OP_CONV_TRANSPOSE_2D interface only exposes a single stride.
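
For readers unfamiliar with the operation, here is a minimal single-channel reference sketch in Python of what a transposed 2D convolution with stride, padding and dilation computes. It is not the shader from this PR: the function names (`out_size`, `conv_transpose_2d_scatter`, `conv_transpose_2d_gather`) are hypothetical, and it assumes index 0 refers to the width axis and index 1 to the height axis. The scatter form follows the textbook definition; the gather form computes one output element at a time, which is roughly the shape a conv_2d-style kernel would need, and it should agree with the scatter form.

```python
def out_size(n_in, k, stride, pad=0, dil=1):
    """Output extent of a transposed convolution along one axis."""
    return (n_in - 1) * stride - 2 * pad + dil * (k - 1) + 1


def conv_transpose_2d_scatter(x, w, s0, s1, p0=0, p1=0, d0=1, d1=1):
    """Scatter form: each input pixel adds a scaled copy of the kernel to the output."""
    H, W = len(x), len(x[0])
    KH, KW = len(w), len(w[0])
    OH = out_size(H, KH, s1, p1, d1)
    OW = out_size(W, KW, s0, p0, d0)
    y = [[0.0] * OW for _ in range(OH)]
    for ih in range(H):
        for iw in range(W):
            for kh in range(KH):
                for kw in range(KW):
                    oh = ih * s1 + kh * d1 - p1
                    ow = iw * s0 + kw * d0 - p0
                    # Padding can push contributions outside the output.
                    if 0 <= oh < OH and 0 <= ow < OW:
                        y[oh][ow] += x[ih][iw] * w[kh][kw]
    return y


def conv_transpose_2d_gather(x, w, s0, s1, p0=0, p1=0, d0=1, d1=1):
    """Gather form: each output pixel sums the inputs that map onto it."""
    H, W = len(x), len(x[0])
    KH, KW = len(w), len(w[0])
    OH = out_size(H, KH, s1, p1, d1)
    OW = out_size(W, KW, s0, p0, d0)
    y = [[0.0] * OW for _ in range(OH)]
    for oh in range(OH):
        for ow in range(OW):
            acc = 0.0
            for kh in range(KH):
                for kw in range(KW):
                    th = oh + p1 - kh * d1
                    tw = ow + p0 - kw * d0
                    # Only positions that are non-negative multiples of the
                    # stride correspond to a real input pixel.
                    if th < 0 or tw < 0 or th % s1 != 0 or tw % s0 != 0:
                        continue
                    ih, iw = th // s1, tw // s0
                    if ih < H and iw < W:
                        acc += x[ih][iw] * w[kh][kw]
            y[oh][ow] = acc
    return y
```

With the constraints described above (s1 = s0, zero padding, unit dilation), the output extent along each axis reduces to `(n_in - 1) * s0 + k`, so calling either function with only `x`, `w` and the two equal strides covers the case the PR description says is currently reachable.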