Skip to content

Conversation

relent95
Copy link
Contributor

@relent95 relent95 commented Sep 3, 2025

This PR adds the missing hardsigmoid and hardswish operators in the Vulkan backend, helping to use models depending on them(such as PPOCRv5) on Vulkan.

@relent95 relent95 requested a review from 0cc4m as a code owner September 3, 2025 08:04
@github-actions github-actions bot added Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning labels Sep 3, 2025
Copy link
Collaborator

@0cc4m 0cc4m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@0cc4m 0cc4m merged commit 0014fb4 into ggml-org:master Sep 3, 2025
48 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 4, 2025
…upport

* origin/master: (72 commits)
metal : Add template specialization for mul_mm_id w/ ne20 == 10 (ggml-org#15799)
llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (ggml-org#15791)
CANN: Refactor ND to NZ workspace to be per-device (ggml-org#15763)
server: add exceed_context_size_error type (ggml-org#15780)
Document the new max GPU layers default in help (ggml-org#15771)
ggml: add ops for WAN video model (cuda && cpu) (ggml-org#15669)
CANN: Fix precision issue on 310I DUO multi-devices (ggml-org#15784)
opencl: add hs=40 to FA (ggml-org#15758)
CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (ggml-org#15760)
vulkan: fix mmv subgroup16 selection (ggml-org#15775)
vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)
vulkan : update ggml_vk_instance_validation_ext_available (ggml-org#15666)
ggml vulkan: add hardsigmoid and hardswish operations (ggml-org#15762)
CUDA: Optimize `rms_norm_f32` kernel and its fused variants, giving 1-6% perf E2E (ggml-org#15715)
model-conversion : fix pyright errors (ggml-org#15770)
sampling : optimize dist sampler (ggml-org#15704)
llama : fix incorrect model type for Gemma 270M (ggml-org#15764)
model-conversion : remove hardcoded /bin/bash shebangs [no ci] (ggml-org#15765)
CANN: Add RoPE contiguous check for 310I DUP device (ggml-org#15735)
ggml-cpu : optimize RVV kernels (ggml-org#15720)
...
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ggml changes relating to the ggml tensor library for machine learning Vulkan Issues specific to the Vulkan backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants