Conversation

jeffbolznv
Collaborator

This improves the incremental build time for ggml-vulkan.cpp pretty significantly:

before:
- msvc: 23s
- wsl/gcc: 72s

after:
- msvc: 10s
- wsl/gcc: 29s

@LostRuins I'm curious how much this helps on your system.

@jeffbolznv jeffbolznv requested a review from 0cc4m as a code owner September 1, 2025 22:39
@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Sep 1, 2025
@0cc4m
Collaborator

0cc4m commented Sep 2, 2025

Is the problem the concatenation or the index-to-string conversion? Because this does make it a little harder to track which shader is being used. Maybe we can find a better way to keep that information without affecting compile time, like a second string or a simple integer index.
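
One hedged reading of the integer-index idea, as a minimal sketch: keep the shader names in a static table and pass an index around, so the name stays recoverable for debugging without any string work at the call sites. The table and names below are hypothetical, not from the PR:

```cpp
#include <cstdio>
#include <cstddef>

// Hypothetical static name table; a plain integer identifies each shader.
static const char * const shader_names[] = {
    "matmul_f32",
    "matmul_f16",
    "softmax_f32",
};

// Registration takes only an index, so nothing is constructed at the
// call site, but the name stays recoverable for logging/debugging.
static void create_pipeline(std::size_t shader_idx) {
    std::printf("registering %s\n", shader_names[shader_idx]);
}

int main() {
    create_pipeline(0); // matmul_f32
    create_pipeline(2); // softmax_f32
    return 0;
}
```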

@jeffbolznv
Collaborator Author

The speedup comes from changing the function parameters in ggml_vk_create_pipeline to const char *. I suspect the compiler was generating std::string conversion code for each const char * argument at every call site, and that's expensive to compile because there are so many call sites.

I've pushed another commit that keeps the full string for those that were using it. Compile time is still good.
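
For readers unfamiliar with the pattern, here is a minimal sketch of the kind of change involved; the names and signatures below are simplified stand-ins, not the actual ggml_vk_create_pipeline declaration:

```cpp
#include <cstdio>
#include <string>

// Before: a std::string parameter means each of the hundreds of call
// sites builds a std::string from a string literal, and the compiler
// has to generate and optimize that constructor code every time.
static void create_pipeline_before(const std::string & name) {
    std::printf("registering %s\n", name.c_str());
}

// After: const char * is passed through unchanged, so no per-call-site
// conversion code is emitted.
static void create_pipeline_after(const char * name) {
    std::printf("registering %s\n", name);
}

int main() {
    create_pipeline_before("matmul_f32"); // builds a std::string temporary
    create_pipeline_after("matmul_f32");  // no temporary at all
    return 0;
}
```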

@LostRuins
Collaborator

LostRuins commented Sep 3, 2025

Hi @jeffbolznv, yes, this seems to improve my compile times for ggml vulkan as well.
With -O3 it brings me from 1m 17.26s to 0m 39.70s, so it's a pretty good speedup.

This seems in line with your own findings.

Collaborator

@0cc4m 0cc4m left a comment

I can't think of something better currently, so it's fine for now.

@0cc4m 0cc4m merged commit 0fce7a1 into ggml-org:master Sep 3, 2025
47 of 48 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 4, 2025
…upport

* origin/master: (72 commits)
metal : Add template specialization for mul_mm_id w/ ne20 == 10 (ggml-org#15799)
llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (ggml-org#15791)
CANN: Refactor ND to NZ workspace to be per-device (ggml-org#15763)
server: add exceed_context_size_error type (ggml-org#15780)
Document the new max GPU layers default in help (ggml-org#15771)
ggml: add ops for WAN video model (cuda && cpu) (ggml-org#15669)
CANN: Fix precision issue on 310I DUO multi-devices (ggml-org#15784)
opencl: add hs=40 to FA (ggml-org#15758)
CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (ggml-org#15760)
vulkan: fix mmv subgroup16 selection (ggml-org#15775)
vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)
vulkan : update ggml_vk_instance_validation_ext_available (ggml-org#15666)
ggml vulkan: add hardsigmoid and hardswish operations (ggml-org#15762)
CUDA: Optimize `rms_norm_f32` kernel and its fused variants, giving 1-6% perf E2E (ggml-org#15715)
model-conversion : fix pyright errors (ggml-org#15770)
sampling : optimize dist sampler (ggml-org#15704)
llama : fix incorrect model type for Gemma 270M (ggml-org#15764)
model-conversion : remove hardcoded /bin/bash shebangs [no ci] (ggml-org#15765)
CANN: Add RoPE contiguous check for 310I DUP device (ggml-org#15735)
ggml-cpu : optimize RVV kernels (ggml-org#15720)
...
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 5, 2025
…g-model-disabled-agent-prefill

* origin/master: (84 commits)
CUDA: fastdiv, launch bounds for mmvq + q8_1 quant (ggml-org#15802)
tests : add --list-ops and --show-coverage options (ggml-org#15745)
gguf: gguf_writer refactor (ggml-org#15691)
kv-cache : fix SWA checks + disable cacheless iSWA (ggml-org#15811)
model-conversion : add --embeddings flag to modelcard.template [no ci] (ggml-org#15801)
chat : fixed crash when Hermes 2 <tool_call> had a newline before it (ggml-org#15639)
chat : nemotron thinking & toolcalling support (ggml-org#15676)
scripts : add Jinja tester PySide6 simple app (ggml-org#15756)
llama : add support for EmbeddingGemma 300m (ggml-org#15798)
metal : Add template specialization for mul_mm_id w/ ne20 == 10 (ggml-org#15799)
llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (ggml-org#15791)
CANN: Refactor ND to NZ workspace to be per-device (ggml-org#15763)
server: add exceed_context_size_error type (ggml-org#15780)
Document the new max GPU layers default in help (ggml-org#15771)
ggml: add ops for WAN video model (cuda && cpu) (ggml-org#15669)
CANN: Fix precision issue on 310I DUO multi-devices (ggml-org#15784)
opencl: add hs=40 to FA (ggml-org#15758)
CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (ggml-org#15760)
vulkan: fix mmv subgroup16 selection (ggml-org#15775)
vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)
...
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025
vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)

* vulkan: don't use std::string in load_shaders, to improve compile time

* keep the string version for those calls that use it