Conversation

@danielzgtg (Collaborator) commented Aug 10, 2025

perf_battery -mp Kokoro_espeak.gguf -nt 4:

  • Before: 1586.519654 ms, 87.7512%
  • After: 1475.965502 ms, 81.6744%
  • Overall: (-110.554152 ms, -6.97%, -6.0768pp).

TODO: Does not work yet. The output sound is all wrong.

ggml_vec_dot_f32, as used by ggml_compute_forward_mul_mat, is surprisingly slow here. Im2col produces massive matmuls like [8281,1408]@[1408,128], whose working set spills all the way into L3 cache.

This PR aims to keep the per-channel kernels in L1d cache. For this, I just used a naïve matmul, since our channel count of 128 is large, and performed the convolution by accumulating in a sliding window. The input data is only read once. The targeted bottleneck convolutions are kokoro\.decoder\.generator\.noise_blocks\.\d\.resblock\.\d\.convs\d_weight.
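One plausible loop ordering for such a direct (im2col-free) convolution can be sketched as below. The function name, the [OC][K][IC] weight layout, and the valid-padding simplification are illustrative assumptions, not the actual ggml kernel: with each output channel processed in turn, its K*IC*4-byte kernel slab (about 5.5 KiB at K=11, IC=128) stays hot in L1d while the input is streamed past it, and no duplicated im2col buffer is ever materialized.

```c
/* Hypothetical sketch of a direct 1-D convolution using the transposed
 * [OC][K][IC] weight layout described in the PR. Per output channel the
 * kernel slab is K*IC*4 bytes (~5.5 KiB at K=11, IC=128), which fits in
 * L1d. "Valid" convolution only; padding/stride/dilation are omitted for
 * brevity. Not the actual ggml implementation. */
#include <stddef.h>

void conv1d_direct(const float *w,   /* [OC][K][IC], transposed layout   */
                   const float *x,   /* [T][IC], time-major input        */
                   float *y,         /* [T_out][OC], T_out = T - K + 1   */
                   int OC, int IC, int K, int T) {
    int T_out = T - K + 1;           /* valid convolution, no padding    */
    for (int oc = 0; oc < OC; ++oc) {
        const float *w_oc = w + (size_t)oc * K * IC;  /* hot in L1d     */
        for (int t = 0; t < T_out; ++t) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                const float *xk = x + (size_t)(t + k) * IC;
                const float *wk = w_oc + (size_t)k * IC;
                for (int ic = 0; ic < IC; ++ic)
                    acc += wk[ic] * xk[ic];           /* naïve dot      */
            }
            y[(size_t)t * OC + oc] = acc;
        }
    }
}
```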

@danielzgtg (Collaborator, Author) commented

#96 inspired this, but the targeted weights are different. I plan to make breaking changes later to Kokoro_GGUF, but not in this PR.

  • convs[12]_weight here is currently [128,128,K∈{3,7,11}]. This PR transposes it at runtime to [128,K,128], and I'd like to make this permanent.
  • decoder_blocks\.[012]\.conv1_weight there is [1024,IC=1090,3]. IC is not a multiple of 32, and this prevents Q[458] quantization even after transposing. We can just pad that up to 1120 and perhaps use a view to discard the extra elements afterwards.
  • decoder_blocks\.[012]\.conv2_weight does not have the odd IC size problem, and should be as easy as convs[12]_weight.
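The two follow-up items above could look roughly like the following helper. The function name, the source [OC][IC][K] layout, and the zero-fill policy are assumptions for illustration, not actual Kokoro_GGUF code: the weight is transposed to [OC][K][IC_pad] with IC rounded up to a multiple of 32 (1090 → 1120) so block-quantized rows stay aligned.

```c
/* Hypothetical helper: transpose a conv weight from [OC][IC][K] to
 * [OC][K][IC_pad], where IC_pad rounds IC up to a multiple of 32
 * (1090 -> 1120). The padding tail is zero-filled by calloc so it is
 * inert in dot products. Not actual Kokoro_GGUF code. */
#include <stdlib.h>

static int round_up32(int n) { return (n + 31) / 32 * 32; }

float *transpose_pad_weight(const float *w, int OC, int IC, int K) {
    int IC_pad = round_up32(IC);
    float *out = calloc((size_t)OC * K * IC_pad, sizeof(float));
    if (!out) return NULL;
    for (int oc = 0; oc < OC; ++oc)
        for (int ic = 0; ic < IC; ++ic)
            for (int k = 0; k < K; ++k)
                out[((size_t)oc * K + k) * IC_pad + ic] =
                    w[((size_t)oc * IC + ic) * K + k];
    return out;
}
```

A view over the first IC columns of each [IC_pad] row would then discard the padding after the matmul, as the comment above suggests.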
