optimize(kokoro): Specialize conv1D #102
Draft
`perf_battery -mp Kokoro_espeak.gguf -nt 4`:
TODO: Does not work yet; the output audio is wrong.
Somehow `ggml_vec_dot_f32`, as used by `ggml_compute_forward_mul_mat`, is too slow here. im2col produces massive matmuls such as `[8281,1408] @ [1408,128]`, which spill all the way into L3 cache.
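To see the scale involved, a quick back-of-the-envelope check on the quoted shapes (a sketch; f32 elements are an assumption, and the helper name is hypothetical):

```c
#include <stddef.h>

/* Rough arithmetic for the im2col path, assuming f32 activations:
 * im2col materializes an [8281, 1408] matrix that is then multiplied
 * by a [1408, 128] weight matrix. */
static size_t f32_matrix_bytes(size_t rows, size_t cols) {
    return rows * cols * sizeof(float);
}
/* f32_matrix_bytes(8281, 1408) is ~46.6 MB, far larger than any L3
 * cache on typical consumer CPUs, while the [1408, 128] weight matrix
 * is only ~704 KiB. The activations, not the weights, blow the cache. */
```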
This PR aims to keep the per-channel kernels resident in L1d cache. Since our channel count of 128 is already large, a naïve matmul suffices, and the convolution is performed by accumulating into a sliding window so that the input data is read only once. The targeted bottleneck convolutions are
`kokoro\.decoder\.generator\.noise_blocks\.\d\.resblock\.\d\.convs\d_weight`.
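The idea can be sketched roughly as follows (a minimal illustration, not the actual PR code; the tensor layouts, function name, and stride-1/no-padding simplifications are all assumptions):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical direct conv1d with sliding-window accumulation.
 * Assumed layouts: input [c_in][n_in], kernel [c_out][c_in][k],
 * output [c_out][n_out]; stride 1, no padding, n_out = n_in - k + 1.
 * Each input element is loaded exactly once; for a fixed input channel,
 * the c_out * k kernel weights stay hot in L1d. */
static void conv1d_sliding(const float *input, const float *kernel,
                           float *output, int c_in, int c_out,
                           int n_in, int k) {
    const int n_out = n_in - k + 1;
    memset(output, 0, (size_t)c_out * n_out * sizeof(float));
    for (int ic = 0; ic < c_in; ++ic) {
        const float *in_row = input + (size_t)ic * n_in;
        for (int i = 0; i < n_in; ++i) {
            const float x = in_row[i]; /* each input value read once */
            for (int oc = 0; oc < c_out; ++oc) {
                const float *w = kernel + ((size_t)oc * c_in + ic) * k;
                float *out_row = output + (size_t)oc * n_out;
                /* scatter x into the k output positions whose window
                 * covers input position i */
                for (int j = 0; j < k; ++j) {
                    const int o = i - j;
                    if (o >= 0 && o < n_out)
                        out_row[o] += x * w[j];
                }
            }
        }
    }
}
```

Unlike the im2col path, no intermediate matrix is materialized, so the working set per input channel is just the output rows plus `c_out * k` kernel weights.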