Closed
Description
Using CUDA on Windows when model vocab_size != 32000, inference crashes immediately with:
ggml-cuda.cu:4749: i01_high == rows_per_iter || g_device_count > 1
See #2160 (comment) for more details.
Reverting to the commit before 11f3ca0 resolves the issue.
Also, the workaround proposed in #2160 (comment) appears to work (at least for me).