Commit "CUDA: Quantized matrix matrix multiplication" causes assert "ggml-cuda.cu:4749: i01_high == rows_per_iter || g_device_count > 1" on Windows when vocab_size != 32000 #2484

Closed
@dranger003

Description

When running CUDA on Windows with a model whose vocab_size != 32000, inference crashes immediately with this assertion failure:

ggml-cuda.cu:4749: i01_high == rows_per_iter || g_device_count > 1
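
For anyone reading the message: the asserted condition says that either the upper row bound equals the number of rows per iteration, or more than one device is in use. On a single GPU it therefore reduces to `i01_high == rows_per_iter`. Below is a minimal sketch of that predicate alone. The helper name `row_range_ok` and the sample values in the comments are hypothetical, and this does not reproduce how ggml-cuda.cu actually computes either quantity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: restates only the condition quoted in the assertion
// message. The variable names come from that message; how ggml-cuda.cu
// derives i01_high and rows_per_iter is NOT modeled here.
bool row_range_ok(int64_t i01_high, int64_t rows_per_iter, int device_count) {
    // Passes if the row range covers a full iteration, or if several
    // devices are present.
    return i01_high == rows_per_iter || device_count > 1;
}

// Illustrative values only (not taken from the real code path):
//   row_range_ok(32000, 32000, 1) -> condition holds
//   row_range_ok(32000, 32001, 1) -> condition fails on a single device
//   row_range_ok(32000, 32001, 2) -> condition holds because device_count > 1
```

This only shows what the check demands; why the two values differ for some vocabulary sizes is discussed in the linked comment, not here.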

See #2160 (comment) for more details.
Reverting to the commit before 11f3ca0 resolves the issue.
Also, the workaround proposed in #2160 (comment) appears to work (at least for me).
