Add an option to enable --runtime-repack in llama.cpp #1860

@ekcrisp

Description

Is your feature request related to a problem? Please describe.
After updating to 0.3.4, ARM-optimized Q4_0_4_4 models are no longer supported by llama.cpp. Instead, loading such a model throws the error "TYPE_Q4_0_4_4 REMOVED, use Q4_0 with runtime repacking". This comes from this change in llama.cpp. I believe a flag must be added to llama-cpp-python and passed down to llama.cpp internals to enable this feature.

Describe the solution you'd like
Add a flag to Llama instantiation that enables runtime repacking in llama.cpp.
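A minimal sketch of what that plumbing could look like. The flag name `runtime_repack` and the `build_model_params` helper are hypothetical (not part of llama-cpp-python's current API); a plain dict stands in for the ctypes `llama_model_params` struct that `Llama.__init__` fills out before loading the model:

```python
# Hypothetical sketch: neither `runtime_repack` nor `build_model_params`
# exists in llama-cpp-python today. The idea is to mirror how existing
# boolean load options like use_mmap are already collected and forwarded.

def build_model_params(use_mmap: bool = True,
                       runtime_repack: bool = False) -> dict:
    """Collect model-load options; a dict stands in for llama_model_params."""
    return {"use_mmap": use_mmap, "runtime_repack": runtime_repack}

# Desired usage: Llama(model_path=..., runtime_repack=True) would forward
# the flag here so llama.cpp can repack Q4_0 weights at load time.
params = build_model_params(runtime_repack=True)
```

In the real change, the new field would presumably be set on the C-level model params before the call into `llama_load_model_from_file`, the same path the other boolean options take.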
