Is your feature request related to a problem? Please describe.
After updating to 0.3.4, ARM-optimized Q4_0_4_4 models are no longer supported by llama.cpp. Instead, loading one throws the error "TYPE_Q4_0_4_4 REMOVED, use Q4_0 with runtime repacking". This comes from this change in llama.cpp. I believe a flag must be added to llama-cpp-python and passed down to llama.cpp internals to enable this feature.
Describe the solution you'd like
Add a flag to Llama instantiation that enables runtime repacking in llama.cpp.
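
A rough sketch of what the requested API might look like. The `repack` keyword argument is hypothetical (it does not exist in llama-cpp-python today); the idea is just that the flag would be forwarded to the llama.cpp model/context params so a plain Q4_0 GGUF gets repacked into the ARM-optimized layout at load time:

```python
from llama_cpp import Llama

# Hypothetical flag: load a standard Q4_0 quant and let llama.cpp
# repack it at runtime into the ARM-optimized (formerly Q4_0_4_4) layout.
llm = Llama(
    model_path="model-Q4_0.gguf",  # plain Q4_0, not the removed Q4_0_4_4
    repack=True,                   # hypothetical; name/placement up to maintainers
)
```

Whether this should be a constructor kwarg or inferred automatically (as llama.cpp itself now does based on CPU features) is a design question for the maintainers.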