Is your feature request related to a problem? Please describe.
After updating to 0.3.4, ARM-optimized Q4_0_4_4 models are no longer supported by llama.cpp. Instead, loading one throws the error "TYPE_Q4_0_4_4 REMOVED, use Q4_0 with runtime repacking". This comes from this change in llama.cpp. I believe a flag must be added to llama-cpp-python and passed down to llama.cpp internals to enable this feature.
Describe the solution you'd like
Add a flag to Llama instantiation that enables runtime repacking in llama.cpp.
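
A rough sketch of what the requested API might look like. The `repack` keyword argument is hypothetical (it does not exist in llama-cpp-python today); the idea is just that the flag would be forwarded to the llama.cpp model/context params so a plain Q4_0 GGUF gets repacked into the ARM-optimized layout at load time:

```python
from llama_cpp import Llama

# Hypothetical flag: load a standard Q4_0 quant and let llama.cpp
# repack it at runtime into the ARM-optimized (formerly Q4_0_4_4) layout.
llm = Llama(
    model_path="model-Q4_0.gguf",  # plain Q4_0, not the removed Q4_0_4_4
    repack=True,                   # hypothetical; name/placement up to maintainers
)
```

Whether this should be a constructor kwarg or inferred automatically (as llama.cpp itself now does based on CPU features) is a design question for the maintainers.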