Description
Is your feature request related to a problem? Please describe.
Currently, testing the model requires an Nvidia GPU. In principle, it should also be possible to run it on an Apple GPU via Metal, as llama.cpp already does.
Describe the solution you'd like
When the model is installed on a Mac, it should use this dependency: https://github.com/philipturner/metal-flash-attention
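The requested behavior amounts to choosing a GPU backend based on the host platform. The sketch below is hypothetical: the function name `select_gpu_backend` and the backend labels `"metal"` and `"cuda"` are illustrative, not part of the project. It also assumes that the Metal path targets Apple silicon Macs (reported by Python as `Darwin` / `arm64`). It only shows the dispatch decision; it does not load metal-flash-attention itself.

```python
import platform
from typing import Optional


def select_gpu_backend(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Hypothetical helper: pick an attention backend for the current host.

    Assumes Metal (e.g. via metal-flash-attention) is used on Apple silicon
    Macs and the existing Nvidia/CUDA path is used everywhere else.
    """
    # Fall back to the real host values when none are passed in (useful for testing).
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()

    # Apple silicon Macs report "Darwin" as the OS and "arm64" as the machine type.
    if system == "Darwin" and machine == "arm64":
        return "metal"
    return "cuda"


if __name__ == "__main__":
    print(select_gpu_backend())
```

Passing `system` and `machine` explicitly, for example `select_gpu_backend("Darwin", "arm64")`, makes the dispatch easy to test on any host. A real implementation would likely also check that the Metal or CUDA runtime is actually available before committing to a backend.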
Describe alternatives you've considered
No response
Additional context
No response
Organisation
AWI