As of [this PR to llama.cpp](https://github.com/ggerganov/llama.cpp/pull/3946), the CUDA binaries can run CPU-only as long as `n_gpu_layers = 0`. This may let us significantly simplify our binary distribution by dropping the CPU-only variants and shipping only the CUDA ones.