As discussed in #1602, k-quants do not work for the Falcon-7B model. This is due to the fact that the number of columns in many tensors (4544) is not divisible by 256, which is the super-block size of the k-quants.
It would be useful if k-quants could be adapted to work in such cases.
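For concreteness, the incompatibility is just a divisibility check on the row length; a minimal illustration (the super-block size of 64 mentioned later in the thread is included for comparison):

```cpp
#include <cstdio>

int main() {
    // Falcon-7B has 4544 columns in many tensors; k-quant super-blocks hold 256 values.
    const int n_cols = 4544;
    printf("4544 %% 256 = %d\n", n_cols % 256); // 192 -> rows do not split into whole super-blocks
    printf("4544 %%  64 = %d\n", n_cols % 64);  //   0 -> super-blocks of 64 would fit exactly
    return 0;
}
```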
Activity
SlyEcho commented on Jun 18, 2023
Same for OpenLLaMA 3B.
debackerl commented on Jun 19, 2023
Does it make sense to improve older quantization algorithms if they are not state of the art? SqueezeLLM has just been released, beating GPTQ on perplexity and speed, and they also released their CUDA kernels.
https://github.com/SqueezeAILab/SqueezeLLM
Just my 2 cents :-)
ikawrakow commented on Jun 19, 2023
My view is that k-quants are SOTA. The discussion in #1595 and the description of #1684 shed some light on why I think k-quants are SOTA. Perhaps you could elaborate on how you arrived at the conclusion that the SqueezeLLM approach is better?
KerfuffleV2 commented on Jun 19, 2023
This might be a crazy idea, but what if the remainder not divisible by QK_K was just left as f32 or f16 and not quantized at all?
It might need some special logic to handle, but it would only apply to the very last partial "block", and it should be possible to calculate if that's necessary outside of any loops.
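For illustration only (not ggml code), the split this idea implies can be computed per row like so, which also shows why the check can happen once, outside the inner loops:

```cpp
#include <cstdio>

// Sketch of the "leave the remainder unquantized" idea: quantize whole QK_K-sized
// super-blocks and keep only the final partial block as f16/f32. Illustrative only.
constexpr int QK_K = 256;

int main() {
    const int n_cols = 4544;                    // e.g. a Falcon-7B row
    const int n_full = (n_cols / QK_K) * QK_K;  // values covered by whole super-blocks
    const int n_tail = n_cols - n_full;         // remainder that would stay unquantized
    // n_tail is the same for every row of the tensor, so whether the special-case
    // path is needed at all can be decided once, before any per-row loops run.
    printf("quantized values: %d, unquantized tail: %d\n", n_full, n_tail);
    return 0;
}
```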
ikawrakow commented on Jun 19, 2023
I'm on it. The current thinking is that I will add padding such that I have a multiple of 256. When quantizing, the values that are beyond the actual tensor size will be assumed to be zero. When de-quantizing, one needs to take care to not dequantize values beyond the actual tensor size. Same applies to dot products.
It is a pretty big change, so it will take some time.
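A rough sketch of what that per-row padding could look like; the names and layout here are illustrative, not the eventual llama.cpp/ggml implementation:

```cpp
#include <cstring>
#include <vector>

// Illustrative sketch of the padding approach: round each row up to a multiple of
// 256, treat the padded values as zeros when quantizing, and make sure
// de-quantization and dot products stop at the real row length.
constexpr int QK_K = 256;

std::vector<float> pad_row_for_kquants(const float * row, int n_cols) {
    const int n_padded = ((n_cols + QK_K - 1) / QK_K) * QK_K; // round up to a multiple of 256
    std::vector<float> padded(n_padded, 0.0f);                // everything past n_cols stays zero
    std::memcpy(padded.data(), row, n_cols * sizeof(float));
    return padded; // this buffer can be fed to the unchanged super-block quantization
}
```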
The alternative that was proposed somewhere above is to use super-blocks of 64. This will work for Falcon-7b and for OpenLLaMA 3B. It is a potentially smaller change, but using super-blocks of 64 almost defeats the purpose of the super-blocks, which is to save bits by using quantized scales for the blocks inside a super-block. To give a specific example, with a super-block of 256 and Q4_K, we have 8 blocks of 32, each having a scale and a min of 6 bits, so that's 8 * 12 = 96 bits. We then have the fp16 scale and min of the super-block, which is another 32 bits, for a total of 128 bits per super-block, or 0.5 bits of extra data per weight. For a super-block size of 64 we have 2 * 12 + 32 = 56 bits per super-block, or 0.875 bits per weight. That's almost the same as Q4_1 (1 extra bit per weight), so we might as well add to Q4_1 the rmse+cosine distance minimization that is used in Q4_K while quantizing and just use a modified version of Q4_1.
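Written out, the bit accounting from the paragraph above looks like this (numbers taken directly from the comment):

```cpp
#include <cstdio>

// Overhead per super-block for Q4_K-style metadata: each block of 32 weights has a
// 6-bit scale and a 6-bit min, and the super-block adds an fp16 scale and min (32 bits).
int main() {
    const int super_block_sizes[] = { 256, 64 };
    for (int sb : super_block_sizes) {
        const int n_blocks   = sb / 32;            // blocks of 32 weights inside the super-block
        const int extra_bits = n_blocks * 12 + 32; // 8*12+32 = 128 for 256, 2*12+32 = 56 for 64
        printf("super-block %3d: %3d extra bits -> %.3f bits per weight\n",
               sb, extra_bits, (double)extra_bits / sb);
    }
    return 0;
}
```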
SlyEcho commented on Jun 19, 2023
Padding per row should be possible; after all, we store the row length and its size in bytes separately.
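Presumably this refers to ggml storing a tensor's element counts and byte strides separately (its ne and nb fields); a simplified sketch of how that leaves room for per-row padding:

```cpp
#include <cstddef>
#include <cstdint>

// Simplified sketch (not the real ggml_tensor): because the logical row length and
// the byte stride between rows are stored separately, a row can occupy more bytes
// than its logical length strictly needs, which is what makes per-row padding possible.
struct TensorView {
    int64_t ne0; // logical number of values per row, e.g. 4544
    size_t  nb1; // bytes from the start of one row to the start of the next
};

// With padding to whole 256-value super-blocks, nb1 would be derived from the padded
// length, while ne0 keeps the true row size:
size_t padded_row_stride(int64_t ne0, size_t bytes_per_superblock) {
    const int64_t n_superblocks = (ne0 + 255) / 256; // round up to whole super-blocks
    return (size_t)n_superblocks * bytes_per_superblock;
}
```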
TheBloke commented on Jun 20, 2023
Just to add, FYI, that I just learned of another type of model that's affected: certain Llama models based on OpenAssistant, which have a vocab size of 32016.
Example model exhibiting this: https://huggingface.co/MetaIX/GPT4-X-Alpasta-30b
convert.py:
Writing vocab...
[ 1/543] Writing tensor tok_embeddings.weight | size 32016 x 6656 | type UnquantizedDataType(name='F16')
[ 2/543] Writing tensor norm.weight | size 6656 | type UnquantizedDataType(name='F32')
[ 3/543] Writing tensor output.weight | size 32016 x 6656 | type UnquantizedDataType(name='F16')
[ 4/543] Writing tensor layers.0.attention.wq.weight | size 6656 x 6656 | type UnquantizedDataType(name='F16')
[ 5/543] Writing tensor layers.0.attention.wk.weight | size 6656 x 6656 | type UnquantizedDataType(name='F16')
...
quantize:
llama.cpp: loading model from /workspace/process/alpasta-30b/ggml/alpasta-30b.ggmlv3.fp16.bin
llama.cpp: saving model to /workspace/process/alpasta-30b/ggml/alpasta-30b.ggmlv3.q2_K.bin
========================= Tensor sizes 6656 x 32016 are not divisible by 256
This is required to be able to use k-quants for now!
========================================================================================
Out of interest, did something change with regards to this in the last week or two? Because 11 days ago I quantised OpenAssistant-SFT-7, which also uses 32016 x 6656, and it quantised fine: https://huggingface.co/TheBloke/OpenAssistant-SFT-7-Llama-30B-GGML/tree/main
KerfuffleV2 commented on Jun 20, 2023
A check to make sure the sizes were compatible with k-quants was added (and to fail if they're not). Before that, parts of the tensors might have been corrupted, or it could possibly cause GGML to read/write memory out of bounds.
So even if it might have seemed like it was working, there were probably issues, and unfortunately it couldn't be left in the current state.
TheBloke commented on Jun 20, 2023
Ah I see! I guess I should delete those k-quants from the OpenAssistant-based repos then.
Thanks for the details.
ymcui commented on Jun 26, 2023
Greetings from the Chinese-LLaMA-Alpaca project.
I would like to report that after PR #1921 was merged into the main branch, our models can no longer be quantized with the k-quants series, while they were functional before that PR.
The reason is that the vocabulary size of our model is not divisible by 256. For example, our Chinese Alpaca model has a vocabulary size of 49954. As the k-quants series generally has better performance, it is really a pity that we can no longer use this feature. In particular, for larger models (like 33B or 65B), q3_k or lower quantization methods are exceptionally useful, as q4_0 or q5_0 won't fit into RAM for most people.
Looking forward to some workaround for this in the future.
ikawrakow commented on Jun 26, 2023
@ymcui
Sorry about that. If the model worked for you before #1921, the solution is to change the check at https://github.com/ggerganov/llama.cpp/blob/447ccbe8c39332fcdd0d98a041b6e2ff6f06219d/llama.cpp#L2510 so that it no longer rejects these tensor shapes.
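Roughly, the idea is to loosen that divisibility test; a hedged sketch, with nx/ny standing in for the tensor's row length and row count and QK_K for the super-block size (names assumed for illustration, not copied from llama.cpp):

```cpp
// Hedged sketch only -- not the actual code at the linked line. The check added in
// #1921 rejects a tensor when either dimension is not a multiple of QK_K; since ggml
// quantizes row by row, a model whose only mismatched dimension is the vocabulary
// size (the row count) could in principle be let through by testing just the row length.
constexpr int QK_K = 256;

bool kquants_shape_ok_strict(int nx, int ny) {
    return nx % QK_K == 0 && ny % QK_K == 0; // both dimensions must fit whole super-blocks
}

bool kquants_shape_ok_relaxed(int nx, int /*ny*/) {
    return nx % QK_K == 0;                   // only the row length has to fit whole super-blocks
}
// Example: a 6656 x 49954 embedding passes the relaxed check (6656 % 256 == 0),
// while Falcon-7B rows of length 4544 still fail either way.
```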
Basically, some people wasted a lot of time trying to figure out why their models weren't working with k-quants. To prevent this while a solution is being worked on, I added this check in #1921. The check is in many cases more restrictive than it needs to be, but I wanted to be certain that nobody wastes their time like that again.