Who says it doesn’t :)?
At least in my tests there is a big penalty for using an “odd” bit stride.
Testing 4-bit vs. 5-bit quantization in llama.cpp, I see quite a bit more than the “naively expected” 25% slowdown going from 4 to 5 bits.
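To illustrate why an odd stride can cost more than the extra bits alone, here is a minimal sketch of naive bit-packing in Python. It is not llama.cpp's actual storage layout (I'm assuming plain back-to-back, LSB-first packing), and the names `pack`, `unpack4`, `unpack_generic` and `straddlers` are made up for this example. With 4 bits, every value sits in one nibble of one byte; with 5 bits, values start at arbitrary bit offsets and can straddle a byte boundary, so the decoder needs per-value offset math and a wider read.

```python
def pack(values, bits):
    """Pack unsigned ints of width `bits` back to back, LSB first."""
    acc, nbits, out = 0, 0, bytearray()
    mask = (1 << bits) - 1
    for v in values:
        acc |= (v & mask) << nbits
        nbits += bits
        while nbits >= 8:          # flush every completed byte
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:                      # flush the partial last byte
        out.append(acc & 0xFF)
    return bytes(out)


def unpack4(data, count):
    """4-bit fast path: two values per byte, fixed shifts, no offsets."""
    out = []
    for b in data:
        out.append(b & 0x0F)
        out.append(b >> 4)
    return out[:count]


def unpack_generic(data, bits, count):
    """Generic path for bits <= 8: a value may span two bytes."""
    mask = (1 << bits) - 1
    out = []
    for i in range(count):
        byte, shift = divmod(i * bits, 8)   # per-value offset math
        hi = data[byte + 1] if byte + 1 < len(data) else 0
        window = data[byte] | (hi << 8)     # 16-bit window covers any straddle
        out.append((window >> shift) & mask)
    return out


def straddlers(bits, count):
    """Count values whose bits cross a byte boundary."""
    return sum(1 for i in range(count) if (i * bits) % 8 + bits > 8)


if __name__ == "__main__":
    print("4-bit values crossing a byte boundary:", straddlers(4, 8))
    print("5-bit values crossing a byte boundary:", straddlers(5, 8))
```

In this layout, none of the 4-bit values cross a byte boundary, while 4 out of every 8 five-bit values do. Real kernels avoid the worst of this with smarter layouts and SIMD tricks, so treat this only as intuition for where the extra work comes from, not as a model of the measured slowdown.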