
> I struggle to comprehend how an odd quantization like 5 bit, that doesn't align well with 8 bit boundaries, would not slow things down for inference

Who says it doesn’t :)?

At least in my tests there is a big penalty to using an “odd” bit stride.

Testing 4-bit vs. 5-bit quantization in Llama.cpp, I see quite a bit more than the “naively expected” 25% slowdown going from 4 to 5 bits.
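To make the alignment point concrete, here is a minimal sketch of why an "odd" stride costs more than the extra 25% of bits. It assumes a naive, contiguous little-endian bit packing, which is not necessarily the layout Llama.cpp actually uses, and the function names are made up for illustration. With 4 bits, every value sits inside one byte and unpacks with a mask and a shift. With 5 bits, half of the values straddle a byte boundary, so each unpack needs two byte reads and a variable shift.

```python
def unpack4(data: bytes) -> list[int]:
    """Unpack 4-bit values: two per byte, fixed mask and shift."""
    out = []
    for b in data:
        out.append(b & 0x0F)   # low nibble
        out.append(b >> 4)     # high nibble
    return out


def pack5(values: list[int]) -> bytes:
    """Pack 5-bit values back to back (hypothetical layout, LSB first)."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently in acc
    out = bytearray()
    for v in values:
        acc |= (v & 0x1F) << nbits
        nbits += 5
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # flush the final partial byte
    return bytes(out)


def unpack5(data: bytes, count: int) -> list[int]:
    """Unpack 5-bit values: the shift varies and may span two bytes."""
    out = []
    for i in range(count):
        byte, shift = divmod(i * 5, 8)
        lo = data[byte]
        hi = data[byte + 1] if byte + 1 < len(data) else 0
        out.append(((lo | (hi << 8)) >> shift) & 0x1F)
    return out


def straddling(count: int, width: int) -> int:
    """Count values whose bits cross a byte boundary."""
    return sum(1 for i in range(count) if (i * width) % 8 + width > 8)


if __name__ == "__main__":
    values = list(range(32))
    packed = pack5(values)
    assert unpack5(packed, len(values)) == values
    print(len(packed), "bytes for 32 five-bit values")
    print(straddling(32, 4), "of 32 four-bit values cross a byte boundary")
    print(straddling(32, 5), "of 32 five-bit values cross a byte boundary")
```

Real kernels can sidestep some of this by storing the extra bit separately from the 4-bit nibbles, but that still means an additional load and merge per value, so the slowdown does not have to track the bit count linearly.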


