
> Trust the compiler. They are very, very good.

In some ways they are, and in some ways they're not.

I recently made a loop 5x faster by writing it slightly differently. Reason? MSVC decided to emit code that messes up store forwarding (very much a microarchitectural detail). Spelling out the pointer derefs produced much better code.

More specifically, the loop was loading ARGB values and storing them as BGR (yes, blitting on the CPU, don't ask). MSVC tried to be clever by storing the lower 16 bits of the ARGB value to the stack and then reading the individual bytes back for writing. CPUs of course don't (usually) have to wait for a store to reach the cache before a later load can read it back, thanks to store forwarding. But forwarding only works when the load lines up with the earlier store in a way the core supports, and mismatched sizes (here a 16-bit store followed by 8-bit loads) can defeat it. So the compiler somehow managed to turn a 3-byte twiddle into a loop that stalls on memory.
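For illustration only, here is a hypothetical sketch of two ways to spell an ARGB-to-BGR loop in C++. This is not the code from the loop described above: the function names are made up, pixels are assumed to be packed as 0xAARRGGBB, and the second variant assumes a little-endian target. Whether either spelling triggers the stack round trip depends on the compiler version and flags, so the only way to know is to look at the disassembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical variant A: load the whole pixel, then pick channels with shifts.
// Pixel layout is assumed to be 0xAARRGGBB.
void argb_to_bgr_shifts(const std::uint32_t* src, std::uint8_t* dst, std::size_t n) {
    assert(n == 0 || (src != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint32_t px = src[i];
        dst[3 * i + 0] = static_cast<std::uint8_t>(px);        // B
        dst[3 * i + 1] = static_cast<std::uint8_t>(px >> 8);   // G
        dst[3 * i + 2] = static_cast<std::uint8_t>(px >> 16);  // R
    }
}

// Hypothetical variant B: spell out each byte load from the source pointer.
// Assumes little-endian, so the bytes of 0xAARRGGBB sit in memory as B, G, R, A.
void argb_to_bgr_bytes(const std::uint32_t* src, std::uint8_t* dst, std::size_t n) {
    assert(n == 0 || (src != nullptr && dst != nullptr));
    // Reading any object through unsigned char* is permitted by the aliasing rules.
    const unsigned char* s = reinterpret_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i) {
        dst[0] = s[0];  // B
        dst[1] = s[1];  // G
        dst[2] = s[2];  // R
        s += 4;         // skip A
        dst += 3;
    }
}
```

Both variants write the same bytes on a little-endian machine. Compile each with an assembly listing and compare the inner loops before trusting either one.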

Profile, and don't be afraid of reading some assembly.


