
Well, CUDA gives you a whole programming language where you have to figure out the optimization for your particular card's cache size and bus width.

I'm saying the API surface of what to offer for LLMs is pretty small. Yeah, optimizing it is hard but it's "one really smart person works for a few weeks" hard, and most of the tiling techniques are public. Speaking of which, thanks for that blog post, off to read it now.
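For readers unfamiliar with "tiling": the idea is to split a matrix multiply into blocks small enough that each block's operands stay in fast memory (cache, or shared memory and registers on a GPU) while they are reused. The pure-Python sketch below shows only the loop structure of a blocked matmul on the CPU. The function name `matmul_tiled` and the default `tile=32` are illustrative assumptions, not tuned values and not any vendor's kernel. A real GPU kernel would also stage tiles into shared memory and pick tile sizes per card.

```python
def matmul_tiled(a, b, tile=32):
    """Blocked (tiled) multiply of a (n x k) by b (k x m), given as lists of lists.

    Illustrative sketch only: `tile` is an arbitrary block size, not tuned
    for any particular cache size or memory bus.
    """
    n, k = len(a), len(a[0])
    k2, m = len(b), len(b[0])
    if k != k2:
        raise ValueError("inner dimensions must match")
    c = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tile-sized blocks of the output (i0, j0)
    # and of the shared inner dimension (p0).
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Inner loops touch only one block of a, b and c at a time,
                # so that working set can stay resident in fast memory.
                for i in range(i0, min(i0 + tile, n)):
                    row_a = a[i]
                    row_c = c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        aip = row_a[p]
                        row_b = b[p]
                        for j in range(j0, min(j0 + tile, m)):
                            row_c[j] += aip * row_b[j]
    return c


if __name__ == "__main__":
    print(matmul_tiled([[1, 2], [3, 4]], [[5, 6], [7, 8]], tile=1))
    # prints [[19.0, 22.0], [43.0, 50.0]]
```

The result does not depend on `tile`; only the memory access pattern changes. Picking the block size for a specific card's cache and bus width is the hard, hardware-specific part of the work.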



it's "one really smart person works for a few weeks" hard

AMD should hire that one really smart person.


Yeah, they really should. The primary reason AMD is behind in the GPU space is that they massively under-prioritize software.


Not having written one of these (…well, I've written an IDCT), I can imagine it getting complicated if there's any known sparsity to take advantage of.


I assure you from experience that it's more than a smart person for a few weeks.



