

totalperspectiv 8 months ago | on: Highly efficient matrix transpose in Mojo

In the coarse-graining code, you use an @parameter-for. Doesn’t unrolling that lead to some pretty large code size? Or is that less of an issue on GPU? Great write-up! I learned a lot!

simon_vtr 8 months ago

It doesn’t. The batch size is just 8, so the unrolled loop stays small. This is a very good trick and often needed to achieve peak performance in memory-bound kernels. You can check out the equivalent code in CUDA as well :)
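
To make the code-size point concrete, here is a minimal CPU-side sketch in C++17. It is not the article's Mojo kernel or its CUDA counterpart, and every name in it (`BATCH`, `transpose_batch`, `transpose`) is hypothetical. It only illustrates the idea: each unit of work handles a compile-time batch of 8 elements, and the compiler expands the batch into 8 copies of a one-line body, which is a small amount of code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumption: 8 mirrors the batch size mentioned in the reply above.
constexpr std::size_t BATCH = 8;

// Transposes BATCH consecutive elements of input row r, starting at column c0.
// The fold expression over I... is expanded at compile time, so the assignment
// is emitted BATCH times with no loop counter and no branch.
template <std::size_t... I>
inline void transpose_batch(const float* in, float* out,
                            std::size_t rows, std::size_t cols,
                            std::size_t r, std::size_t c0,
                            std::index_sequence<I...>) {
    ((out[(c0 + I) * rows + r] = in[r * cols + (c0 + I)]), ...);
}

// Row-major transpose: `in` is rows x cols, the result is cols x rows.
// Each inner iteration plays the role of one coarsened thread.
std::vector<float> transpose(const std::vector<float>& in,
                             std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);
    assert(cols % BATCH == 0);  // sketch only: no handling of a partial batch
    std::vector<float> out(in.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c0 = 0; c0 < cols; c0 += BATCH) {
            transpose_batch(in.data(), out.data(), rows, cols, r, c0,
                            std::make_index_sequence<BATCH>{});
        }
    }
    return out;
}
```

In CUDA, the usual way to get the same effect is a `#pragma unroll` on a loop whose trip count is a compile-time constant. With a trip count of 8 the unrolled body is still only a handful of instructions per thread.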
