
> For transformer, v4 chip has 70-100% compute capacity and 40% memory of A100 for pretty much the same price.

Note there are added costs when using v4 nodes, such as the VM, storage, and logging, which can get $$$.

> where for GPU model need to fit in NVlink connected GPUs

Huh, where is this coming from? You can definitely scale transformers efficiently across multiple servers with parallelism, and 1T parameters is entirely feasible if you have the $. Nvidia demonstrated this back in 2021.
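To give a sense of what "parallelism across devices" means here, below is a minimal, single-layer sketch of the tensor-parallel idea using JAX sharding. It is not the setup Nvidia used (their 2021 Megatron work combined tensor, pipeline, and data parallelism across many DGX nodes); the mesh axis name "model", the shapes, and the FFN layer are all illustrative assumptions. Real multi-server training additionally relies on fast inter-node interconnects, which is what the cost discussion here is about.

```python
# Minimal sketch (assumptions: axis name "model", toy shapes, single FFN matmul):
# shard one transformer-style FFN weight across all available accelerators.
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = jax.devices()
mesh = Mesh(mesh_utils.create_device_mesh((len(devices),)), ("model",))

d_model, d_ff = 1024, 4096
key = jax.random.PRNGKey(0)
w1 = jax.random.normal(key, (d_model, d_ff))   # FFN input projection
x = jax.random.normal(key, (8, d_model))       # a small batch of activations

# Tensor parallelism: split w1's hidden dimension across the "model" axis,
# replicate the activations on every device.
w1 = jax.device_put(w1, NamedSharding(mesh, P(None, "model")))
x = jax.device_put(x, NamedSharding(mesh, P()))

@jax.jit
def ffn_in(x, w1):
    # Each device multiplies by its slice of w1's columns, so the output
    # comes back sharded along the same "model" axis.
    return jax.nn.relu(x @ w1)

out = ffn_in(x, w1)
print(out.sharding)
```

The same partitioning works whether the devices sit in one box or span hosts; what changes is how expensive the collectives between shards are, which is why NVLink/NVSwitch within a node versus the inter-node fabric matters so much for cost and throughput.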



> Nvidia demonstrated this back in 2021.

Because Nvidia built a supercomputer out of A100s with a lot of focus on networking. Cloud providers don't give you that option.


Azure and AWS have both offered high-bandwidth cluster options that allow scaling beyond a single server for several months now.

Pretty sure MosaicML also does this but I haven't used their offering.

https://www.amazon.science/blog/scaling-to-trillion-paramete...



