> For transformers, a v4 chip has 70-100% of the compute capacity and 40% of the memory of an A100, for pretty much the same price.
Note there are added costs when using v4 nodes, such as the VM, storage, and logging, which can get $$$.
> where for GPU model need to fit in NVlink connected GPUs
Huh, where is this coming from? You can definitely scale transformers efficiently across multiple servers with parallelism, and 1T parameters is entirely feasible if you have the $. Nvidia demonstrated this back in 2021.
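To make the point concrete: with pipeline-style model parallelism, consecutive layers live on different devices and only activations cross the interconnect, so no single device (or NVLink island) ever needs to hold the whole model. Here's a toy sketch in plain NumPy; the `Device`/`LinearLayer` names are illustrative stand-ins, not any real framework's API:

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearLayer:
    """One layer's weights; in real training this shard lives on one GPU."""
    def __init__(self, d_in, d_out):
        self.w = rng.standard_normal((d_in, d_out)) * 0.01
    def forward(self, x):
        return np.maximum(x @ self.w, 0.0)  # linear + ReLU

class Device:
    """Stands in for one GPU/server; holds only its slice of the layers."""
    def __init__(self, layers):
        self.layers = layers
    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

# An 8-layer "model" partitioned across 4 devices, 2 layers each.
# No device ever materializes all the weights, which is why the full
# model does not need to fit inside a single NVLink-connected group.
d = 16
layers = [LinearLayer(d, d) for _ in range(8)]
devices = [Device(layers[i:i + 2]) for i in range(0, 8, 2)]

x = rng.standard_normal((4, d))
for dev in devices:  # activations hop from device to device
    x = dev.forward(x)
print(x.shape)
```

Real systems (e.g. Nvidia's 2021 work) combine this with tensor parallelism inside a node and data parallelism across replicas, and keep the pipeline busy with micro-batches, but the partitioning idea is the same.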