
The most interesting thing about AMD is their ability to create GPUs with superior synthetic specs to their NVIDIA competitors, and yet lose across the board in anything but embarrassingly parallel applications. Not only do they lose, but they lose big.

Here are two examples. First, the engine behind Folding@Home: https://simtk.org/plugins/moinmoin/openmm/BenchmarkOpenMMDHF...

See also AlexNet (Deep Learning's most prominent benchmark): https://github.com/amd/OpenCL-caffe versus GTX TitanX: https://github.com/soumith/convnet-benchmarks

TLDR: AMD losing by a factor of 2 or more...

So unless this changes dramatically, NVIDIA IMO will continue to dominate.

But hey, they both crush Xeon Phi and FPGAs so it doesn't suck to be #2 when #3 is Intel.



Which FPGAs in which workloads do they crush?


Arria 10: http://www.nextplatform.com/2015/08/27/microsoft-extends-fpg...

600 images/s inference (the highest ever publicly reported), projected to reach as high as 900 images/s. ~$5000 per FPGA.

Compare and contrast with the TitanX/GTX 980 Ti, which now top 5000 images/s. The GTX 1080 will only be faster than this, at $600.

https://github.com/soumith/convnet-benchmarks
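To make the gap concrete, here's a rough perf-per-dollar sketch using the figures cited in this thread (the GPU price of ~$1000 for a TitanX-class card is my assumption; the rest are the thread's own numbers, not measurements):

```python
# Rough inference throughput-per-dollar comparison.
# All inputs are figures quoted in this thread; GPU price is an assumption.
def images_per_dollar(images_per_sec: float, price_usd: float) -> float:
    """Images per second of inference bought per dollar of hardware."""
    return images_per_sec / price_usd

fpga      = images_per_dollar(600, 5000)   # Arria 10: reported rate, ~$5000
fpga_proj = images_per_dollar(900, 5000)   # Arria 10: projected rate
gpu       = images_per_dollar(5000, 1000)  # TitanX-class (assumed ~$1000)

print(f"FPGA: {fpga:.2f} img/s per $ (projected: {fpga_proj:.2f})")
print(f"GPU:  {gpu:.2f} img/s per $")
```

Even taking the projected FPGA number and a generous GPU price, the GPU comes out more than an order of magnitude ahead per dollar.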

And so far, no FPGA training numbers. But here are some distributed training numbers for $10Kish Xeon servers:

https://software.intel.com/en-us/articles/caffe-training-on-...

64 of them train AlexNet in ~5 hours. A DGX-1 does it in 2. Don't have (or can't get) a $129K DGX-1? Fine: buy a Big Sur from one of several vendors, throw in 8 GTX 1080s as soon as they're out, implement "One Weird Trick", and you'll do it in under 3 hours. That ought to run you about $30-35K, versus >$600K for those 64 Xeon servers.
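The cost arithmetic above can be laid out explicitly (all prices and times are this thread's estimates, not quotes; the Big Sur figure is the midpoint of the $30-35K range):

```python
# Back-of-envelope AlexNet training cost/time comparison,
# using only the estimates quoted in this thread.
systems = [
    # (name, total cost in USD, training time in hours)
    ("64x Xeon servers",      64 * 10_000, 5),  # ~$10K each -> ~$640K
    ("DGX-1",                 129_000,     2),
    ("Big Sur + 8x GTX 1080", 32_500,      3),  # midpoint of $30-35K estimate
]

for name, cost, hours in systems:
    print(f"{name:>22}: ${cost:>7,} for ~{hours}h")
```

The Xeon cluster is roughly 20x the price of the Big Sur build for a slower result, which is the point being made.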

FPGAs, OTOH, are fantastic at embarrassingly parallel computation with low memory-bandwidth demands, like bitcoin mining. Deep learning is not such a domain.


I don't see any benchmark of 5000 images/sec anywhere on that page. Maybe you should update your link to a page showing that figure, or update your claim to match what you linked to?

Even your previous post linked to two pages that were not measuring the same thing, and thus could not support your claims.

I think you need to clarify.


Based on the text you apparently couldn't be bothered to read: for AlexNet, a forward pass on 128 images (input 128x3x224x224) takes ~25 milliseconds (ignoring the O(n) dropout and softmax, where n is 128). I'll let you do the math for the rest of this...
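For the record, "the math" from those two figures is just batch size divided by latency:

```python
# Throughput implied by the convnet-benchmarks AlexNet forward pass:
# 128 images per batch, ~25 ms per pass.
batch = 128          # images per forward pass (input 128x3x224x224)
latency_s = 0.025    # ~25 ms, per the benchmark page
throughput = batch / latency_s
print(f"~{throughput:.0f} images/s")  # -> ~5120 images/s
```

Which is where the "topping 5000 images/s" claim upthread comes from.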



