
The most interesting thing about AMD is their ability to create GPUs with superior synthetic specs to their NVIDIA competitors, and yet lose across the board in anything but embarrassingly parallel applications. Not only do they lose, but they lose big.

Here are two examples. First, the engine behind Folding@Home: https://simtk.org/plugins/moinmoin/openmm/BenchmarkOpenMMDHF...

See also AlexNet (Deep Learning's most prominent benchmark): https://github.com/amd/OpenCL-caffe versus GTX TitanX: https://github.com/soumith/convnet-benchmarks

TLDR: AMD losing by a factor of 2 or more...

So unless this changes dramatically, NVIDIA IMO will continue to dominate.

But hey, they both crush Xeon Phi and FPGAs so it doesn't suck to be #2 when #3 is Intel.



Which FPGAs in which workloads do they crush?


Arria 10: http://www.nextplatform.com/2015/08/27/microsoft-extends-fpg...

600 images/s inference (the highest ever publicly reported), projected to reach as high as 900 images/s. ~$5000 per FPGA.

Compare and contrast with the TitanX/GTX 980 Ti, which now top 5000 images/s. The GTX 1080 will only be faster than this, at $600.

https://github.com/soumith/convnet-benchmarks
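To make the gap concrete, here's a rough perf-per-dollar sketch using the figures cited in this thread (the GPU price of ~$1000 for a TitanX-class card is my assumption; the rest are the thread's own numbers, not measurements):

```python
# Rough inference throughput-per-dollar comparison.
# All inputs are figures quoted in this thread; GPU price is an assumption.
def images_per_dollar(images_per_sec: float, price_usd: float) -> float:
    """Images per second of inference bought per dollar of hardware."""
    return images_per_sec / price_usd

fpga      = images_per_dollar(600, 5000)   # Arria 10: reported rate, ~$5000
fpga_proj = images_per_dollar(900, 5000)   # Arria 10: projected rate
gpu       = images_per_dollar(5000, 1000)  # TitanX-class (assumed ~$1000)

print(f"FPGA: {fpga:.2f} img/s per $ (projected: {fpga_proj:.2f})")
print(f"GPU:  {gpu:.2f} img/s per $")
```

Even taking the projected FPGA number and a generous GPU price, the GPU comes out more than an order of magnitude ahead per dollar.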

And so far, no FPGA training numbers. But here are some distributed training numbers for $10Kish Xeon servers:

https://software.intel.com/en-us/articles/caffe-training-on-...

64 of them train AlexNet in ~5 hours. A DGX-1 does it in 2. Don't have (or can't get) a $129K DGX-1? Fine: buy a Big Sur from one of several vendors, throw in 8 GTX 1080s as soon as they're out, implement "One Weird Trick", and you'll do it in under 3 hours. That ought to run you about $30-35K, versus >$600K for those 64 Xeon servers.
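The cost arithmetic above can be laid out explicitly (all prices and times are this thread's estimates, not quotes; the Big Sur figure is the midpoint of the $30-35K range):

```python
# Back-of-envelope AlexNet training cost/time comparison,
# using only the estimates quoted in this thread.
systems = [
    # (name, total cost in USD, training time in hours)
    ("64x Xeon servers",      64 * 10_000, 5),  # ~$10K each -> ~$640K
    ("DGX-1",                 129_000,     2),
    ("Big Sur + 8x GTX 1080", 32_500,      3),  # midpoint of $30-35K estimate
]

for name, cost, hours in systems:
    print(f"{name:>22}: ${cost:>7,} for ~{hours}h")
```

The Xeon cluster is roughly 20x the price of the Big Sur build for a slower result, which is the point being made.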

FPGAs, OTOH, are fantastic at embarrassingly parallel computation with low memory-bandwidth demands, like bitcoin mining. Deep learning is not such a domain.


I don't see any benchmark of 5000 images/sec anywhere on that page. Maybe you should update your link to a page showing that figure, or update your claim to match what you linked to?

Even your previous post linked to two pages that were not measuring the same thing, and thus could not support your claims.

I think you need to clarify.


Based on the text you apparently couldn't be bothered to read: for AlexNet, a forward pass on 128 images (input 128x3x224x224) takes ~25 milliseconds (ignoring the O(n) dropout and softmax, where n is 128). I'll let you do the math for the rest of this...
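For the record, "the math" from those two figures is just batch size divided by latency:

```python
# Throughput implied by the convnet-benchmarks AlexNet forward pass:
# 128 images per batch, ~25 ms per pass.
batch = 128          # images per forward pass (input 128x3x224x224)
latency_s = 0.025    # ~25 ms, per the benchmark page
throughput = batch / latency_s
print(f"~{throughput:.0f} images/s")  # -> ~5120 images/s
```

Which is where the "topping 5000 images/s" claim upthread comes from.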



