The most interesting thing about AMD is their ability to create GPUs with superior synthetic specs to their NVIDIA competitors, and yet they lose across the board in anything but embarrassingly parallel applications. Not only do they lose, but they lose big.
64 of them train AlexNet in ~5 hours. A DGX-1 does it in 2. Don't have/can't get a $129K DGX-1? Fine, buy a Big Sur from one of several vendors, throw in 8 GTX 1080s as soon as they're out, implement "One Weird Trick", and you'll do it in under 3 hours. That ought to run you about $30-35K versus >$600K for those 64 Xeon servers.
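To put those numbers side by side, here's a minimal sketch of the cost/throughput arithmetic. All prices and training times are the rough figures quoted above (Big Sur taken as the midpoint of $30-35K), not measured data:

    # Rough cost/throughput comparison; prices and AlexNet training times are the
    # ballpark figures from the comment above, not benchmarked numbers.
    systems = {
        "64x Xeon servers":      {"cost_usd": 600_000, "train_hours": 5.0},
        "DGX-1":                 {"cost_usd": 129_000, "train_hours": 2.0},
        "Big Sur + 8x GTX 1080": {"cost_usd":  32_500, "train_hours": 3.0},
    }

    for name, s in systems.items():
        runs_per_day = 24 / s["train_hours"]                  # AlexNet runs per day
        cost_per_run_capacity = s["cost_usd"] / runs_per_day  # hardware $ per daily run
        print(f"{name:24s} ${s['cost_usd']:>7,}  {runs_per_day:4.1f} runs/day  "
              f"${cost_per_run_capacity:,.0f} per run/day of capacity")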
FPGAs, OTOH, are fantastic at low-memory-bandwidth, embarrassingly parallel computation like Bitcoin mining. Deep learning is not such a domain.
I don't see any benchmark of 5000 images/sec anywhere on that page. Maybe you should update your link to a page that shows that figure, or update your claim to match what you actually link to?
Even your previous post linked to two pages that were not measuring the same thing... and thus could not support your claims.
Based on the text you apparently couldn't be bothered to read, for AlexNet, a forward pass on 128 images (Input 128x3x224x224) takes ~25 milliseconds (ignoring O(n) dropout and softmax where n is 128). I'll let you do the math for the rest of this...
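Spelling out that math (a trivial sketch; the ~25 ms figure is the one quoted above):

    # 128 images per forward pass at ~25 ms per pass (the figure quoted above)
    batch_size = 128
    forward_time_s = 0.025

    images_per_sec = batch_size / forward_time_s
    print(images_per_sec)   # 5120.0 -- i.e. roughly the ~5000 images/sec being discussed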
Here are two examples. First, the engine behind Folding@Home: https://simtk.org/plugins/moinmoin/openmm/BenchmarkOpenMMDHF...
See also AlexNet (deep learning's most prominent benchmark): https://github.com/amd/OpenCL-caffe versus a GTX Titan X: https://github.com/soumith/convnet-benchmarks
TL;DR: AMD loses by a factor of 2 or more...
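For reference, a forward-pass timing of the kind convnet-benchmarks reports can be reproduced in a few lines. This sketch uses PyTorch + torchvision rather than the Caffe/Torch setups in the linked repos, so treat it purely as an illustration of the methodology, not as the benchmark itself:

    # Illustrative only: time an AlexNet forward pass on a 128-image batch.
    # The linked benchmarks use Caffe / Torch; this sketch substitutes PyTorch + torchvision.
    import time
    import torch
    import torchvision

    model = torchvision.models.alexnet().cuda().eval()
    batch = torch.randn(128, 3, 224, 224).cuda()

    with torch.no_grad():
        for _ in range(10):              # warm-up so one-time setup isn't counted
            model(batch)
        torch.cuda.synchronize()

        iters = 100
        start = time.time()
        for _ in range(iters):
            model(batch)
        torch.cuda.synchronize()         # wait for queued GPU work before stopping the clock

    elapsed_ms = (time.time() - start) / iters * 1000
    print(f"~{elapsed_ms:.1f} ms per 128-image forward pass")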
So unless this changes dramatically, NVIDIA IMO will continue to dominate.
But hey, they both crush Xeon Phi and FPGAs, so it doesn't suck to be #2 when #3 is Intel.