One interesting thing about these Myriad chips is that they use LEON, an open-source SPARC V8 core, for their main and "real time" CPUs. It's odd to me that they're not using ARM...
MJPEG is a common encoding format for computer vision cameras. The artifacts associated with inter-frame compression are perfectly tolerable for humans, but they can completely stymie a lot of CV applications. High-efficiency video codecs rely on motion prediction to reduce file size - fine for watching Game of Thrones, but a disaster if you're a motion tracking algorithm.
Given the target market for this chip, it's a perfectly logical and highly useful feature.
The reason for these kinds of codecs is also often latency. Efficient H.264 encoding (with B-frames and P-frames) introduces a lot of it, and realtime applications usually require low latency.
I chuckled at that, too. I struggle to imagine what would be on the receiving end of that. I doubt most desktop machines could decode a 60fps 4k MJPEG stream, let alone a phone. Maybe they sell an SoC for the other side too.
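As an illustration (a rough sketch, using the standard libx264 options via ffmpeg; the paths are placeholders), the "zerolatency" tune drops B-frames and encoder lookahead for exactly this reason:

    # Hedged sketch: a typical low-latency x264 invocation through ffmpeg.
    # "camera_feed.mp4" and "out.ts" are placeholder paths; ffmpeg must be on PATH.
    import subprocess

    subprocess.run([
        "ffmpeg", "-i", "camera_feed.mp4",
        "-c:v", "libx264",
        "-preset", "ultrafast",   # minimal encoder effort
        "-tune", "zerolatency",   # disables B-frames and frame lookahead
        "out.ts",
    ], check=True)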
JPEG, and by extension MJPEG, is far simpler to decode than most video codecs, and also massively parallelisable. 4k at 60FPS is less than 500MP/s, which is easily achievable by a GPU.
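For reference, assuming UHD 3840x2160, the raw pixel rate works out like this:

    # Back-of-the-envelope pixel rate for a 2160p60 stream
    width, height, fps = 3840, 2160, 60
    print(width * height * fps / 1e6, "MP/s")   # ~497.7 MP/s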
Sorry, but I've benchmarked this recently as part of a project that I'm working on, and JPEG decoding with libjpeg-turbo consumed far more CPU than software H.264 decoding.
The encoding side is very expensive with H.264, but as I understand it a lot of work goes into building the right reference frames so that higher compression and faster decode can be achieved.
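If anyone wants to sanity-check this roughly (assuming ffmpeg is on your PATH and you have two placeholder clips of the same content, one MJPEG and one H.264), timing a decode-only pass through the null muxer gives a crude proxy for CPU cost:

    # Hedged sketch: time software decode of two placeholder clips.
    import subprocess, time

    def decode_seconds(path):
        start = time.perf_counter()
        subprocess.run(
            ["ffmpeg", "-loglevel", "error", "-i", path, "-f", "null", "-"],
            check=True,
        )
        return time.perf_counter() - start

    print("MJPEG decode:", decode_seconds("clip_mjpeg.avi"), "s")
    print("H.264 decode:", decode_seconds("clip_h264.mp4"), "s")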
That's only ~187MP/s. A single 4k 2160p60 stream needs almost 500MP/s, which might be at the edge of what's possible on a CPU today, but would make far more sense for a GPU to do.
It shouldn't be a stretch for a quad-core desktop processor today. Doubling the core count and increasing clock speed by 40% compared to a mobile Core 2 isn't hard. DDR4 instead of DDR2 means memory bandwidth is probably not an issue, and AVX can probably provide further headroom on the compute power.
And, of course, it's much easier to build a desktop with far more than four CPU cores these days.
I couldn't trust it. Not that I know that product specifically, but given previous experience working with Intel products outside of their x86 line, it will likely be EOL'ed in a year or so.
I've been bitten by a few of their "maker" things: faulty firmware, no support, no documentation, faulty I2C... all sorts of things. And unless I have a timeframe for how long they plan to support and sell it, I just can't justify using it in any product I have or plan to build.
Yup. Burned here also. Intel has a very short attention span, and they don't seem to understand the concept of an "EOL warning" or a "final buy opportunity". Until they rid themselves of the habit of jerking their customers around, they should be treated as an unreliable vendor.
Which makes me sad, because I was Intel blue-badge for 11 years back in the day. But the current management seems to be sprinting aimlessly after mirages.
> But the current management seems to be sprinting aimlessly after mirages.
I wouldn't say they're sprinting after mirages. They're hastily entering solid markets but relying solely on their name and prestige. They're not making any investments beyond marketing, and when the customers don't materialize overnight they exit just as hastily rather than figuring out what they're doing wrong.
I believe the OP means time and dedication in their investments, not simply money. The type of things that startups grind through to become successful. Appropriate amounts of money do need to be injected - but at the right times, not necessarily from the beginning.
Bigger companies have a bad habit of comparing the early results of experimental internal businesses with the ROI of their mainline business and cutting the cord before the product/market is actually ready to be flooded with cash to scale up.
It's funny how the business classic "Innovator's Dilemma" was written largely about companies like Intel who exist in traditional technology markets with predictable evolutionary modes... yet they still suffer from the same lack of 'intrapreneurship' mentality by treating internal startups with immature markets as if they were mature product lines.
They should probably stick to acquisitions of real startups in the growth stage, or actually stick it out for the long run with the markets they invest in, rather than looking for high-growth opportunities within a short timeframe or nothing at all.
Burning early adopters is never a wise choice if that market turns out to have legs.
Very interesting that you should mention the "Innovator's Dilemma". I remember when Andy Grove made that required reading for all of the executive staff, and divisions had to incorporate the thought process into their plans.
You are right that this is showing classic signs of not giving innovative business units enough runway to find their product-market fit. That's really hard to do given Intel's culture.
Craig Barrett did a lot of damage. One Intel mid-level manager described him as a piñata: "Whoever hits him the hardest gets the most candy." So managers reported synthetic disasters in order to get more funding. Paul Otellini was great, I have huge respect for him, but his tenure was too short-lived. Otellini understood how to build a market.
That's an interesting backstory; I was not familiar with the fact that Andy Grove made it required reading. It really should be for the management of every tech company, regardless of its age.
The farther a company gets from its early roots, the harder it gets to recreate the 'early days', unless they get a shake-up in management. But the typical people who are good at startup culture would get killed pretty quickly in bigco corporate culture.
Exactly, but how many of those were good decisions?
Altera-- Not actually a bad decision, but it seems to have done nothing but bolster Xilinx's position.
Mobileye-- A great way to burn ~$15 billion. The fact that Intel felt it had to purchase that company, with really no defensible advantage other than its maps, highlights the huge challenges facing Intel if they're unable to cultivate an internal SDC team.
Movidius-- Legitimately, I have no idea why they paid so much for a glorified DSP. But then again, that's what most "Deep Learning Processors" are right now.
It's really hard to compare Movidius with anything else because there are so few published benchmarks.
A few notes though:
Movidius is inference only. That might be useful if you specifically need low-power, high-speed inference and also somehow have to have an x86 CPU.
If you want high speed, low power inference and don't need x86 then the NVidia Jetson wipes the floor with it.
If you want low-speed (~2 inferences/second), low-power inference, then an RPi is a good option.
Thank you for the great answer Nick, that's the best summary I could've hoped for!
So in summary:
⇒ low power, high speed inference on x86 CPU 🡺 Intel Movidius
⇒ low power, high speed inference non x86 🡺 NV Jetson
⇒ low power, low speed inference 🡺 RPi and similar
⇒ high power, high speed inference 🡺 GPU(s)
The NVidia Jetson is the obvious winner if there is no viable alternative; however, the Jetson is quite expensive, so I can't go that route. I want to speed up both training and inference. eGPUs are more or less affordable, but low-power training and inference at medium or low cost would strike me as the clear winner.
Sorry for the late answer. Even though I didn't get an alternative DSP that offers similar advantages to the Movidius, I'm still grateful for your insightful comment.
The Movidius Stick supports the Raspberry Pi now as a deployment (not development) option, so you can get low power and high speed inference on a single RPi. I have one of these on my desk and they're neat but I haven't done much with it yet. I did grab one because I had heard it would support RPis, though.
Granted, it's 2x the price of an RPi 3; all together that's about $100 USD. And NVidia just announced the TX1 SE Devkit at $200 USD. I have a TX2, but the TX1 will definitely do better, albeit at a higher power/size profile.
The MCS only supports Caffe as well, while the TX1/TX2 support a wider array of DL frameworks (plus FP16, since it's a Tegra).
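For anyone curious what deployment looks like, this is roughly the NCSDK 1.x Python flow as I remember it; a sketch only, module and method names may differ between SDK versions, the 'graph' file is assumed to come from compiling a Caffe model with Intel's mvNCCompile tool, and the input array here is just a placeholder:

    # Hedged sketch of Movidius NCSDK 1.x inference; API names from memory,
    # check the SDK docs for your version before relying on this.
    import numpy as np
    from mvnc import mvncapi as mvnc

    devices = mvnc.EnumerateDevices()       # find attached sticks
    device = mvnc.Device(devices[0])
    device.OpenDevice()

    with open("graph", "rb") as f:          # graph compiled from a Caffe model
        graph = device.AllocateGraph(f.read())

    image = np.zeros((224, 224, 3), dtype=np.float16)   # placeholder input tensor
    graph.LoadTensor(image, "user object")
    output, user_obj = graph.GetResult()

    graph.DeallocateGraph()
    device.CloseDevice()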
Also, what's your use case? That differs significantly, because IIRC, the Movidius chip only supports inference, so if you want to train models too, then that's a non-starter.
I definitely get the sentiment that Intel may pull the rug out from under this product like they have in the maker space. But Intel spent billions to acquire Movidius, and this is a space they believe the business needs to pivot to if Intel is going to grow. It's hard to see Intel dropping VPUs or Altera FPGAs soon without acknowledging that the future of the company is in deep peril.
Intel making neural compute engines isn't anything new. They've previously had them within the Intel Curie chips, such as those found on the Arduino 101.
Yeah, but nobody ever seemed to figure out what to use them for. Probably too limited in their capabilities (128 neurons with 128 bytes each, imho).
If this is anything like the Xeon Phi, it's Intel trying to push their own Float16 standard. We all saw how well Float80 worked out. When will Intel stop with arbitrary custom standards?
Not to defend Intel, because I think Intel has become a ship of fools with a lot of money, but AMD has FP24, Nvidia has FP16, and Google has the TPU 2 with an FP16 implementation so proprietary they can't talk about it. I think the valid concern here is that if this doesn't take off immediately they will end-of-life it, not so much the floating point format.
Not exactly. IEEE 754 defines very specific rules (e.g. what operations can be done, the order of operations, and how results are handled); it basically exists so that if the same calculation is done on different hardware, the result will be the same (or within a given margin of error, since even IEEE 754 isn't 100% strict).
For the most part, when you have hardware that supports one of the binary standards, say FP32, you'll have a compliance sheet, and more often than not it will not be 100% compliant. For example, NVIDIA GPUs were not IEEE 754 compliant when Tesla came out; 2nd-generation Tesla cards were FMA compliant, and div and sqrt operations were not IEEE 754 compliant until Fermi.
So the GP was correct: back in the old DX9 days, when AMD went with FP24 and NVIDIA with FP16/32, both implementations were proprietary, and this is still the case; they just often offer a 754-compliant mode. You can disable 754 compliance to run faster (sometimes considerably so), e.g. in CUDA you can pass the NVCC flag --use_fast_math to do so.
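As a generic illustration of what any 16-bit format gives up (nothing specific to Intel's flavour), numpy's IEEE-style float16 carries only an 11-bit significand:

    # float16 spacing: above 2048, consecutive integers can no longer be represented
    import numpy as np

    print(np.float16(2048))               # 2048.0
    print(np.float16(2049))               # 2048.0 -- rounded, no exact representation
    print(np.float32(np.float16(0.1)))    # ~0.099975586, the nearest float16 to 0.1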
Ah, someone who agrees with me that Intel has more money than sense :)
But yes, I agree, that criticism is stupid. In fact, I would argue that if you use the IEEE standard for a deep learning processor, you're the one that's stupid.
http://eyesofthings.eu/wp-content/uploads/deliverables/EoT_D...