Am I the only one thinking that NVIDIA doesn't really have a moat here?
How many A100 or H100 cards are actually manufactured annually? A few hundred thousand, if that?
Suddenly, there's a big demand. Microsoft mentioned buying something like 25,000 of the H100 cards for GPT-4 and ongoing training. I'm certain they're not paying retail pricing, so that's a few hundred million in revenue for NVIDIA. They're probably the biggest single customer right now, except perhaps for Amazon.
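Back of the envelope (the per-card price here is purely an assumption; I'd guess a bulk discount lands somewhere around $15-25k against a higher list price):

    cards = 25_000
    for assumed_price in (15_000, 25_000):   # hypothetical discounted unit prices
        print(f"${cards * assumed_price:,}")  # $375,000,000 and $625,000,000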
NVIDIA's revenue in 2022 was $27 billion. The additional H100 cards they've sold this year are a fraction of that. Their retail prices have spiked and availability has dropped because supply is inelastic and there aren't any other suppliers with equivalent products... yet.
Fundamentally, an H100 is not that different from a desktop GPU! It's a little bigger, the math units have a different ALU balance, and it uses high-bandwidth memory (HBM), but that's it. There's nothing else really special about them. Unlike a CPU, which is extremely complex, a GPU is a relatively simple unit repeated over 10K times. In some sense, it's a copy-paste exercise.
NVIDIA has a tiny moat, because AMD simply didn't bother to go after what was -- until now -- a relatively small market.
That market is going to be huge, but that invites competition! When tens or even hundreds of billions are on the table, you can bet your bottom dollar that AMD, Intel, Google, and even Facebook won't sit idly by and watch NVIDIA walk off with it.
So what moat does NVIDIA have?
CUDA is like assembly language. PyTorch can target other back-ends. Compilers can target other GPU instruction sets. Throw a billion dollars at this, and it suddenly becomes an eminently solvable problem. Just look at Apple's CPU transitions and Amazon rolling out ARM cloud servers.
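You can already see this at the framework level: the exact same PyTorch code runs on whatever backend is available, with only the device string changing. A minimal sketch, assuming a reasonably recent PyTorch:

    import torch

    # Pick whichever accelerator this build supports; the code below
    # doesn't change regardless of the vendor.
    if torch.cuda.is_available():            # NVIDIA (or AMD via the ROCm build)
        device = "cuda"
    elif torch.backends.mps.is_available():  # Apple silicon
        device = "mps"
    else:
        device = "cpu"

    x = torch.randn(1024, 1024, device=device)
    y = x @ x  # identical code no matter whose hardware runs it
    print(device, y.shape)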
A card with HBM memory? AMD did that first! They already have the tech.
A GPU + CPU hybrid with unified memory? Both Intel and Apple have either existing or upcoming products. Intel, for example, just abandoned[1] an HPC CPU design that was a combo of a GPU+CPU surrounded by HBM chips acting as a cache for terabytes of DDR5 memory -- ideal for training or running very large language models!
A GPU with a huge number of 16-bit, 8-bit, or 4-bit ops/sec? Guess what: this is easier than getting high performance with 64-bit floats! You can literally brute-force optimal circuit layouts for 4-bit ALUs. No need to be clever at all. All you need is the ability to manufacture "3nm" chips. TSMC does that, not NVIDIA. Intel and Samsung are catching up rapidly.
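Concretely: a 4-bit unsigned multiplier has only 2^4 x 2^4 = 256 possible inputs, so you can enumerate its entire truth table and exhaustively check candidate circuits against it -- something that's utterly hopeless for 64-bit floats. Toy illustration:

    # Complete truth table of a 4-bit x 4-bit unsigned multiplier: 256 entries,
    # every product fits in 8 bits. Small enough to search/verify exhaustively.
    table = {(a, b): a * b for a in range(16) for b in range(16)}
    print(len(table), max(table.values()))  # 256 225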
Fundamentally, the programming interface 99% of AI researchers use is a high-level language like Python or maybe C++. Compilers exist. Even within CUDA, diverse instruction sets and capabilities exist.
So... where's the moat!?
[1] Ooo, I bet they feel real stupid right now for throwing in the towel literally months before the LLM boom started taking off.
CUDA lock-in, and the network effect around it, is the main part, I think. Even though other vendors can build CUDA compatibility (like AMD did), the quality is likely to keep trailing NVidia. Plus, the datacenter TPU market hasn't really formed yet, even though TPUs get better perf/$ and better perf/watt.
On the one hand it's cool to see programming language tech as the keystone, but on the other hand it's frustrating and tragic that the whole software stack and dev-experience landscape in GPU/TPU land is so bad, and the bar so low, that NVidia can win with a hard-to-use proprietary C++-based language and preside over a fragmented landscape of divided-and-conquered competition. Makes you wish the Intel Larrabee etc. open-platform direction had won out.
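(For what it's worth, AMD's compatibility layer already surfaces at the framework level: a ROCm build of PyTorch exposes AMD GPUs through the same torch.cuda API, and you can tell the builds apart from the version metadata. A rough check, assuming a recent PyTorch build:

    import torch

    # torch.version.cuda is set on CUDA builds, torch.version.hip on ROCm builds;
    # both expose devices through the same torch.cuda.* API.
    print("CUDA build:", torch.version.cuda)
    print("ROCm/HIP build:", torch.version.hip)
    print("GPU visible:", torch.cuda.is_available())

Whether the quality matches is another question, as you say.)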
The amount of software written for CUDA pales in comparison to the amount that has been written for Intel x86, yet two large companies migrated off it.
The lock-in with Intel was due to binary distribution (Windows software), and binary ABIs.
Everything to do with large language models is compiled from scratch, using high level languages.
The LLM code itself is trivial, easily replicated in a matter of hours on other platforms. The hard part is gathering the training data and the compute.
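For a sense of scale, the core of a transformer block is a handful of matmuls and a softmax. Here's a minimal single-head sketch in plain PyTorch (names and sizes made up for illustration); nothing in it is CUDA-specific:

    import torch
    import torch.nn.functional as F

    def attention_block(x, wq, wk, wv, wo):
        # x: (batch, seq, d_model); the w* are (d_model, d_model) projections.
        q, k, v = x @ wq, x @ wk, x @ wv
        scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
        return (F.softmax(scores, dim=-1) @ v) @ wo

    d = 64
    x = torch.randn(2, 16, d)                        # batch of 2, 16 tokens
    wq, wk, wv, wo = (torch.randn(d, d) for _ in range(4))
    print(attention_block(x, wq, wk, wv, wo).shape)  # torch.Size([2, 16, 64])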
The hard parts are not dependent on CUDA.
Look at it this way: Google developed Transformers and trained multiple LLMs using TPUs, not CUDA GPUs! And the reason their LLMs are underwhelming isn't the TPUs.
x86 is a binary-artifact instruction set, and software is written in higher-level languages that can be recompiled. CUDA is a language that can target multiple ISAs. Microsoft and Apple had an incentive to keep their options open to migrate off x86, gave a lot of support to their users doing so, and provided backwards compatibility in the form of emulation, etc. NVidia does this even better: users don't even notice when they're switching GPU ISAs.
In principle it's easy to recompile CPU-side stuff too, but there are 3rd-party component ecosystems, other dependency quirks, familiarity with the platform, familiar well-known performance properties, sharing the same questions and answers on Stack Overflow, etc. The lock-in network effect can be strong even if its individual contributing factors look weak on their own.
I agree it's less locked in than, e.g., Intel had it in its heyday. Ultimately we'll find out when the competition eventually shows up with reasons to consider switching.