Am I the only one thinking that NVIDIA doesn't really have a moat here?
How many A100 or H100 cards are actually manufactured annually? A few hundred thousand, if that?
Suddenly, there's a big demand. Microsoft mentioned buying something like 25,000 of the H100 cards for GPT-4 and ongoing training. I'm certain they're not paying retail pricing, so that's a few hundred million in revenue for NVIDIA. They're probably the biggest single customer right now, except perhaps for Amazon.
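Back of the envelope (the per-card price here is purely an assumption; I'd guess a bulk discount lands somewhere around $15-25k against a higher list price):

    cards = 25_000
    for assumed_price in (15_000, 25_000):   # hypothetical discounted unit prices
        print(f"${cards * assumed_price:,}")  # $375,000,000 and $625,000,000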
NVIDIA's revenue in 2022 was $27 billion. The additional H100 cards they've sold this year are a fraction of that. Their retail prices have spiked and availability has dropped because supply is inelastic and there aren't any other suppliers with equivalent products... yet.
Fundamentally, an H100 is not that different from a desktop GPU! It's a little bigger, the math units have a different ALU balance, and it uses high-bandwidth memory (HBM), but that's it. There's nothing else really special about them. Unlike a CPU, which is extremely complex, a GPU is a relatively simple unit repeated over 10K times. In some sense, it's a copy-paste exercise.
NVIDIA has a tiny moat, because AMD simply didn't bother to go after what was -- until now -- a relatively small market.
That market is going to be huge, but that invites competition! When tens or even hundreds of billions are on the table, you can bet your bottom dollar that AMD, Intel, Google, and even Facebook won't sit idly by and watch NVIDIA walk off with it.
So what moat does NVIDIA have?
CUDA is like assembly language. PyTorch can target other back-ends. Compilers can target other GPU instruction sets. Throw a billion dollars at this, and it suddenly becomes an eminently solvable problem. Just look at Apple's CPU transitions and Amazon rolling out ARM cloud servers.
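You can already see this at the framework level: the exact same PyTorch code runs on whatever backend is available, with only the device string changing. A minimal sketch, assuming a reasonably recent PyTorch:

    import torch

    # Pick whichever accelerator this build supports; the code below
    # doesn't change regardless of the vendor.
    if torch.cuda.is_available():            # NVIDIA (or AMD via the ROCm build)
        device = "cuda"
    elif torch.backends.mps.is_available():  # Apple silicon
        device = "mps"
    else:
        device = "cpu"

    x = torch.randn(1024, 1024, device=device)
    y = x @ x  # identical code no matter whose hardware runs it
    print(device, y.shape)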
A card with HBM memory? AMD did that first! They already have the tech.
A GPU + CPU hybrid with unified memory? Both Intel and Apple have either existing or upcoming products. Intel, for example, just abandoned[1] an HPC CPU design that was a combo of a GPU+CPU surrounded by HBM chips acting as a cache for terabytes of DDR5 memory -- ideal for training or running very large language models!
A GPU with a huge number of 16-bit, 8-bit, or 4-bit ops/sec? Guess what: this is easier than getting high performance with 64-bit floats! You can literally brute-force optimal circuit layouts for 4-bit ALUs. No need to be clever at all. All you need is the ability to manufacture "3nm" chips. TSMC does that, not NVIDIA. Intel and Samsung are catching up rapidly.
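Concretely: a 4-bit unsigned multiplier has only 2^4 x 2^4 = 256 possible inputs, so you can enumerate its entire truth table and exhaustively check candidate circuits against it -- something that's utterly hopeless for 64-bit floats. Toy illustration:

    # Complete truth table of a 4-bit x 4-bit unsigned multiplier: 256 entries,
    # every product fits in 8 bits. Small enough to search/verify exhaustively.
    table = {(a, b): a * b for a in range(16) for b in range(16)}
    print(len(table), max(table.values()))  # 256 225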
Fundamentally, the programming interface 99% of AI researchers use is a high-level language like Python or maybe C++. Compilers exist. Even within CUDA, diverse instruction sets and capabilities exist.
So... where's the moat!?
[1] Ooo, I bet they feel real stupid right now for throwing in the towel literally months before the LLM boom started taking off.
CUDA lock-in, and the network effect around it, is the main part, I think. Even though other vendors can build CUDA compatibility (like AMD did), the quality is likely to keep trailing NVidia. Plus, the datacenter TPU market hasn't really formed yet, even though TPUs get better perf/$ and better perf/watt.
On the one hand it's cool to see programming language tech as the keystone, but on the other hand it's frustrating and tragic that the whole software stack and dev-experience landscape in GPU/TPU land is so bad, and the bar so low, that NVidia can win with a hard-to-use proprietary C++-based language and preside over a fragmented landscape of divided-and-conquered competition. Makes you wish the Intel Larrabee etc. open-platform direction had won out.
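(For what it's worth, AMD's compatibility layer already surfaces at the framework level: a ROCm build of PyTorch exposes AMD GPUs through the same torch.cuda API, and you can tell the builds apart from the version metadata. A rough check, assuming a recent PyTorch build:

    import torch

    # torch.version.cuda is set on CUDA builds, torch.version.hip on ROCm builds;
    # both expose devices through the same torch.cuda.* API.
    print("CUDA build:", torch.version.cuda)
    print("ROCm/HIP build:", torch.version.hip)
    print("GPU visible:", torch.cuda.is_available())

Whether the quality matches is another question, as you say.)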
The amount of software written for CUDA pales in comparison to the amount that has been written for Intel x86, yet two large companies migrated off it.
The lock-in with Intel was due to binary distribution (Windows software), and binary ABIs.
Everything to do with large language models is compiled from scratch, using high level languages.
The LLM code itself is trivial, easily replicated in a matter of hours on other platforms. The hard part is gathering the training data and the compute.
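For a sense of scale, the core of a transformer block is a handful of matmuls and a softmax. Here's a minimal single-head sketch in plain PyTorch (names and sizes made up for illustration); nothing in it is CUDA-specific:

    import torch
    import torch.nn.functional as F

    def attention_block(x, wq, wk, wv, wo):
        # x: (batch, seq, d_model); the w* are (d_model, d_model) projections.
        q, k, v = x @ wq, x @ wk, x @ wv
        scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
        return (F.softmax(scores, dim=-1) @ v) @ wo

    d = 64
    x = torch.randn(2, 16, d)                        # batch of 2, 16 tokens
    wq, wk, wv, wo = (torch.randn(d, d) for _ in range(4))
    print(attention_block(x, wq, wk, wv, wo).shape)  # torch.Size([2, 16, 64])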
The hard parts are not dependent on CUDA.
Look at it this way: Google developed Transformers and trained multiple LLMs using TPUs, not CUDA GPUs! And the reason their LLMs are underwhelming isn't the TPUs.
x86 is a binary-artifact instruction set, and software is written in higher-level languages that can be recompiled. CUDA is a language that can target multiple ISAs. Microsoft and Apple had an incentive to keep their options open to migrate off x86, gave a lot of support to their users doing so, and provided backwards compatibility in the form of emulation, etc. NVidia does this even better: users don't even notice when they're switching GPU ISAs.
In principle it's easy to recompile CPU-side stuff too, but there are 3rd-party component ecosystems, other dependency quirks, familiarity with the platform, familiar well-known performance properties, sharing the same questions and answers on Stack Overflow, etc. The lock-in network effect can be strong even if its individual contributing factors look weak on their own.
I agree it's less locked in than, e.g., Intel had it in its heyday. Ultimately we'll find out when the competition eventually shows up with reasons to consider switching.