*Oh, and have a backup workflow. Find / test / use other LLMs and providers. Don...

LTL_FTC · 2026-01-23T19:12:02 1769195522

I mean, you could put together a cluster of DGX Sparks (8 of them) and hit 100 tps with high concurrency:

https://forums.developer.nvidia.com/t/6x-spark-setup/354399/...

Or a single user at about 10 tps.

This is probably around $30k if you go with the 1 TB models.

bayindirh · 2026-01-23T19:45:38 1769197538

I'd love more people to try to enable local LLMs at the speeds they wish to use and face the music of the fans, heat and power bills.

When people talk about the cost and requirements of AI, other people can't grasp what they are talking about.

CamperBob2 · 2026-01-23T19:43:01 1769197381

10 tps, maybe, given the Spark's hobbled memory bandwidth. That's too slow, though. That thread is all about training, which is more compute-intensive.

A couple of DGX Stations are more likely to work well for what I have in mind. But at this point, I'd be pleasantly surprised if those ever ship. If they do, they will be more like $200K each than $100K.

LTL_FTC · 2026-01-23T22:53:32 1769208812

I linked results where the user ran Kimi K2 across his 8-node cluster. Inference results are listed for 1, 10, and 100 concurrent requests.

Edit to add:

Yeah, those stations with the GB300 look more along the lines of what I would want as well but I agree, they’re probably way beyond my reach.

pstuart · 2026-01-23T17:17:34 1769188654

I'm hoping that advances in MoE and other improvements in LLMs will let self-hosting cover a good chunk of developer needs, extending out to providers only when more horsepower is needed.

In effect like traditional on-prem services that have cloud services to handle peak loads...
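A rough sketch of what that on-prem-plus-burst routing could look like, purely for illustration. Everything here is hypothetical: `BurstRouter`, the `local` and `remote` callables, and the `max_local_inflight` limit are made-up names standing in for a self-hosted model, a hosted provider, and whatever capacity your own box actually has. It is single-threaded bookkeeping only; a real setup would need locking and actual HTTP clients.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BurstRouter:
    """Prefer the local backend; overflow to a hosted provider at capacity."""

    local: Callable[[str], str]    # hypothetical call into a self-hosted model
    remote: Callable[[str], str]   # hypothetical call into a hosted provider
    max_local_inflight: int = 4    # assumed capacity of the local machine
    inflight: int = 0              # local requests currently in progress
    counts: dict = field(default_factory=lambda: {"local": 0, "remote": 0})

    def acquire(self) -> str:
        """Reserve a local slot if one is free, otherwise pick the provider."""
        if self.inflight < self.max_local_inflight:
            self.inflight += 1
            return "local"
        return "remote"

    def release(self, backend: str) -> None:
        """Free the local slot once a locally served request finishes."""
        if backend == "local":
            self.inflight -= 1

    def complete(self, prompt: str) -> str:
        """Serve one request on whichever backend acquire() selects."""
        backend = self.acquire()
        self.counts[backend] += 1
        try:
            handler = self.local if backend == "local" else self.remote
            return handler(prompt)
        finally:
            self.release(backend)
```

With `max_local_inflight=2`, the first two overlapping requests stay local and the third spills over to the provider, which is the peak-load pattern in miniature.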

The tech is still relatively new and there are bound to be changes that can enable this -- just like how we went from the 8088 to the 386 (six years later). That was a groundbreaking change, and while Moore's law may be dead, I expect the cost to drop significantly over time.

One can dream at least.