The lesson here is that you can't use a laptop to train a useful model - at least not without running that training for probably decades.
That doesn't mean you can't run a useful model on a laptop that was trained on larger hardware. I do that all the time - local models got really good this year.
> reducing model size while retaining capability will just never happen.
Tell that to Qwen3-4B! Those models are remarkably capable.
Sure, Qwen3-4B - a 4GB download - is nowhere near as capable as Claude Sonnet 4.
But it is massively more capable than the 4GB models we had last year.
Meanwhile, recent models that are within the same ballpark of capability as Claude Sonnet 4 - like GLM 4.5, Kimi K2, and the largest of the Qwen 3 models - can just about fit on a $10,000 Mac Studio with 512GB of RAM. That's a very notable trend.
El Capitan being much faster than my desktop doesn't mean that my desktop is useless. Same with LLMs.
I've been using Mistral Small 3.x for a bunch of tasks on my own PC and it has been very useful, especially after I wrote a few custom tools with llama.cpp to make it more "scriptable".
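By "scriptable" I mean something in the spirit of this minimal sketch (not my actual tools): a thin wrapper over llama.cpp's llama-server, which exposes an OpenAI-compatible HTTP endpoint. The model path, port, and prompt are placeholders.

```python
# Minimal sketch: scripting a local model via llama-server's
# OpenAI-compatible chat endpoint. Assumes the server was started
# separately, e.g.:  llama-server -m mistral-small.gguf --port 8080
# (model file and port here are placeholder assumptions).
import json
import urllib.request

def ask(prompt: str) -> str:
    payload = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI-style response shape: first choice's message text.
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Summarize the following in one sentence: local LLMs are improving fast."))
```

Once the model sits behind a plain HTTP call like this, it composes with shell pipes, cron jobs, and editor plugins like any other local tool.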