I'm not so sure. IMO agents are actually a huge unlock for low-code tools, because before you had to teach a disinterested human how to use your new DSL/tool. Agents are a lot more patient and enthusiastic, so you can have the agent generate the low-code instead of the human.
You could try to generate the business tools straight from the conventional toolsets, but the problem is that agents are still far too unreliable for that. However, just like humans, if you dumb down the space and give them a smaller, simpler set of primitives, they can do a lot better.
The idea that "now that AI can churn out massive amounts of code quickly and for little cost, we should just stop trying to minimize the amount of code, because code is now basically free" is magical thinking that runs counter to what is actually happening.
The key insight that's missing is that code creation is the cheapest aspect of software development; reading the code, maintaining it, and adapting it to new requirements are by far the most difficult and time-consuming parts, and the speed of code creation is irrelevant there. The smallest trade-off that compromises the quality and future-proofing of the code is going to cost multiples of that the next time you (or the LLM) need to look at it.
People with industry experience know very well what happened when companies hired developers based on their ability to churn out a large volume of code. Over time, these developers churn out more and more code at an accelerating rate, creating an illusion of productivity from the perspective of middle managers, while the rate of actual new feature releases grinds to a halt as the bug rate increases.
With AI, it's going to be the same effect, except MUCH worse and MUCH more obvious. I actually think that it will get so bad that it will awaken people who weren't paying attention before.
I totally agree with this and with the comment above yours regarding predictability. I don't understand the manufactured FUD that the linked article and the low-code spreadsheet product in another comment are creating here. It is literally the perfect match.
The problem is that false positives can be incredibly expensive in money, time, pain, and anxiety. Most people cannot afford (and the healthcare system cannot handle) thousands of dollars in tests to disprove every AI hunch. And tests are rarely consequence-free. This is effectively a negative externality of these AI health products, and society is picking up the tab.
This is why certain types of cancer tests are usually only performed on people over a certain age. If you test young people, the false positives outnumber the true positives.
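To make the base-rate point concrete, here is a rough sketch with made-up numbers (the prevalence, sensitivity, and specificity below are illustrative assumptions, not figures for any real test):

    # Screening a low-prevalence group: even a decent test produces far more
    # false positives than true positives. All numbers are illustrative.
    prevalence = 0.001    # 0.1% of the screened group actually has the disease
    sensitivity = 0.90    # P(positive | disease)
    specificity = 0.95    # P(negative | no disease)
    population = 100_000

    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * sensitivity
    false_positives = healthy * (1 - specificity)

    print(f"true positives:  {true_positives:.0f}")    # ~90
    print(f"false positives: {false_positives:.0f}")   # ~4995
    print(f"P(disease | positive) = {true_positives / (true_positives + false_positives):.1%}")  # ~1.8%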
What would be a good coding model to run on an M3 Pro (18GB) to get a Codex-like workflow and quality? Essentially, I run out of quota quickly when using Codex-High in VS Code on the $20 ChatGPT plan and am looking for cheaper/free alternatives (even if a little slower, but with the same quality). Any pointers?
Nothing. This summer I set up a dual 16GB GPU / 64GB RAM system and nothing I could run was even remotely close. Big models that didn't fit in 32GB of VRAM had marginally better results but were at least an order of magnitude slower than what you'd pay for, and still much worse in quality.
I gave one of the GPUs to my kid to play games on.
I'm running unsloth/GLM-4.7-Flash-GGUF:UD-Q8_K_XL via llama.cpp on 2x 24GB 4090s, which fits perfectly with 198k context at 120 tokens/s – the model itself is really good.
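In case anyone wants to wire a setup like this into existing tooling: llama-server exposes an OpenAI-compatible endpoint, so a stock client can talk to it. A minimal sketch, assuming the server is running on its default port (the port, key, and model name below are assumptions about a typical local setup, not exact values from my config):

    # Point the standard OpenAI client at a local llama.cpp server.
    # llama-server defaults to port 8080; the API key is ignored locally.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8080/v1",
        api_key="not-needed-locally",
    )

    resp = client.chat.completions.create(
        model="GLM-4.7-Flash",  # the server serves whatever model it was launched with
        messages=[{"role": "user", "content": "Write a function that reverses a linked list."}],
    )
    print(resp.choices[0].message.content)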
Short answer: there is none. You can't get frontier-level performance from any open source model, much less one that would work on an M3 Pro.
If you had more like 200GB of RAM you might be able to run something like MiniMax M2.1 to get last-gen performance at something resembling usable speed - but it's still a far cry from Codex on high.
At the moment, I think the best you can do is qwen3-coder:30b -- it works, and it's nice to get some fully local LLM coding up and running, but you'll quickly realize that you've long since tasted the sweet forbidden nectar that is hosted LLMs. Unfortunately.
They are spending hundreds of billions of dollars on data centers filled with GPUs that cost more than an average car, and then spending months training models, to serve your current $20/mo plan. Do you legitimately think there's a cheaper or free alternative that is of the same quality?
I guess you could technically run the huge leading open weight models using large disks as RAM and have close to the "same quality" but with "heat death of the universe" speeds.
Not sure if it's just me, but at least for my use cases (software dev, small-to-medium projects) Claude Opus + Claude Code beats OpenCode + GLM 4.7 by quite a margin. At least for me, Claude "gets it" eventually, while GLM will get stuck in a loop, not understanding what the problem is or what I expect.
Right, GLM is close, but not close enough. If I have to spend $200 for an Opus fallback anyway, I may as well not use GLM at all. Still, it's an unbelievable option if $200 is a luxury; the price-per-quality is absurd.
"run" as in run locally? There's not much you can do with that little RAM.
If remote models are OK, you could have a look at MiniMax M2.1 (minimax.io), GLM from z.ai, or Qwen3 Coder. You should be able to use all of these with your local OpenAI app.
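All of them advertise OpenAI-compatible APIs, so the same client should work against any of them. A minimal sketch, assuming such an endpoint (the base URL, env var, and model id are placeholders, not the providers' actual values; check their docs):

    # List what a provider exposes before wiring a model id into your coding tool.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.example-provider.com/v1",  # placeholder endpoint
        api_key=os.environ["PROVIDER_API_KEY"],          # hypothetical env var
    )

    for m in client.models.list():
        print(m.id)  # e.g. a GLM / MiniMax / Qwen model id to put in your tool's config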
IMO SRE works mostly because SREs exist outside the product engineering organization. They want to help you succeed, but if you want to YOLO your launch and move fast and break things, they have the option to hand back the pager and find other work. That option is rarely exercised, but its mere existence seems to create better-than-usual incentives.
With vibecoding, I imagine the LLM will get an MCP server that lets it schedule jobs on Kubernetes or whatever IaaS, and a fleet of agents will do the basic troubleshooting and whack-a-mole type activities, leaving only the hard problems for human SREs. Before and after AI, the corporate incentive will always be to ship slop unless there is a counterbalancing force keeping the shipping team accountable to higher standards.
The fourth amendment is basically gone at this point. Private companies can harvest location data from phones or facial recognition cameras/license plate readers in public spaces and sell that to entities like Palantir that aggregate it for government use (or for other commercial use). No warrants required, very little oversight (especially in this admin).
The article is unclear, but if it is saying you can't even get basic ADAS without $99/mo, that is a pretty big deal, especially if it's applied to existing cars.
Basic stay-in-lane and stop-and-go following in traffic has been pretty standard for almost a decade at this point, and paywalling it now would be outrageous. I have a 2017 car that does this.
According to other articles, adaptive cruise control (which works down to 0 mph in stop-and-go traffic) is being kept standard. It is just the rest of Autopilot that is moving to a subscription.
Do you have a source for that? I'm not aware of any regulation requiring ADAS. Even automatic emergency braking is not required for a few more years.
It is in the EU, but in the US ADAS won't be mandated until 2029. It would tank your IIHS rating, though, and all major manufacturers have met a voluntary pledge to have >95% of light-duty vehicles ship with autobraking by 2023: https://www.iihs.org/news/detail/automakers-fulfill-autobrak...
* Value seems highly concentrated in a sliver of tasks - the top ten account for 32%, suggesting a fat long tail where it may be less useful/relevant.
* Productivity gains drop to a more modest 1-1.2% once you account for humans correcting AI failures. 1% is still plenty good, especially given the historical malaise of only ~2% growth, but it's not Industrial Revolution good.
* Reliability wall - a 70% success rate is still problematic, and we're down to 50% at just 2+ hours of task duration, or about "15 years" of schooling in terms of complexity, for API use. For web-based multi-turn it's a bit better, but I'd imagine that's at least partly due to task-selection bias.
I've found that architecting around that reliability wall is where the margins fall apart. You end up chaining verification steps and retries to get a usable result, which multiplies inference costs until the business case just doesn't work for a bootstrapped product.
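A back-of-the-envelope sketch of that cost blow-up, with made-up per-call cost and success rates (none of these numbers come from the report above):

    # Retry-until-verified is roughly geometric: expected calls per step = 1 / p.
    # Chaining steps adds those up, and the multiplier is what eats the margin.
    def expected_calls(success_rate: float) -> float:
        return 1.0 / success_rate

    cost_per_call = 0.05             # illustrative $ per generate + verify pass
    step_success = [0.9, 0.7, 0.5]   # illustrative per-step success rates in a chain

    total_calls = sum(expected_calls(p) for p in step_success)
    baseline = len(step_success) * cost_per_call   # cost if every step succeeded first try
    actual = total_calls * cost_per_call

    print(f"expected calls: {total_calls:.2f} vs {len(step_success)} ideal")
    print(f"cost multiplier: {actual / baseline:.2f}x")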
I mean, you can compare, but at the start it also delivered only super small improvements.
The main difference is that people had no idea of the disruption it would cause, and of course there wasn't a huge investment industry around it.
The only question is whether the investors' ROI will be positive (which depends on the timeline), not whether it is disruptive (or whether it will be, say, 30 years from now), and I see people confusing the two here quite often.
Some of these were acquisitions, so it's hard to argue the money was 'burned'. That said, I do agree it's debatable whether acquired companies count as viable, since the product usually gets put on life support or ruined/shut down (like Club Penguin).
Even if the products eventually got shut down, I still don't think the money was necessarily 'burned.' Most buildings eventually fall or get demolished, so were all the resources spent on their construction burned? But whether a product actually helped its users is a question too nuanced to ask here.