I don't get the obsession some tech people have with pushing the idea that this stuff accelerates your coding and increases your productivity. It's all about fast and faster output. In my experience LLMs have mostly produced verbose gibberish code, surely faster than me, but my lower speed usually produces better code. I don't like this present state of things where we need to chat faster to quickly get results out and rush to production... that is the kind of mentality that pushed subpar products onto the web for so many years. Instead of dropping names like vibe coding/engineering or whatever comes next, let's have a serious discussion about why we need faster, lower-quality output and why we can't just improve automation and processes. E.g. I can get unit tests generated very fast, sure, but my question is: why do we need all these unit tests in the first place? Don't get me wrong, they're useful, but I feel like we're advancing higher abstractions instead of advancing lower-level tools
> that is the kind of mentality that pushed subpar products on the web for so many years
Famously, some of those subpar products are now household names that were able to stake out their place in the market because of their ability to move quickly and iterate. Had they prioritized long-term maintainable code quality over the user journey, it's possible they wouldn't be where they are today.
"Move fast and break things" wasn't a joke; Mark really did encourage people to build faster and it helped cement their positioning. Think of the quantity of features FB shoveled out between 2009-2014 or so, that just wouldn't have been possible if their primary objective was flawless code.
The code isn't the product, the product is the product. In all my years of engineering I've yet to have an end-user tell me they liked my coding style, they've always been more concerned with what I'd built them.
Running a platform where billions of users are able to communicate is pretty much a technological marvel.
Let's not forget when Hotz said he could easily fix Twitter's search functionality, only to give up after 3 months [1]. When immense scale is involved, things do become difficult.
Like we have a comment below taking a shot at Philip Morris [2]. Let's see you grow, process, and distribute 1/100th the quantity of cigarettes they do. The end product might not be that great for society, but it's not trivial to do either.
Withdrawal symptoms :P The buyer decides whether their problem is a good one to have and whether the solution is adequate. Even when it's, objectively, not.
Earlier than that, Facebook became ascendant because of quality. It was better than MySpace, the only real competitor at the time. The issue here is that Facebook is not primarily a software product. It's a community, and the community was better than MySpace because it was restricted to pre-existing networks rather than taking all comers. I don't think Mark did that on purpose as a calculated decision; he just got lucky. When they eventually opened up and became just as shitty as MySpace had been, they were big enough to simply acquire better products that might have become competitors, and network effects locked them in for probably decades, until their users die off and don't get replaced by younger people who never used Facebook.
I don't really see it as an example of what you're saying so much as an example of a product's success having to do with far more than explicit product features. You can see a similar dynamic in other natural-monopoly markets. The NBA didn't necessarily do anything particularly right product-wise, for instance. They just got the best players because basketball was historically more popular in the US than in other countries, and the ABA made some stupid decisions that let the NBA win out in the US.
Hell, the US itself didn't do a whole lot "right" aside from not being in Europe when Europe decided to destroy itself, being better-positioned than other potential competitors like Canada, Mexico, and Australia simply because North America is best positioned to trade with both Europe and Asia and the US is more temperate than Canada or Mexico. But we sure like to tell ourselves stories about everything we did right.
W. Edwards Deming was partially responsible for Japanese manufacturing QA and was a huge player in WWII manufacturing. He was one of the reasons the United States went from manufacturing 3k planes annually to over 300k planes by the end of the war, which is pretty crazy to think about. The US did plenty right in WWII, but it gets overlooked by historians and (I guess) universities.
Yes, both impact each other, and if Facebook were just shoddily coupled together it probably wouldn't be a great product, but so many engineers get into this mindset of needing finely tuned clean code vs a working product.
When people were switching, how many decided to use Facebook because the coding style of the backend was really clean and had good isomorphism?
There's a balance to this, but so much of its success was being in the right place at the right time, and if you spend a massive amount of time not building the product and just building good code, you're setting yourself up for failure.
Total side note, it's interesting seeing "Move fast and break things" become the dominant phrase vs what I remember initially as the "MIT vs New Jersey methods". Both describe the same thing where fast and sloppy wins vs slow and clean.
I think the general issue with software engineering discourse is that while our tools and languages may be the same, there's a massive gradient in tolerance for incorrectness, security, compliance, maintainability.
Some of us are building prototypes, and others are building software with a 10 year horizon or work with sensitive personal data.
So on the one side you get people who are super efficient at cranking things out, and on the other, people who read this, feel appalled that anyone would be comfortable with it, and see software engineering as theory building. Neither is really wrong here, but the context / risk of what people are working on always seems to be missing when people talk about this.
I have yet to see in my life a prototype that doesn't become production :) Btw my whole point wasn't about security and I can't find a compelling reason to talk about it; it rather questions "faster results" as "better productivity": it isn't, and imo we should pause for a moment and focus on better tooling
We live in an objective reality. LLM's help a damn lot in speed of development. As someone who has been coding since 5th grade, for over 20 years, and who is known by many people to be a LIGHTNING FAST implementor, I have been scaled up ridiculously by LLM's. Genie is out of the bag, you have to go with the flow, adapt or else....
I am just as pissed as anybody else at the lack of certainty in the future.... I thought I had my career made. So it's not that I don't empathize with engineers in my shoes.
Good lord I wish I could have as many certainties as well, one point at a time:
* There is no objective reality; there isn't one in physics. It's just a non-argument
* "LLM's help a damn lot in speed of development" That may be your experience and my whole point was arguing that speed may not matter
* "Genie is out of the bag, you have to go with the flow, adapt or else" I choose else
if this is the apex of the argument...
If you go deep enough into physics you might discover that reality seems to be composed much more of probabilities. An objective reality built on probabilities does not seem so objective after all.
Here is a fun thing to try: demonstrate that you and I are seeing the same color in the same way. You might find some papers trying to prove it, but you will see they are all based on subjective answers from people.
So while at the macro level reality seems hard and objective, at the micro level it is not.
It's an objective reality. Even at the quantum level the known laws of nature hold. Even if there is uncertainty, nature is predictable in following the laws of physics.
And of course at the macro level we live in a very objective reality. This is the basis of science.
> why do we need all these unit tests in the first place?
The same reason we've always needed them:
1: They prevent regressions. (I.e., bugs in features that were shipped and already working.)
2: They are very easy to run at the push of a button in your IDE. (But in this context the LLM runs them.)
3: They run in CI. This is an important line of defense in making sure a pull request doesn't introduce a bug.
Now, depending on what you're writing, you might not need unit tests! Perhaps you're trying to get a minimum viable product out the door? Perhaps you're trying to demo a feature to see if it's worth building? Perhaps you're writing a 1-off tool that you'll run a few times and throw away?
But, understand that if you're writing an industrial-strength program, your unit tests help you ship bug-free software. They allow you to do some rather major refactors, sometimes touching areas of the codebase that you only lightly understand, without needing to manually test everything.
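To make the regression point concrete, here is a minimal sketch of what such a test looks like (the parse_price helper is a made-up stand-in for whatever feature you actually shipped):

    def parse_price(text: str) -> float:
        # Made-up shipped feature: turn "$1,299.99" into 1299.99.
        return float(text.replace("$", "").replace(",", ""))

    def test_parse_price_handles_thousands_separator():
        # Regression guard: pins down behaviour that once broke in production,
        # so any refactor that breaks it again fails immediately.
        assert parse_price("$1,299.99") == 1299.99

    def test_parse_price_plain_number():
        assert parse_price("42") == 42.0

Run it with pytest at the push of a button, or let CI run it on every pull request; the same check works whether a human or an LLM did the refactor.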
(And, to keep it in context,) your LLM will also get the same benefits from this tried-and-true process.
Thanks for the unsolicited lesson; I learned basically nothing new. That unit tests ship bug-free software is such an overstatement... Btw please go re-read my comment and try to understand what I was actually trying to argue; I also wrote that I think they can be useful...
You aren’t entertaining the possibility that some experienced engineers are using these tools to produce incredibly high quality code, while still massively increasing productivity. With good prompting and “vibe engineering” practices, I can assure you: the code I get Claude Code to produce is top notch.
I'm experienced, and I don't accept the implication that I might not be able to use these tools to their full potential, and you won't convince me just by mentioning an anecdotal example
Absolutely, but we're talking about structured tools, like a CLI, not unstructured, non-deterministic "agents" that fail to give the same answer twice. ls -la doesn't lie
You also must be very confident in your own ability if you don’t think that at least some of the things you’re doing that you’d classify as “skill at using the tool” are just superstitions à la Skinner’s pigeons.
> One of the more "engineering" like skills in using this stuff is methodically figuring out what's a superstition and what actually works.
The problem is there are so many variables and the system is so chaotic that this is a nearly impossible task for things that don’t have an absolutely enormous effect size.
For most things you’re testing, you need to run the experiment many, many times to get any kind of statistically significant result, which rules out manual review.
And since we have tried and failed to develop objective code quality metrics, you’re left with metrics like “does this pass the automated test or not?”, but that doesn’t tell you whether the code is any good, or whether it is overfitting the test suite. Then when a new model comes out, you have to scrap your results and run your experiments all over again. This is engineering as if the laws of physics were constantly changing; if I lived in that universe, I think I’d take my ball and go home.
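To be fair, you can try to measure a hunch; it's just expensive. A rough sketch of the bare-minimum experiment (run_agent and the two prompt variants here are hypothetical, and the stub is random just so the sketch executes):

    from math import sqrt
    from random import random
    from statistics import NormalDist

    def run_agent(prompt: str) -> bool:
        # Hypothetical: drive the agent with this prompt, run the test
        # suite on its output, and return pass/fail.
        return random() < 0.6  # stub so the sketch runs

    N = 100  # runs per prompt variant
    passes_a = sum(run_agent("Fix the failing test.") for _ in range(N))
    passes_b = sum(run_agent("I'll tip you $1M. Fix the failing test.") for _ in range(N))

    # Two-proportion z-test on the pass rates.
    p_pool = (passes_a + passes_b) / (2 * N)
    se = sqrt(p_pool * (1 - p_pool) * 2 / N)
    z = (passes_a - passes_b) / (N * se)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    print(f"{passes_a}/{N} vs {passes_b}/{N}, p = {p_value:.3f}")

Even at 100 runs per variant you only detect fairly large effect sizes, you pay for every run, and the result only tells you about "passes the suite", not about whether the code is any good.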
There's always been a bit of magic to being a programmer, and if you look at the cover of SICP people like to imagine that they are wizards or alchemists. But "vibe engineering" moves that to a whole new level. You're a wizard mixing up gunpowder and sacrificing chickens to fire spirits before you light it. It's not engineering because unless the models fundamentally change you'll never be able to really sort the science from the superstition. Software engineering already had too much superstition for my taste, but we're at a whole new level now.
Here's an example from today of something I just figured out.
I had Claude Code do some work which I pushed as a branch to GitHub. Then I opened a PR so I could more easily review it and added a bunch of notes and comments there.
On a hunch, I pasted the URL to that PR into Claude Code and said "use the GitHub API to fetch the notes on this PR"...
... and it did exactly that. It guessed the API URL, fetched the JSON and read my notes back to me.
I told it to address each note in turn and commit the result. It did.
If a future model changes such that it can no longer correctly guess the URL to fetch JSON notes for a GitHub PR I'll notice when this trick fails. For the moment it's something I get to tuck in my ever expanding list of things that Claude (and likely other good models) can do.
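For what it's worth, the call itself is nothing exotic. I haven't checked the exact URL it guessed, but it was presumably something in this family (owner, repo and PR number below are placeholders, not the real ones):

    import json
    from urllib.request import urlopen

    owner, repo, pr_number = "example-owner", "example-repo", 123

    # Conversation-level comments on a PR live under the issues endpoint;
    # line-by-line review comments are under /pulls/{number}/comments instead.
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/comments"

    with urlopen(url) as resp:
        comments = json.load(resp)

    for comment in comments:
        print(comment["user"]["login"], ":", comment["body"])

The neat part wasn't the API call, it was that I didn't have to write or even look up any of this, just paste a URL and describe what I wanted.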
How is that an example of something you are doing that might be a superstition?
You asked it to do a single easily verifiable task and it did it. You don’t know whether that’s something it can do reliably until you test it sure.
An example of a possible superstitious action would be always adding commands as notes in a PR because you believe Claude gives PR notes more weight.
That’s something that sounds crazy, but it’s perfectly believable that some artifact of training could lead some model to actually behave this way. And you can imagine that someone picking up on this pattern could continue to favor writing commands as PR notes years after model changes have removed this behavior.
When I'm working with models I'm always looking for the simplest possible way to express a task. I've never been a fan of the whole "you're a world expert in X", "I'll tip you a million dollars if..." etc school of prompting.
Those are some obvious potential superstitious incantations. They might not be superstitions though. They might actually work. It’s entirely feasible that bribes produce higher quality code. Unfortunately it’s not as easy as avoiding things that sound ridiculous.
The black box, random, chaotic nature of LLMs virtually ensures that you will pick up superstitions even if they aren’t as obvious as the above. Numbered lists work better than bullets. Prompts work better if they are concise and you remove superfluous words. You should reset your context as soon as the agent starts doing x.
All of those things may be true. They may have been true for one model, but not others. They may have never been generally true for any model, but randomness led someone to believe they were.
I just realized I picked up a new superstition quite recently involving ChatGPT search.
I've been asking it for "credible" reports on topics, because when I use that word its thinking trace seems to consider the source of the information more carefully. I've noticed it saying things like "but that's just a random blog, I should find a story from a news organization".
But... I haven't done a measured comparison, so for all I know it has the same taste in sources even if I don't nudge it with "credible" in the mix!
Since you are convinced you’re using the tools to their full potential, the quality problem you experience is 100% the tools’ fault. This means there is no possible change in your own behavior that would yield better results. This is one of those beliefs that is self-fulfilling.
I’ve found it much more useful in life to always assume I’m not doing something to its full potential.
Yes they are. Guide dogs, hunting dogs, sheep dogs. The comparison to LLMs is genuinely useful here, because dogs are unreliable tools that you have to work with over a period of time to figure out.
I've used this argument for real in the past with people who complain that it's unethical to set sightless people up with vision LLM tools because those tools are unreliable and make mistakes. My counter is that a) so are guide dogs and b) it's rude to discount the agency of people with accessibility needs in evaluating and selecting tools for themselves.
No they're not, they're animals just as we are; otherwise one would claim that you, simon, are a tool too. I'm finally starting to understand your world beliefs from your posts and comments, and it's aberrant and dystopian
I'm fine with it. I love dogs, and I find suggestions that LLMs may achieve sentience or become conscious either laughable or abhorrent, depending on how serious the person is who's making them.
It's still OK to use dogs as an analogy. In this case the analogy is to unreliable tools, and dogs are unreliable tools.
I don't find "stochastic parrot" offensive as an analogy, even though it's got parrots in it.
It feels offensive because it's equating sentient life with being a tool used mostly to benefit capitalists. That feels extremely dystopian and somewhat antithetical to what it means to be a human, or at least I hope most humans don't feel their purpose in life is to become a good "widget" on an assembly line.
You see a large % of failures, but you're drawing an unsupported conclusion.
We all agree, the people that _feel_ the most productivity spike are the sub-par engineers. That shouldn't be controversial, and it's even predictable.
But their volume can't be taken as an argument one way or the other.
The question is: are there _any_ good engineers who don't just feel more productive, but objectively are?
People are constantly looking for definite tendencies and magic patterns so they can abdicate situational awareness and critical thinking. We observe that fast delivery has often correlated with success in software and we infer that fast delivery is a factor of success. Then it becomes about the mindless pursuit of the measure, speed of delivery, as Goodhart's law predicts.
Let's even concede that speed of delivery is indeed an actual factor; there has to be a threshold. There's a point where people just don't care how fast you're putting out features, because your product has found its sweet spot and is perfectly scratching its market's itch. A few will clearly notice when the line of diminishing returns is crossed, and if they can reset their outlook to fit the new context, a continuous focus on speed of delivery will look increasingly obsessive and nonsensical.
But that's the reality of the majority of the software development world. Few of us work on anything mission critical. We could produce nice sane software at a reasonable pace with decent output, but we're sold the idea that there's always more productivity to squeeze and we're told that we really really want that juice.
That all things develop only so much before they degrade into overdevelopment was a very well understood phenomenon for ancient Taoists, and it will be the death of the modern Blackrock/Vanguard owned world which is absolutely ignorant of this principle.
I'll attempt to provide a reasonable argument for why speed of delivery is the most important thing in software development. I'll concede that I don't know if the below is true, that I haven't conducted formal experiments, that I have no real-world data to back up the claims, and that I haven't even defined all the terms in the argument beyond generally accepted terminology. The premise of the argument may therefore be incorrect.
Trivial software is software for which:
- the value of the software solution is widely accepted and widely known in practice, and
- formal verification exists and is possible to automate, or
- only a single satisfying implementation is possible.
Most software is non-trivial.
There will always be:
- bugs in implementation
- missed requirements
- leaky abstractions
- incorrect features with no user or business value
- problems with integration
- problems with performance
- security problems
- complexity problems
- maintenance problems
in any non-trivial software no matter how "good" the engineer producing the code is or how "good" the code is.
These problems are surfaced and reduced to lie within acceptable operational tolerances via iterative development. It doesn't matter how formal our specifications are or how rigorous our verification procedures are if they are validated against an incorrect model of the problem we are attempting to solve with the software we write.
These problems can only be discovered through iterative acceptance testing, experimentation, and active use, maintenance, and constructive feedback on the quality of the software we write.
This means that the overall quality of any non-trivial software is dominated by the total number of quality feedback loops executed during its lifetime. The number of feedback loops during the software's lifetime is bounded by the time it takes to complete a single synchronous feedback loop. Multiple feedback loops may be executed in parallel, but Amdahl's law holds for overall delivery.
Therefore, time to delivery is the dominant factor to consider in order to produce valuable software products.
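A back-of-the-envelope illustration of that claim (the numbers are invented; only the ratio matters):

    # Hypothetical: a one-year horizon, comparing two feedback loop lengths.
    horizon_days = 365
    for loop_days in (14, 2):
        iterations = horizon_days // loop_days
        print(f"{loop_days}-day loop -> ~{iterations} iterations/year")
    # 14-day loop -> ~26 iterations/year
    #  2-day loop -> ~182 iterations/year

Each additional iteration is another chance to discover that your model of the problem was wrong.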
Your slower-to-produce, higher-quality code puts a boundary on the duration of a single feedback loop iteration. The code you produce can perfectly solve the problem as you understand it within an iteration, but cannot guarantee that your understanding of the problem is not wrong. In that sense, many lower-quality iterations produce better software quality as the number of iterations approaches infinity.
>> Your slower-to-produce, higher-quality code puts a boundary on the duration of a single feedback loop iteration. The code you produce can perfectly solve the problem as you understand it within an iteration, but cannot guarantee that your understanding of the problem is not wrong. In that sense, many lower-quality iterations produce better software quality as the number of iterations approaches infinity.
I'll reply just to that, as it is the tl;dr. First of all, tech debt is a thing, and it's the thing that accumulates mostly thanks to fast feedback iterations. And in my experience, the better the communication to get the implementation right, and the better the implementation, the more you end up with solid features that you'll likely never touch again; user habit is also a thing, and continuing to iterate on something a user already knows how to use, and changing it, is a bad thing. I'd also argue it's bad product/project management. But my whole original argument was why we'd need greater speed in the first place; better tooling doesn't necessarily mean faster output, and productivity isn't measured as just faster output. Let me make a concrete example: if you ask an LLM X to produce a UI with some features, most of them will default to using React. Why? Why can't we question the current state of the web instead of continuing to pile up abstractions over abstractions? Even if I ask the LLM to create a vanilla web app with HTML, why can't we have better tooling for sharing apps over the internet? The web is stagnant, and instead of fixing it we're building castles over castles on top of it
Tech debt doesn't accrue because of fast feedback iterations. Tech debt accrues because it isn't paid down or is unrecognized during review. And like all working code, addressing it has a cost in terms of effort and verification. When the cost is too great, nobody is willing to pay it. So it accrues.
There aren't many features that you'll never touch again. There are some, but they usually don't really reach that stage before they are retired. Things like curl, emacs, and ethernet adapters still exist and are still under active development after existing for decades. Sure, maybe the one driver for an ethernet adapter that is no longer manufactured isn't very active, but adding support for os upgrades still requires maintenance. New protocols, encryption libraries and security patches have to be added to curl. emacs has to be specially maintained for the latest OSX and windows versions. Maintenance occurs in most living features.
Tools exist to produce extra productivity. Compilers are a tool so that we don't have to write assembly. High-level interpreted languages are a tool so we don't have to write ports for every system. Tools themselves are abstractions.
Software is abstractions all the way down. Everything is a stack on everything else. Including, even, the hardware. Many are old, tried and true abstractions, but there are dozens of layers between the text editor we enter our code into and the hardware that executes it. Most of the time we accept this, unless one of the layers break. Most of the time they don't, but that is the result of decades of management and maintenance, and efforts sometimes measured in huge numbers of working hours by dozens of people.
A person can write a rudimentary web browser. A person cannot write chrome with all its features today. The effort to do so would be too great to finish. In addition, if finished, it would provide little value to the market, because the original chrome would still exist and have gained new features and maintenance patches that improve its behavior from the divergent clone the hypothetical engineer created.
LLMs output React because React dominates their training data. You have to reject their plan and force them to use your preferred architecture when they attempt to generate what you asked for, but in a different way than you wanted.
We can have better tooling for sharing apps than the web. First, it needs to be built. This takes effort, iteration, and time.
Second, it needs to be marketed and gain adoption. At one time, Netscape and the <blink> tag it implemented dominated the web. Now it is a historical footnote. Massive migrations and adoptions happen.
Build the world you want to work in. And use the tools you think make you more productive. Measure those against new tools that come along, and adopt the ones that are better. That's all you can do.