I’m happy to throw an LLM at our projects but we also spend time refactoring and reviewing each other’s code. When I look at the AI-generated code I can visualize the direction it’s headed in—lots of copy-pasted code with tedious manual checks for specific error conditions and little thought about how somebody reading it could be confident that the code is correct.
I can’t understand how people would run agents 24/7. The agent is producing mediocre code and is bottlenecked on my review & fixes. I think I’m only marginally faster than I was without LLMs.
> with tedious manual checks for specific error conditions
And specifically: lots of checks for impossible error conditions, often supplying an incorrect "default value" for those conditions, which would result in completely wrong behavior that would be really hard to debug if a future change ever made those branches actually reachable.
I always thought that for the vast majority of your codebase, the right thing to do with an error is to propagate it: either blindly, or by wrapping it with a bit of context info.
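For illustration, a minimal std-only Rust sketch of the two approaches (the config-reading function and names are made up):

```rust
use std::fs;

// The pattern being complained about: an "impossible" branch that quietly
// substitutes a default instead of surfacing the error.
fn port_with_bogus_default(raw: &str) -> u16 {
    match raw.trim().parse::<u16>() {
        Ok(p) => p,
        // If a future change ever makes this reachable, the program
        // silently runs on port 0 and the real bug is hidden.
        Err(_) => 0,
    }
}

// What this comment argues for: propagate the error, optionally wrapping
// it with a little context for the caller.
fn port_from_config(path: &str) -> Result<u16, String> {
    let raw = fs::read_to_string(path)
        .map_err(|e| format!("reading {path}: {e}"))?;
    raw.trim()
        .parse::<u16>()
        .map_err(|e| format!("parsing port in {path}: {e}"))
}
```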
I don’t know where the LLMs are picking up this paranoid tendency to handle every single error case. It’s worth knowing about the error cases, but it requires a lot more knowledge and reasoning about the current state of the program to think about how they should be handled. Not something you can figure out just by looking at a snippet.
Training data from junior programmers and introductory programming material. No matter how carefully one labels data, the combination of programming's subjectivity (which undermines human labeling and reinforcement's ability to filter this out) and the sheer volume of low-experience code in the training corpus makes this basically inevitable.
Garbage in, garbage out, as they say. I will be the first to admit that Claude enables me to do certain things that I simply could not do before without investing a significant amount of time and energy.
At the same time, the amount of anti-patterns the LLM generates is higher than I am able to manage. No, CLAUDE.md and Skills.md have not fixed the issue.
Building a production-grade system using Claude has been a fool's errand for me. Whatever time and energy I save by not writing code, I end up paying back when I read code that I did not write and fix anti-patterns left and right.
I rationalized it a bit, deflecting by saying this is the AI's code, not mine. But no: this is my code, and it's bad.
> At the same time, the amount of anti-patterns the LLM generates is higher than I am able to manage. No, CLAUDE.md and Skills.md have not fixed the issue.
This is starting to drive me insane. I was working on a Rust CLI that depends on Docker, and Opus decided to just… keep the CLI going with a warning “Docker is not installed” before jumping into a pile of garbage code that looks like it was written by a lobotomized kangaroo, because it tries to use an Option<Docker> everywhere instead of making sure Docker is installed and quitting with an error if it isn't.
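What I wanted was just a fail-fast check at startup. A rough sketch, probing for the docker binary via std::process rather than any particular Docker crate:

```rust
use std::process::{exit, Command};

fn main() {
    // Check once at startup instead of threading an Option<Docker>
    // through every function.
    let docker_ok = Command::new("docker")
        .arg("--version")
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false);

    if !docker_ok {
        eprintln!("error: this tool requires Docker, but `docker` was not found on PATH");
        exit(1);
    }

    // ... the rest of the CLI can now assume Docker is available ...
}
```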
What do I even write in a CLAUDE.md file? The behavior is so stupid I don’t even know how to prompt against it.
> I don’t know where the LLMs are picking up this paranoid tendency to handle every single error case.
Think about it: they have to work in a very limited context window. Like, just the immediate file where the change is taking place, essentially. Having broader knowledge of how the application deals with particular errors (catch them here and wrap? Let them bubble up? Catch and log but don't bubble up?) is outside their purview.
I can hear it now, "well just codify those rules in CLAUDE.md." Yeah but there's always edge cases to the edge cases and you're using English, with all the drawbacks that entails.
I have encoded rules against this in CLAUDE.md. Claude routinely ignores those rules until I ask "how can this branch be reached?" and it responds "it can't. So according to <rule> I should crash instead" and goes and does that.
The answer (as usual) is reinforcement learning. They gave ten idiots some code snippets, and all of them went for the "belt and braces" approach, so now that's all we get, ever. It's like the previous versions that spammed emojis everywhere despite that not being a thing whatsoever in their training data. I don't think they ever fixed that; they just put a "spare us the emojis" band-aid instruction in the system prompt.
This is my biggest frustration with the code they generate (but it does make it easy to check whether my students have even looked at the generated code). I don't want to fail silently or hard-code an error message; it creates a pile of lies to work through in future debugging.
Writing tests and error handling have been the worst-performing parts of Claude for me.
In particular: writing tests that do nothing, writing tests and then skipping them to resolve test failures, and everybody's favorite, writing a test that greps the source code for a string (which is just insane; how did it get this idea?).
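To make that last one concrete, here's roughly what that kind of "test" looks like (file name and string are made up): it passes as long as the words appear somewhere in the source, regardless of what the code actually does.

```rust
// Roughly the anti-pattern described above: a "test" that greps the
// source instead of exercising any behavior.
#[test]
fn handles_timeout_error() {
    let src = std::fs::read_to_string("src/main.rs").unwrap();
    // Passes as long as the identifier appears somewhere in the file,
    // whether or not timeouts are actually handled.
    assert!(src.contains("TimeoutError"));
}
```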
Seriously. Maybe 60% of the time I use Claude for tests, the "fix" for a failing test is also to change the application code so the test passes (in some cases it will want to make massive architecture changes to accommodate the test, even when there's an easy way to adapt the test to better fit the architecture). Maybe half the time that's the right thing to do, but the other half it most definitely is not. It's a high enough error rate that it's borderline useful.
Usually you want to fix the code that's failing a test.
The assumption is that your test is right. That's TDD. Then you write your code to conform to the tests. Otherwise what's the point of the tests if you're just trying to rewrite them until they pass?