In April 2000, Stack Overflow co-founder Joel Spolsky published an essay titled “Things You Should Never Do, Part I”. The occasion was Netscape’s decision to rewrite its browser’s code from scratch instead of further developing the existing disciplin
Spolsky’s thesis: this decision was the worst strategic mistake a software company could make. It was based on a fundamental misunderstanding of what programming work actually is. Programmers prefer to rewrite because reading other people’s code is tedious and writing feels productive. But this impression is deceptive. The temptation to start over is one of the most expensive in the industry.
More than 25 years later, this text has lost none of its relevance. On the contrary: with the advent of large language models (LLMs), the asymmetry between writing and reading has shifted so far that we have to ask whether we are systematically underestimating the true senior discipline of software development. Typing is not what will keep teams busy in the coming years. Reading is. In this article, I want to show why this is the case, where the asymmetry comes from, how generative AI exacerbates it, and why a skill that used to be incidental must become a discipline of its own.
“Looks at AI generated docs in a legacy project”
… Yea…
READING and then correcting these would be the play. But if a change is not seen on the prod UI it’s meaningless.
Curious. How well do LLMs parse code to extract what it does?
Significantly better than me. In its current state it mostly lacks contextual learning and a path to correct an error. Give it a scoped, well-defined task in a decent codebase and you won’t stand a chance in most well-documented industries.
The rub is, when it’s wrong, it tends to be way off and you need to spot it.
And that “wrongness” will inevitably be some very subtle monkey’s-paw understanding. Like, the code will be “correct” in the same way a drive-by PR will compile. I’ve had more than a few of these moments and it is very frustrating. Correcting the mistake is also very tedious because, by default, the models leave tombstones that poison the context (i.e. when you tell it something is wrong, it will correct it and leave a comment capturing the mistake, which almost guarantees it will make the same mistake again). And then your context is fucked and you need to start a new thread, which is €€€ because it will want to reread all the damn code again unless you’re diligent about stepping backward.
50% impressive, 50% bullshit. And to spot the bullshit, you need to understand it yourself in detail. To understand it, you need to read the code.
As with other stuff, it is more work to review this than to read it yourself.
And the whole point of the article is: if you let an LLM “read” code, this does not buy you understanding.
Like hiring someone to go to the gym and lift weights for you.
Since probably Claude 4.5 came out, pretty well as long as the code is already in a fairly good state, in a popular language, and well tested.
If you point it at some ancient spaghetti, results vary significantly, particularly with dumb “tell me what this does” style prompts.
Really well. Parsing code is easy and current LLMs have huge context windows and can parse projects in seconds.
I assume it works better the more local and isolated the code is. Every model only has a finite number of attention heads, so giant, interleaved programs probably lead to mistakes.
Even with good codebases it feels like they often just consider small parts and reinvent the wheel.
“attention”
You should not use terms that describe human cognitive processes to talk about what these systems do, because this is misleading. They don’t think, they can’t think. In the same way, a book does not memorize or forget stuff and isn’t itself intelligent, even if written by Einstein, and a compiler is not angry about your syntax errors.
It’s the term used in the field, not my invention.
See also: https://feddit.org/post/30127318


