Generative AI coding tools and agents do not work for me (blog.miguelgrinberg.com)
279 points by nomdep 12 hours ago | 326 comments
socalgal2 7 hours ago [-]
> Another common argument I've heard is that Generative AI is helpful when you need to write code in a language or technology you are not familiar with. To me this also makes little sense.

I'm not sure I get this one. When I'm learning new tech I almost always have questions. I used to google them. If I couldn't find an answer I might try posting on stack overflow. Sometimes as I'm typing the question their search would finally kick in and find the answer (similar questions). Other times I'd post the question, if it didn't get closed, maybe I'd get an answer a few hours or days later.

Now I just ask ChatGPT or Gemini and more often than not it gives me the answer. That alone and nothing else (agent modes, AI editing or generating files) is enough to increase my output. I get answers 10x faster than I used to. I'm not sure what that has to do with the point about learning. Getting answers to those questions is learning, regardless of where the answer comes from.

plasticeagle 6 hours ago [-]
ChatGPT and Gemini literally only know the answer because they read Stack Overflow. Stack Overflow only exists because it has visitors.

What do you think will happen when everyone is using the AI tools to answer their questions? We'll be back in the world of Encyclopedias, in which central authorities spent large amounts of money manually collecting information and publishing it. And then they spent a good amount of time finding ways to sell that information to us, which was only fair because they spent all that time collating it. The internet pretty much destroyed that business model, and in some sense the AI "revolution" is trying to bring it back.

Also, he's specifically talking about having a coding tool write the code for you; he's not talking about using an AI tool to answer a question so that you can go ahead and write the code yourself. These are different things, and he is treating them differently.

socalgal2 5 hours ago [-]
> ChatGPT and Gemini literally only know the answer because they read StackOverflow. Stack Overflow only exists because they have visitors.

I know this isn't true because I work on an API that has no answers on Stack Overflow (too new), nor does it have answers anywhere else. Yet the AI seems to be able to accurately answer many questions about it. To be honest I've been somewhat shocked at this.

gwhr 19 minutes ago [-]
What kind of API is it? Curious if it's a common problem that the AI was able to solve?
bbarnett 4 hours ago [-]
It is absolutely true, and AI cannot think, reason, or comprehend anything it has not seen before. If you're getting answers, it has seen it elsewhere, or it is literally dumb, statistical luck.

That doesn't mean it knows the answer. That means it guessed or hallucinated correctly. Guessing isn't knowing.

edit: people seem to be missing my point, so let me rephrase. Of course AIs don't think, but that wasn't what I was getting at. There is a vast difference between knowing something, and guessing.

Guessing, even in humans, is just the human mind statistically and automatically weighing probabilities and suggesting what may be the answer.

This is akin to what a model might do, without any real information. Yet in both cases, there's zero validation that anything is even remotely correct. It's 100% conjecture.

It therefore doesn't know the answer, it guessed it.

When it comes to being correct about a language or API that there's zero info on, it's just pure happenstance that it got it correct. It's important to know the differences, and not say it "knows" the answer. It doesn't. It guessed.

One of the biggest issues with LLMs is that we don't get a probability back with the response. You ask a human "Do you know how this works?", and an honest and helpful human might say "No" or "No, but you should try this. It might work".

That's helpful.

Conversely a human pretending it knows and speaking with deep authority when it doesn't is a liar.

LLMs need more of this type of response, which indicates certainty or the lack of it. They're useless without this. But of course, an LLM indicating a lack of certainty means that customers might use it less, or not trust it as much, so... profits first! Speak with certainty on all things!

demosthanos 14 minutes ago [-]
This is wrong. I write toy languages and frameworks for fun. These are APIs that simply don't exist outside of my code base, and LLMs are consistently able to:

* Read the signatures of the functions.

* Use the code correctly.

* Answer questions about the behavior of the underlying API by consulting the code.

Of course they're just guessing if they go beyond what's in their context window, but don't underestimate context window!

bbarnett 12 minutes ago [-]
So, you're saying you provided examples of the code and APIs and more, in the context window, and it succeeds? That sounds very much unlike the post I responded to, which claimed "no knowledge". You're also seemingly missing this:

"If you're getting answers, it has seen it elsewhere"

The context window is 'elsewhere'.

lechatonnoir 4 hours ago [-]
This is such a pointless, tired take.

You want to say this guy's experience isn't reproducible? That's one thing, but that's probably not the case unless you're assuming they're pretty stupid themselves.

You want to say that it is reproducible, but that "that doesn't mean AI can think"? Okay, but that's not what the thread was about.

PeterStuer 4 hours ago [-]
What would convince you otherwise? I ask because you sound like you have made up your mind philosophically, not based on practical experience.
rsanheim 4 hours ago [-]
It's just pattern matching. Most APIs, and hell, most code, is not unique or special. It's all been done thousands of times before. That's why an LLM can be helpful on some tool you've written just for yourself and never released anywhere.

As to 'knows the answer', I don't even know what that means with these tools. All I know is whether it is helpful or not.

danielbln 44 minutes ago [-]
Also, most problems are decomposable into simpler, certainly not novel parts. That intractable unicorn problem I hear so much about is probably composed of very pedestrian sub-problems.
hombre_fatal 3 hours ago [-]
This doesn't seem like a useful or accurate way of describing LLMs.

When I built my own programming language and used it to build a unique toy reactivity system and then asked the LLM "what can I improve in this file", you're essentially saying it "only" could help me because it learned how it could improve arbitrary code before in other languages and then it generalized those patterns to help me with novel code and my novel reactivity system.

"It just saw that before on Stack Overflow" is a bad trivialization of that.

It saw what on Stack Overflow? Concrete code examples that it generalized into abstract concepts it could apply to novel applications? Because that's the whole damn point.

skydhash 50 minutes ago [-]
Programming languages, by their nature of being formal notation, only have a few patterns to follow, all of them listed in the grammar of that language. And then there are only so many libraries out there. I believe there are more unique comments and other code explanations out there than unique code patterns. Take something like MDN, where there's a full page of text for every JavaScript, HTML, and CSS symbol.
jumploops 3 hours ago [-]
> It is absolutely true, and AI cannot think, reason, comprehend anything it has not seen before.

The amazing thing about LLMs is that we still don’t know how (or why) they work!

Yes, they’re magic mirrors that regurgitate the corpus of human knowledge.

But as it turns out, most human knowledge is already regurgitation (see: the patent system).

Novelty is rare, and LLMs have an incredible ability to pattern match and see issues in “novel” code, because they’ve seen those same patterns elsewhere.

Do they hallucinate? Absolutely.

Does that mean they’re useless? Or does that mean some bespoke code doesn’t provide the most obvious interface?

Having dealt with humans, the confidence problem isn’t unique to LLMs…

skydhash 56 minutes ago [-]
> The amazing thing about LLMs is that we still don’t know how (or why) they work!

You may want to take a course in machine learning and read a few papers.

rainonmoon 25 minutes ago [-]
> the corpus of human knowledge.

Goodness this is a dim view on the breadth of human knowledge.

jamesrcole 14 minutes ago [-]
What do you object to about it? I don't see an issue with referring to "the corpus of human knowledge". "Corpus" pretty much just means "the collection of".
olmo23 4 hours ago [-]
Where does the knowledge come from? People can only post to SO if they've read the code or the documentation. I don't see why LLMs couldn't do that.
nobunaga 4 hours ago [-]
ITT: People who think LLMs are AGI and can produce output that the LLM has come up with out of thin air or by doing research. Go speak with someone who is actually an expert in this field about how LLMs work and why the training data is so important. I'm amazed that people in the CS industry seem to talk like they know everything about a tech after using it but never even writing a line of code for an LLM. Our industry is doomed with people like this.
usef- 3 hours ago [-]
This isn't about being AGI or not, and it's not "out of thin air".

Modern implementations of LLMs can "do research" by performing searches (whose results are fed into the context), or in many code editors/plugins, the editor will index the project codebase/docs and feed relevant parts into the context.

My guess is they were either using the LLM from a code editor, or one of the many LLMs that do web searches automatically (i.e. all of the popular ones).

They are answering non-stackoverflow questions every day, already.
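Mechanically, that "research" is not magic: run a search, put the results into the prompt, then ask the model. A minimal sketch of the idea in Python (the search and completion functions are hypothetical placeholders, not any particular product's API):

  def answer_with_research(question: str) -> str:
      # 1. Gather outside context (web search, codebase index, docs, ...).
      snippets = web_search(question, max_results=5)

      # 2. Feed the retrieved text into the model's context window.
      prompt = (
          "Answer the question using only the context below.\n"
          "If the context is not enough, say so.\n\n"
          "Context:\n" + "\n\n".join(snippets) + "\n\n"
          "Question: " + question
      )

      # 3. Ask the model; complete() stands in for any chat/completion API.
      return complete(prompt)

  def web_search(query: str, max_results: int = 5) -> list[str]:
      raise NotImplementedError("plug in a search engine or codebase index here")

  def complete(prompt: str) -> str:
      raise NotImplementedError("plug in an LLM client here")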

planb 2 hours ago [-]
I think the time has come to stop meaning just LLMs when we talk about AI. An agent with web access can do so much more and hallucinates way less than "just" the model. We should start seeing the model as one building block of an AI system.
kypro 4 hours ago [-]
The idea that LLMs can only spew out text they've been trained on is a fundamental misunderstanding of how modern backprop training algorithms work. A lot of work goes into refining training algorithms to prevent overfitting of the training data.

Generalisation is something that neural nets are pretty damn good at, and given the complexity of modern LLMs, the idea that they cannot generalise the fairly basic logical rules and patterns found in code such that they're able to provide answers to inputs unseen in the training data is quite an extreme position.

BlackFly 4 hours ago [-]
One of the many ways that search got worse over time was the promotion of blog spam over actual documentation. Generally, I would rather have good API documentation or a user guide that leads me through the problem so that next time I know how to help myself. Reading through good API documentation often also educates you about the overall design and associated functionality that you may need to use later. Reading the manual for technology that you will be regularly using is generally quite profitable.

Sometimes a function doesn't work as advertised, or you need to do something tricky, or you get a weird error message, etc. For those things, Stack Overflow could be great if you could find someone who had a similar problem. But the tutorial-level examples on most blogs might solve the immediate problem without actually improving your education.

It would be similar to someone solving your homework problems for you. Sure you finished your homework, but that wasn't really learning. From this perspective, ChatGPT isn't helping you learn.

blueflow 4 hours ago [-]
Your parent searches for answers; you search for documentation. That's why AI works for him and not for you.
ryanackley 2 hours ago [-]
You're completely missing his point. If nobody figures things out for themselves, there's a risk that at some point AI won't have anything to learn from, since people will stop writing blog posts about how they figured something out and stop answering Stack Overflow questions.

Sure, there is a chance that one day AI will be smart enough to read an entire codebase and chug out exhaustively comprehensive and accurate documentation. I'm not convinced that is guaranteed to happen before our collective knowledge falls off a cliff.

blueflow 2 hours ago [-]
Read it again, slowly. FSVO "works":

  That's why AI works for him and not for you.
We both agree.
socalgal2 7 hours ago [-]
To add, another experience I had. I was using an API I'm not that familiar with. My program was crashing. Looking at the stack trace I didn't see why. Maybe if I had many months experience with this API it would be obvious but it certainly wasn't to me. For fun I just copy and pasted the stack trace into Gemini. ~60 frames worth of C++. It immediately pointed out the likely cause given the API I was using. I fixed the bug with a 2 line change once I had that clue from the AI. That seems pretty useful to me. I'm not sure how long it would have taken me to find it otherwise since, as I said, I'm not that familiar with that API.
nottorp 5 hours ago [-]
You remember when Google used to do the same thing for you way before "AI"?

Okay, maybe sometimes the post about the stack trace was in Chinese, but a plain search used to be capable of giving the same answer as an LLM.

It's not that LLMs are better, it's search that got entshittified.

chasd00 31 minutes ago [-]
I don't think search used to do everything LLMs do now, but you have a very good point. Search has gotten much worse. I would say search is about the quality it was just before Google launched. My general search needs are being met more and more by Claude; I use Google only when I know very specific keywords, because of SEO spam and ads.
averageRoyalty 3 hours ago [-]
A horse used to get you places just like a car could. A whisk worked as well as a blender.

We have a habit of finding efficiencies in our processes, even if the original process did work.

socalgal2 4 hours ago [-]
I remember when I could paste an error message into Google and get an answer. I do not remember pasting a 60 line stack trace into Google and getting an answer, though I'm pretty sure I honestly never tried that. Did it work?
0x000xca0xfe 46 minutes ago [-]
Yes, pasting lots of seemingly random context into Google used to work shockingly well.

I could break most passwords of an internal company application by googling the SHA1 hashes.

It was possible to reliably identify plants or insects by just googling all the random words or sentences that would come to mind describing it.

(None of that works nowadays, not even remotely)

Philpax 5 hours ago [-]
Google has never identified the logical error in a block of code for me. I could find what an error code was, yes, but it's of very little help when you don't have a keyword to search.
jasode 5 hours ago [-]
>You remember when Google used to do the same thing for you way before "AI"? [...] stack trace [...], but a plain search used to be capable of giving the same answer as a LLM.

The "plain" Google Search before LLM never had the capability to copy&paste an entire lengthy stack trace (e.g. ~60 frames of verbose text) because long strings like that exceeds Google's UI. Various answers say limit of 32 words and 5784 characters: https://www.google.com/search?q=limit+of+google+search+strin...

Before LLMs, the human had to visually hunt through the entire stack trace to guess at a relevant smaller substring and paste that into the Google search box. Of course, that's doable, but it's a different workflow than an LLM doing it for you.

To clarify, I'm not arguing that the LLM method is "better". I'm just saying it's different.

nottorp 1 hours ago [-]
That's a good point, because now that I think of it, I never pasted a full stack trace in a search engine. I selected what looked to be the relevant part and pasted that.

But I did it subconsciously. I never thought of it until today.

Another skill that LLM use can kill? :)

nsonha 3 hours ago [-]
> when Google used to do the same thing for you way before "AI"?

Which is never? Do you often just lie to win arguments? An LLM gives you a synthesized answer; a search engine only returns what already exists. By definition it cannot give you anything that is not a super obvious match.

nottorp 3 hours ago [-]
> Which is never?

In my experience it was "a lot". Because my stack traces were mostly hardware-related problems on ARM Linux in that period.

But I suppose your stack traces were much different and superior and no one can have stack traces that are different from yours. The world is composed of just you and your project.

> Do you often just lie to win arguments?

I do not enjoy being accused of lying by someone stuck in their own bubble.

When you said "Which is never" did you lie consciously or subconsciously btw?

FranzFerdiNaN 5 hours ago [-]
It was just as likely that Google would point you towards a stackoverflow question that was closed because it was considered a duplicate of a completely different question.
turtlebits 6 hours ago [-]
It's perfect for small boilerplate utilities. If I need a browser extension or Tampermonkey script, I can get up and running quickly without having to read docs or write manifests. These are small projects where, without AI, I wouldn't have bothered to even start.

At the very least, AI can be extremely useful for autocompleting simple code logic or automatically finding replacements when I'm copying code/config and making small changes.

PeterStuer 4 hours ago [-]
I love learning new things. With AI I am learning more, and faster.

I used to be on the Microsoft stack for decades. Windows, Hyper-V, .NET, SQL Server ... .

Got tired of MS's licensing BS and I made the switch.

This meant learning Proxmox, Linux, Pangolin, UV, Python, JS, Bootstrap, Nginx, Plausible, SQLite, Postgres ...

Not all of these were completely new, but I had never dived in seriously.

Without AI, this would have been a long and daunting project. AI made this so much smoother. It never tires of my very basic questions.

It does not always answer 100% correctly the first time (tip: paste in the docs of the specific version of the thing you are trying to figure out, as it sometimes has out-of-date or mixed-version knowledge), but most often it can be nudged and prodded to a very helpful result.

AI is just an undeniably better teacher than Google or Stack Overflow ever were. You still do the learning, but the AI is great at getting you to learn.

rootnod3 30 minutes ago [-]
I might be an outlier, but I much prefer reading the documentation myself. It's one of the reasons I love using FreeBSD and OpenBSD as daily drivers: the documentation is just so damn good. Is it a pain in the ass at the beginning? Maybe. But I require way fewer documentation lookups over time and do not have to rely on AI for them.

Don't get me wrong, I tried. But even when pasting the documentation in, the number of times it hallucinated parameters and arguments that were not even there was such a huge waste of time that I don't see the value in it.

greybox 4 hours ago [-]
I trust ChatGPT and Gemini a lot less than Stack Overflow. On Stack Overflow I can see the context in which the answer to the original question was given; AI does not give me this. I've asked ChatGPT questions about CMake, for instance, that it got subtly wrong; if I had not noticed this, it would have cost me a lot of time.
thedelanyo 6 hours ago [-]
So AI is basically best as a search engine.
groestl 6 hours ago [-]
That's right.
cess11 5 hours ago [-]
I mean, it's just a compressed database with a weird query engine.
yard2010 4 hours ago [-]
I think the main issue here is trust. When you google something you develop a sense for bullshit, so you can "feel" the sources and weigh them accordingly. With a chatbot that sense doesn't carry over, so you don't know what is just SEO bullshit reiterated in sweet words and what's not.
nikanj 6 hours ago [-]
And ChatGPT never closes your question without an answer because it (falsely) thinks it's a duplicate of a different question from 13 years ago.
nottorp 5 hours ago [-]
But it does give you a ready-to-copy-paste answer instead of a "teach a man to fish" answer.
addandsubtract 5 hours ago [-]
Not if you prompt it to explain the answer it gives you.
nottorp 4 hours ago [-]
Not the same thing. Copying code, even with comprehensive explanations, teaches less than writing/adjusting your own code based on advice.
elbear 10 minutes ago [-]
It can also do that if you ask it. It can give you exercises that you can solve. But you have to specifically ask, because by default it just gives you code.
nottorp 1 minutes ago [-]
Of course, I originally was picking on Stack Overflow's moderation.

Which strongly discouraged trying to teach people.

nikanj 4 hours ago [-]
I'd rather have a copy paste answer than a "go fish" answer
rwmj 3 hours ago [-]
> What I think happens is that these people save time because they only spot review the AI generated code, or skip the review phase altogether, which as I said above would be a deal breaker for me.

In my experience it's that they dump the code into a pull request and expect me to review it. So GenAI is great if someone else is doing the real work.

anelson 2 hours ago [-]
I’ve experienced this as well. If management is not competent they can’t tell (or don’t want to hear) when a “star” performer is actually a very expensive wrapper around a $20/mo cursor subscription.

Unlike the author of the article I do get a ton of value from coding agents, but as with all tools they are less than useless when wielded incompetently. This becomes more damaging in an org that already has perverse incentives which reward performative slop over diligent and thoughtful engineering.

skydhash 60 minutes ago [-]
Git blame can do a lot in those situations. Find the general location of the bug, then assign everyone that has touched it to the ticket.
cardanome 28 seconds ago [-]
Is that really something you are doing in your job?

Most of my teams have been very allergic to assigning personal blame, and management very focused on making sure everyone can do everything and we are always replaceable. So maybe I could phrase it like "X could help me with this", but saying X is responsible for the bug would be a no-no.

danielbln 45 minutes ago [-]
I don't understand this, the buck stops with the PR submitter. If they get repeated feedback about their PRs that are just passed-through AI slop, then the team lead or whatever should give them a stern talking to.
waprin 11 hours ago [-]
To some degree, traditional coding and AI coding are not the same thing, so it's not surprising that some people are better at one than the other. The author is basically saying that he's much better at coding than AI coding.

But it's important to realize that AI coding is itself a skill that you can develop. It's not just "pick the best tool and let it go". Managing prompts and managing context has a much higher skill ceiling than many people realize. You might prefer manual coding, but you might just be bad at AI coding and you might prefer it if you improved at it.

With that said, I'm still very skeptical of letting the AI drive the majority of the software work, despite meeting people who swear it works. I personally am currently preferring "let the AI do most of the grunt work but get good at managing it and shepherding the high level software design".

It's a tiny bit like drawing vs photography and if you look through that lens it's obvious that many drawers might not like photography.

dspillett 5 hours ago [-]
> To some degree, traditional coding and AI coding are not the same thing

LLM-based¹ coding, at least beyond simple auto-complete enhancements (using it directly & interactively as what it is: Glorified Predictive Text) is more akin to managing a junior or outsourcing your work. You give a definition/prompt, some work is done, you refine the prompt and repeat (or fix any issues yourself), much like you would with an external human. The key differences are turnaround time (in favour of LLMs), reliability (in favour of humans, though that is mitigated largely by the quick turnaround), and (though I suspect this is a limit that will go away with time, possibly not much time) lack of usefulness for "bigger picture" work.

This is one of my (several) objections to using it: I want to deal with and understand the minutia of what I am doing, I got into programming, database bothering, and infrastructure kicking, because I enjoyed it, enjoyed learning it, and wanted to do it. For years I've avoided managing people at all, at the known expense of reduced salary potential, for similar reasons: I want to be a tinkerer, not a manager of tinkerers. Perhaps call me back when you have an AGI that I can work alongside.

--------

[1] Yes, I'm a bit of a stick-in-the-mud about calling these things AI. Next decade they won't generally be considered AI like many things previously called AI are not now. I'll call something AI when it is, or very closely approaches, AGI.

danielbln 35 minutes ago [-]
> I want to be a tinkerer, not a manager of tinkerers.

We all want many things; that doesn't mean someone will pay you for them. You want to tinker? Great, awesome, more power to you, tinker on personal projects to your heart's content. However, if someone pays you to solve a problem, then it is your job to find the best, most efficient way to cleanly do it. Can LLMs do this on their own most of the time? I think not, not right now at least. The combination of a skilled human and an LLM? Most likely, yes.

rwmj 3 hours ago [-]
Another difference is that your junior will, over time, learn, and you'll also get a sense of whether you can trust them. If after a while they aren't learning and you can't trust them, you get rid of them. GenAI doesn't gain knowledge in the same way, and you're always going to have the same level of trust in it (which in my experience is limited).

Also, if my junior argued back and was wrong repeatedly, that'd be bad. Lucky that has never happened with AIs ...

averageRoyalty 3 hours ago [-]
Cline, Roocode, etc. have the concept of rules that can be added to over time. There are heaps of memory-bank and orchestration methods for AI.

LLMs absolutely can improve over time.

mitthrowaway2 8 hours ago [-]
The skill ceiling might be "high" but it's not like investing years of practice to become a great pianist. The most experienced AI coder in the world has about three years of practice working this way, much of which is obsoleted because the models have changed to the point where some lessons learned on GPT 3.5 don't transfer. There aren't teachers with decades of experience to learn from, either.
freehorse 4 hours ago [-]
Moreover, the "ceiling" may still be below the "code works" level, and you have no idea when you start if it is or not.
dr_dshiv 6 hours ago [-]
It’s mostly attitude that you are learning. Playfulness, persistence and a willingness to start from scratch again and again.
suddenlybananas 6 hours ago [-]
>persistence and a willingness to start from scratch again and again.

i.e. continually gambling and praying the model spits something out that works instead of thinking.

tsurba 5 hours ago [-]
Gambling is where I end up if I’m tired and try to get an LLM to build my hobby project for me from scratch in one go, not really bothering to read the code properly. It’s stupid and a waste of time. Sometimes it’s easier to get started this way though.

But more seriously, in the ideal case, refining a prompt based on an LLM's misunderstanding due to ambiguity in your task description is actually doing the meaningful part of the work in software development. It is exactly about defining the edge cases and converting into language what it is that you need for a task. Iterating on that is not gambling.

But of course, if you are not doing that, but just trying to coax a "smarter" LLM with (hopefully deprecated) "prompt engineering" tricks, then that is about building yourself a skill that can become useless tomorrow.

chii 3 hours ago [-]
why is the process important? If they can continuously trial and error their way into a good output/result, then it's a fine outcome.
suddenlybananas 2 hours ago [-]
Why is thinking important? Think about it a bit.
chii 2 hours ago [-]
Is it more important for a chess engine to be able to think? Or is it enough that it wins by brute force, searching for a sufficient outcome?

If the outcome is indistinguishable from using "thinking" as the process rather than brute force, why would the process matter with regard to how the outcome was achieved?

suddenlybananas 39 minutes ago [-]
Maybe if programming were a well-defined game like chess, but it's not.
chii 19 minutes ago [-]
The grammar of a programming language is just as well defined. And the defined-ness of the "game" isn't required for my argument.

Your concept of thinking is the classic rhetoric: as soon as some "AI" manages to achieve something it previously wasn't capable of, it's no longer AI and is just some xyz process. It happened with chess engines, with AlphaGo, and with LLMs. The implication being that human "thinking" is somehow unique and only AI that replicates it can be considered to "think".

HPsquared 5 hours ago [-]
Most things in life are like that.
notnullorvoid 8 hours ago [-]
Is it a skill worth learning though? How much does the output quality improve? How transferable is it across models and tools of today, and of the future?

From what I see of AI programming tools today, I highly doubt the skills developed are going to transfer to tools we'll see even a year from now.

vidarh 5 hours ago [-]
Given I see people insisting these tools don't work for them at all, and some of my results recently include spitting out a 1k line API client with about 5 brief paragraphs of prompts, and designing a website (the lot, including CSS, HTML, copy, database access) and populating the directory on it with entries, I'd think the output quality improves a very great deal.

From what I see of the tools, I think the skills developed largely consist of skills you need to develop as you get more senior anyway, namely writing detail-oriented specs and understanding how to chunk tasks. Those skills aren't going to stop having value.

npilk 7 minutes ago [-]
Maybe this is yet another application of the bitter lesson. It's not worth learning complex processes for partnering with AI models, because any productivity gains will pale in comparison to the performance improvement from future generations.
serpix 8 hours ago [-]
Regarding using AI tools for programming, it is not an all-or-nothing choice. You can pick a grunt-work task such as "Tag every such-and-such Terraform resource with a UUID" and let it do just that. Nothing to do with quality, but everything to do with it being a simple task and not having to bother with the tedium.
autobodie 7 hours ago [-]
Why use AI to do something so simple? You're only increasing the possibility that it gets done wrong. Multi-cursor editing will be faster anyway.
barsonme 7 hours ago [-]
Why not? I regularly have a couple Claude instances running in the background chewing through simple yet time consuming tasks. It’s saved me many hours of work and given me more time to focus on the important parts.
dotancohen 6 hours ago [-]

  > a couple Claude instances running in the background chewing through simple yet time consuming tasks.
If you don't mind, I'd love to hear more about this. How exactly are they running in the background? What are they doing? How do you interact with them? Do they have access to your file system?

Thank you!

Philpax 5 hours ago [-]
I would guess that they're running multiple instances of Claude Code [0] in the background. You can give it arbitrary tasks up to a complexity ceiling that you have to figure out for yourself. It's a CLI agent, so you can just give it directives in the relevant terminal. Yes, they have access to the filesystem, but only what you give them.

[0]: https://www.anthropic.com/claude-code

dotancohen 3 hours ago [-]
Those tasks can take hours, or at least long enough that multiple tasks are running in the background? The page says $17 per month. Is that unlimited usage?

If so, it does seem that AI just replaced me at my job... don't let them know. A significant portion of my projects are writing small business tools.

Philpax 1 hours ago [-]
> Those tasks can take hours, or at least long enough where multiple tasks are running in the background?

Maybe not hours, but extended periods of time, yes. Agents are very quick, so they can frequently complete tasks in minutes that would have taken me hours.

> The page says $17 per month. That's unlimited usage?

Each plan has a limited quota; the Pro plan offers you enough to get in and try out Claude Code, but not enough for serious use. The $100 and $200 plans still have quotas, but they're quite generous; people have been able to get orders of magnitude of API-cost-equivalents out of them [0].

> If so, it does seem that AI just replaced me at my job... don't let them know. A significant portion of my projects are writing small business tools.

Perhaps, but for now, you still need to have some degree of vague competence to know what to look out for and what works best. Might I suggest using the tools to get work done faster so that you can relax for the rest of the day? ;)

[0]: https://xcancel.com/HaasOnSaaS/status/1932713637371916341

stitched2gethr 7 hours ago [-]
It will very soon be the only way.
skydhash 11 hours ago [-]
> But it's important to realize that AI coding is itself a skill that you can develop. It's not just "pick the best tool and let it go". Managing prompts and managing context has a much higher skill ceiling than many people realize

No, it's not. It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spent setting things up). But it's not like GDB or using UNIX as an IDE, where you need a whole book to just get started.

> It's a tiny bit like drawing vs photography and if you look through that lens it's obvious that many drawers might not like photography.

While they share a lot of principles (around composition, poses, ...), they are different activities with different outputs. No one conflates the two. You don't draw and think you're going to capture a moment in time. The intent is to share an observation with the world.

furyofantares 8 hours ago [-]
> No, it's not. It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spent setting things up). But it's not like GDB or using UNIX as an IDE, where you need a whole book to just get started.

The skill floor is something you can pick up in a few minutes and find it useful, yes. I have been spending dedicated effort toward finding the skill ceiling and haven't found it.

I've picked up lots of skills in my career, some of which were easy, but some of which required dedicated learning, or practice, or experimentation. LLM-assisted coding is probably in the top 3 in terms of effort I've put into learning it.

I'm trying to learn the right patterns to use to keep the LLM on track and keeping the codebase in check. Most importantly, and quite relevant to OP, I'd like to use LLMs to get work done much faster while still becoming an expert in the system that is produced.

Finding the line has been really tough. You can get a LOT done fast without this requirement, but personally I don't want to work anywhere that has a bunch of systems that nobody's an expert in. On the flip side, as in the OP, you can have this requirement and end up slower by using an LLM than by writing the code yourself.

viraptor 9 hours ago [-]
> It's something you can pick up in a few minutes

You can start in a few minutes, sure. (Also you can start using gdb in minutes) But GP is talking about the ceiling. Do you know which models work better for what kind of task? Do you know what format is better for extra files? Do you know when it's beneficial to restart / compress context? Are you using single prompts or multi stage planning trees? How are you managing project-specific expectations? What type of testing gives better results in guiding the model? What kind of issues are more common for which languages?

Correct prompting is what makes the difference these days in tasks like SWE-bench Verified.

sothatsit 9 hours ago [-]
I feel like there is also a very high ceiling to how much scaffolding you can produce for the agents to get them to work better. This includes custom prompts, custom CLAUDE.md files, other documentation files for Claude to read, and especially how well and quickly your linting and tests can run, and how much functionality they cover. That's not to mention MCP and getting Claude to talk to your database or open your website using Playwright, which I have not even tried yet.

For example, I have a custom planning prompt that I will give a paragraph or two of information to, and then it will produce a specification document from that by searching the web and reading the code and documentation. And then I will review that specification document before passing it back to Claude Code to implement the change.

This works because it is a lot easier to review a specification document than it is to review the final code changes. So, if I understand it and guide it towards how I would want the feature to be implemented at the specification stage, that sets me up to have a much easier time reviewing the final result as well. Because it will more closely match my own mental model of the codebase and how things should be implemented.

And it feels like that is barely scratching the surface of setting up the coding environment for Claude Code to work in.

freehorse 5 hours ago [-]
And where will all this skill go when newer models, a year from now, use different tools and require different scaffolding?

The problem with overinvesting in a brand-new, developing field is that you get skills that are soon to be redundant. You can hope that the skills are going to transfer to what will be needed after, but I am not sure that will be the case here. There was a lot of talk about prompting techniques ("prompt engineering") last year, and now most of these are redundant, and I really don't think I have learnt something that is useful enough for the new models, nor have I actually understood anything. These are all tricks-and-tips level, shallow stuff.

I think these skills are just like learning how to use some tools in an IDE. They increase productivity, which is great, but if you have to switch IDEs they may not actually help you with the new things you have to learn in the new environment. Moreover, these are just skills in how to use some tools; they allow you to do things, but we cannot compare learning how to use tools with actually learning and understanding the structure of a program. The former is obviously a shallow form of knowledge/skill, easily replaceable, easily made redundant, and probably not transferable (in the current context). I would rather invest more time in the latter and actually get somewhere.

sothatsit 2 hours ago [-]
A lot of the changes needed to get agents to work well are just good practice anyway. That's what is nice about getting these agents to work well - often it just involves improving your dev tooling and documentation, which helps real human developers as well. I don't think this is going to become irrelevant any time soon.

The things that will change may be prompts or MCP setups or more specific optimisations like subagents. Those may require more consideration of how much you want to invest in setting them up. But the majority of setup you do for Claude Code is not only useful to Claude Code. It is useful to human developers and other agent systems as well.

> There was a lot of talk about prompting techniques ("prompt engineering") last year and now most of these are redundant.

Not true, prompting techniques still matter a lot to a lot of applications. It's just less flashy now. In fact, prompting techniques matter a ton for optimising Claude Code and creating commands like the planning prompt I created. It matters a lot when you are trying to optimise for costs and use cheaper models.

> I think these skills are just like learning how to use some tools in an IDE. > if you have to switch IDEs they may not actually help you

A lot of the skills you learn in one IDE do transfer to new IDEs. I started using Eclipse and that was a steep learning curve. But later I switched to IntelliJ IDEA and all I had to re-learn were key-bindings and some other minor differences. The core functionality is the same.

Similarly, a lot of these "agent frameworks" like Claude Code are very similar in functionality, and switching between them as the landscape shifts is probably not as large of a cost as you think it is. Often it is just a matter of changing a model parameter or changing the command that you pass your prompt to.

Of course it is a tradeoff, and that tradeoff probably changes a lot depending upon what type of work you do, your level of experience, how old your codebases are, how big your codebases are, the size of your team, etc... it's not a slam dunk that it is definitely worthwhile, but it is at least interesting.

viraptor 9 hours ago [-]
> then it will produce a specification document from that

I like a similar workflow where I iterate on the spec, then convert that into a plan, then feed that step by step to the agent, forcing full feature testing after each one.

bcrosby95 8 hours ago [-]
When you say specification, what, specifically, does that mean? Do you have an example?

I've actually been playing around with languages that separate implementation from specification under the theory that it will be better for this sort of stuff, but that leaves an extremely limited number of options (C, C++, Ada... not sure what else).

I've been using C and the various LLMs I've tried seem to have issues with the lack of memory safety there.

sothatsit 8 hours ago [-]
A "specification" as in a text document outlining all the changes to make.

For example, it might include: Overview, Database Design (Migration, Schema Updates), Backend Implementation (Model Updates, API updates), Frontend Implementation (Page Updates, Component Design), Implementation Order, Testing Considerations, Security Considerations, Performance Considerations.

It sounds like a lot when I type it out, but it is pretty quick to read through and edit.

The specification document is generated by a planning prompt that tells Claude to analyse the feature description (the couple paragraphs I wrote), research the repository context, research best practices, present a plan, gather specific requirements, perform quality control, and finally generate the planning document.

I'm not sure if this is the best process, but it seems to work pretty well.
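Condensed a lot, a planning prompt of that general shape might read something like this (an illustrative sketch only, not the actual prompt described above):

  You are producing a specification document, not code.
  Input: a short feature description from me (one or two paragraphs).
  1. Read the relevant parts of the repository and any linked docs.
  2. Research best practices for this kind of change.
  3. Ask about anything ambiguous instead of guessing.
  4. Produce a markdown spec with: Overview, Database Design, Backend
     Implementation, Frontend Implementation, Implementation Order,
     Testing, Security and Performance Considerations.
  Do not modify any files.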

viraptor 8 hours ago [-]
Like a spec you'd hand to a contractor. List of requirements, some business context, etc. Not a formal algorithm spec.

My basic initial prompt for that is: "we're creating a markdown specification for (...). I'll start with basic description and at each step you should refine the spec to include the new information and note what information is missing or could use refinement."

oxidant 10 hours ago [-]
I do not agree it is something you can pick up in an hour. You have to learn what AI is good at, how different models code, how to prompt to get the results you want.

If anything, prompting well is akin to learning a new programming language. What words do you use to explain what you want to achieve? How do you reference files/sections so you don't waste context on meaningless things?

I've been using AI tools to code for the past year and a half (Github Copilot, Cursor, Claude Code, OpenAI APIs) and they all need slightly different things to be successful and they're all better at different things.

AI isn't a panacea, but it can be the right tool for the job.

15123123 9 hours ago [-]
I am also interested in how much of these skills are at the mercy of OpenAI. Like, IIRC, 1 or 2 years ago there was an uproar among AI "artists" saying that their art was ruined because of model changes (or maybe the system prompt changed).

>I do not agree it is something you can pick up in an hour.

But it's also interesting that the industry is selling the opposite (with AI anyone can code / write / draw / make music).

>You have to learn what AI is good at.

More often than not I find you need to learn what the AI is bad at, and this is not a fun experience.

oxidant 8 hours ago [-]
Of course that's what the industry is selling, because they want to make money. Yes, it's easy to create a proof of concept, but once you get out of greenfield into needing 50-100k tokens of context (reading multiple 500-line files, thinking, etc.), the quality drops and you need to know how to focus the models to maintain the quality.

"Write me a server in Go" only gets you so far. What is the auth strategy, what endpoints do you need, do you need to integrate with a library or API, are there any security issues, how easy is the code to extend, how do you get it to follow existing patterns?

I find I need to think AND write more than I would if I was doing it myself because the feedback loop is longer. Like the article says, you have to review the code instead of having implicit knowledge of what was written.

That being said, it is faster for some tasks, like writing tests (if you have good examples) and doing basic scaffolding. It needs quite a bit of hand-holding, which is why I believe those with more experience get more value from AI code: they have a better bullshit meter.

skydhash 2 hours ago [-]
> What is the auth strategy, what endpoints do you need, do you need to integrate with a library or API, are there any security issues, how easy is the code to extend, how do you get it to follow existing patterns?

That is the realm of software engineering, not of using LLMs. You have to answer all of these questions even with traditional coding, because they're not coding questions, they're software design questions. And before that, there were software analysis questions, preceded by requirements-gathering questions.

A lot of replies around the thread are conflating coding activities with the parent set of software engineering activities.

solumunus 8 hours ago [-]
OpenAI? They are far from the forefront here. No one is using their models for this.
15123123 7 hours ago [-]
You can substitute for whatever saas company of your choice.
sagarpatil 9 hours ago [-]
Yeah, you can't do sh*t in an hour. I spend a good 6-8 hours every day using Claude Code, and I actually spend an hour every day trying new AI tools; it's a constant process.

Here's what my task list for today looks like:

1. Test TRAE/Refact.ai/Zencoder: 70% on SWE Verified

2. https://github.com/kbwo/ccmanager: use git tree to manage multiple Claude Code sessions

3. https://github.com/julep-ai/julep/blob/dev/AGENTS.md: Read and implement

4. https://github.com/snagasuri/deebo-prototype: Autonomous debugging agent (MCP)

5. https://github.com/claude-did-this/claude-hub: connects Claude Code to GitHub repositories.

__MatrixMan__ 10 hours ago [-]
It definitely takes more than minutes to discover the ways that your model is going to repeatedly piss you off and set up guardrails to mitigate those problems.
JimDabell 8 hours ago [-]
> It's something you can pick in a few minutes (or an hour if you're using more advanced tooling, mostly spending it setting things up).

This doesn’t give you any time to experiment with alternative approaches. It’s equivalent to saying that the first approach you try as a beginner will be as good as it possibly gets, that there’s nothing at all to learn.

dingnuts 11 hours ago [-]
> You might prefer manual coding, but you might just be bad at AI coding and you might prefer it if you improved at it.

ok but how much am I supposed to spend before I supposedly just "get good"? Because based on the free trials and the pocket change I've spent, I don't consider the ROI worth it.

qinsig 10 hours ago [-]
Avoid using agents that can just blow through money (Cline, Roocode, Claude Code with an API key, etc.).

Instead, you can get comfortable prompting and managing context with Aider.

Or you can use claude code with a pro subscription for a fair amount of usage.

I agree that seeing the tools waste several dollars just to make a mess you need to discard is frustrating.

goalieca 11 hours ago [-]
And how often do your prompting skills change as the models evolve?
badsectoracula 8 hours ago [-]
It won't be the hippest of solutions, but you can use something like Devstral Small with a fully open source setup to experiment with local LLMs and a bunch of tools - or just chat with it through a chat interface. I ping-ponged between Devstral running as a chat interface and my regular text editor some time ago to make a toy raytracer project [0] (output) [1] (code).

While it wasn't the fanciest integration (nor the best of codegen), it was good enough to "get going" (the loop was to ask the LLM to do something, do something else myself in the background, then fix and merge the changes it made - even though I often had to fix stuff [2], sometimes it was less of a hassle than if I had to start from scratch [3]).

It can give you a vague idea that with more dedicated tooling (i.e. something that does automatically what you'd do by hand [4]) you could do more interesting things (combining it with some sort of LSP functionality to pass function bodies to the LLM would also help), though personally I'm not a fan of the "dedicated editor" that seems to be used, and I think something more LSP-like (especially if it can also work with existing LSPs) would be neat.

IMO it can be useful for a bunch of boilerplate-y or boring work. The biggest issue I can see is that the context is too small to include everything (imagine, e.g., throwing the entire Blender source code at an LLM, which I don't think even the largest of cloud-hosted LLMs can handle), so there needs to be some external way to store stuff dynamically, with the LLM knowing that the external stuff is available, looking it up, and storing things if needed. I'm not sure how exactly that'd work, though, to the extent where you could - say - open up a random Blender source file, point to a function, ask the LLM to make a modification, have it reuse any existing functions in the codebase where appropriate (without you pointing them out) and then, if needed, have the LLM also update the code where the function you modified is used (e.g. if you added/removed some argument or changed the semantics of its use).

[0] https://i.imgur.com/FevOm0o.png

[1] https://app.filen.io/#/d/e05ae468-6741-453c-a18d-e83dcc3de92...

[2] e.g. when I asked it to implement a BVH to speed things up, it made something that wasn't hierarchical and actually slowed things down

[3] the code it produced for [2] was fixable into a simple BVH

[4] I tried a larger project and wrote a script that `cat`ed and `xclip`ed a bunch of header files to pass to the LLM so it knew the available functions; each function had a single-line comment about what it does, and when the LLM wrote new functions it also added that comment. 99% of these one-liner comments were actually written by the LLM.
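The idea behind [4], roughly sketched in Python (paths and the clipboard step are made up for illustration; this is not the actual script):

  import glob

  def build_llm_context(header_glob: str = "src/*.h") -> str:
      # Concatenate the project's headers (each function carries a one-line
      # comment) into a single blob to paste into the LLM's context.
      chunks = []
      for path in sorted(glob.glob(header_glob)):
          with open(path) as fh:
              chunks.append(f"// file: {path}\n{fh.read()}")
      return "\n\n".join(chunks)

  if __name__ == "__main__":
      # Pipe the output into xclip (or any clipboard tool) from the shell.
      print(build_llm_context())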

grogenaut 10 hours ago [-]
how much time did you spend learning your last language to become comfortable with it?
stray 10 hours ago [-]
You're going to spend a little over $1k to ramp up your skills with AI-aided coding. It's dirt cheap in the grand scheme of things.
viraptor 9 hours ago [-]
Not even close. I'm still under $100, creating full apps. Stick to reasonable models and you can achieve and learn a lot. You don't need the latest and greatest in max mode (or whatever the new one calls it) for the majority of tasks. You don't have to throw the whole project context at the service every time either.
dingnuts 10 hours ago [-]
do I get a refund if I spend a grand and I'm still not convinced? at some point I'm going to start lying to myself to justify the cost and I don't know how much y'all earn but $1k is getting close
theoreticalmal 9 hours ago [-]
Would you ask for a refund from a university class if you didn’t get a job or skill from it? Investing in a potential skill is a risk and carries an opportunity cost, that’s part of what makes it a risk
HDThoreaun 9 hours ago [-]
No one is forcing you to improve. If you don’t want to invest in yourself that is fine, you’ll just be left behind.
asciimov 9 hours ago [-]
How are those without that kind of scratch supposed to keep up with those that do?
theoreticalmal 9 hours ago [-]
This kind of seems like asking “how are poor people supposed to keep up with rich people” which we seem to not have a long term viable answer for right now
wiseowise 8 hours ago [-]
What makes you think those without that kind of scratch are supposed to keep up?
asciimov 8 hours ago [-]
For the past 10 years we have been telling everyone "learn to code"; now it's "learn to build AI prompts".

Before, a poor kid with computer access could learn to code nearly for free, but if it costs $1k just to get started with AI, that poor kid will never have that opportunity.

wiseowise 7 hours ago [-]
For the past 10 years scammers and profiteers have been telling everyone to learn to code, not us.
sagarpatil 9 hours ago [-]
Use free tiers?
throwawaysleep 9 hours ago [-]
If you lack "that kind of scratch", you are at the learning stage for software development, not the keeping up stage. Either that or horribly underpaid.
bevr1337 8 hours ago [-]
I recently had a coworker tell me he liked his last workplace because "we all spoke the same language." It was incredible how much he revealed about himself with what he thought was a simple fact about engineer culture. Your comment reminds me of that exchange.

- Employers, not employees, should provide workplace equipment or compensation for equipment. Don't buy bits for the shop, nails for the foreman, or Cursor for the tech lead.

- the workplace is not a meritocracy. People are not defined by their wealth.

- If $1,000 does not represent an appreciable amount of someone's assets, they are doing well in life. Approximately half of US citizens cannot afford rent if they lose a paycheck.

- Sometimes the money needs to go somewhere else. Got kids? Sick and in the hospital? Loan sharks? A pool full of sharks and they need a lot of food?

- Folks can have different priorities and it's as simple as that

We're (my employer) still unsure if new dev tooling is improving productivity. If we find out it was unhelpful, I'll be very glad I didn't lose my own money.

15123123 9 hours ago [-]
$100 per month for a SaaS is quite a lot outside of Western countries. People are not even spending that much on a VPN or a password manager.
lexandstuff 10 hours ago [-]
Great article. The other thing that you miss out on when you don't write the code yourself is that sense of your subconscious working for you. Writing code has a side benefit of developing a really strong mental model of a problem, that kinda gets embedded in your neurons and pays dividends down the track, when doing stuff like troubleshooting or deciding on how to integrate a new feature. You even find yourself solving problems in your sleep.

I haven't observed any software developers operating at even a slight multiplier from the pre-LLM days at the organisations I've worked at. I think people are getting addicted to not having to expend brain energy to solve problems, and they're mistaking that for productivity.

nerevarthelame 8 hours ago [-]
> I think people are getting addicted to not having to expend brain energy to solve problems, and they're mistaking that for productivity.

I think that's a really elegant way to put it. Google Research tried to measure LLM impacts on productivity in 2024 [1]. They gave their subjects an exam and assigned them different resources (a book versus an LLM). They found that the LLM users actually took more time to finish than those who used a book, and that only novices on the subject material actually improved their scores when using an LLM.

But the participants also perceived that they were more accurate and efficient using the LLM, when that was not the case. The researchers suggested that it was due to "reduced cognitive load" - asking an LLM something is easy and mostly passive. Searching through a book is active and can feel more tiresome. Like you said: people are getting addicted to not having to expend brain energy to solve problems, and mistaking that for productivity.

[1] https://storage.googleapis.com/gweb-research2023-media/pubto...

wiseowise 8 hours ago [-]
You’re twisting the results. Just because they took more time doesn’t mean their productivity went down. On the contrary, if you can perform an expert task with much less mental effort (which 99% of orgs should prioritize), then it is an absolute win. Work is an extremely mentally draining and soul-crushing experience for the majority of people; if AI can lower that while maintaining roughly the same result, with subjects allocating only, say, 25% of their mental energy, that’s an amazing win.
didibus 8 hours ago [-]
If I follow what you are saying, employers won't see any benefits, but employees, while producing the same output in the same amount of time, will be able to do so with reduced mental strain?

Personally, I don't know if this is always a win, mostly because I enjoy the creative and problem solving aspect of coding, and reducing that to something that is more about prompting, correcting, and mentoring an AI agent doesn't bring me the same satisfaction and joy.

Vinnl 3 hours ago [-]
Steelmanning their argument: employers will see benefits because, while the employee might be more productive without an LLM in the first two hours of the day, the cognitive load reduces their productivity as the day goes on. If employees are able to function at a higher level for longer during their day with an LLM, that should benefit the employer.
tsurba 5 hours ago [-]
And how long have you been doing this? Because that sounds naive.

After doing programming for a decade or two, the actual act of programming is not enough to be ”creative problem solving”, it’s the domain and set of problems you get to apply it to that need to be interesting.

>90% of programming tasks at a company are usually reimplementing things and algorithms that have been done a thousand times before by others, and you’ve done something similar a dozen times. Nothing interesting there. That is exactly what should and can now be automated (to some extent).

In fact, solving problems creatively to keep yourself interested when the problem itself is boring is how you get code that sucks to maintain for the next guy. You should usually be doing the most clear and boring implementation possible. Which is not what ”I love coding” people usually do (I’m definitely guilty).

To be honest this is why I went back to get a PhD, ”just coding” stuff got boring after a few years of doing it for a living. Now it feels like I’m just doing hobby projects again, because I work exactly on what I think could be interesting for others.

giantg2 8 minutes ago [-]
The true test is: can it write tests? Ask the dev if they use it to write tests. The answer to #1 is that it can't, really. The answer to #2 should be no.

AI can write some tests, but it can't design thorough ones. Perhaps the best way to use AI is to have a human writing thorough and well documented tests as part of TDD, asking AI to write code to meet those tests, then thoroughly reviewing that code.
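
Concretely, that split might look something like this; the function, file names, and discount rules below are hypothetical, written as a Jest-style test:

    // Human-written first, as the spec the AI has to satisfy.
    // applyDiscount and ./pricing are made-up names for illustration.
    import { applyDiscount } from "./pricing";

    describe("applyDiscount", () => {
      it("applies a percentage discount", () => {
        expect(applyDiscount(100, 0.2)).toBe(80);
      });

      it("clamps the result at zero", () => {
        expect(applyDiscount(10, 1.5)).toBe(0);
      });

      it("rejects negative discount rates", () => {
        expect(() => applyDiscount(10, -0.1)).toThrow();
      });
    });

The AI only gets asked to produce the implementation once tests like these exist, and the human review then covers both the tests and the generated diff.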

AI saves me just a little time by writing boilerplate stuff for me, just one step above how IDEs have been providing generated getters and setters.

jumploops 11 hours ago [-]
> It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself, if not more.

As someone who uses Claude Code heavily, this is spot on.

LLMs are great, but I find the more I cede control to them, the longer it takes to actually ship the code.

I’ve found that the main benefit for me so far is the reduction of RSI symptoms, whereas the actual time savings are mostly exaggerated (even if it feels faster in the moment).

adriand 11 hours ago [-]
Do you have to review the code? I’ll be honest: like the OP theorizes, I often just spot-review it. But I also get it to write specs (often very good, judging by the ones I’ve dug into), and I always carefully review and test the results. Because there is also plenty of non-AI code in my projects I didn’t review at all, namely the myriad open source libraries I’ve installed.
jumploops 10 hours ago [-]
Yes, I’m actually working on another project with the goal of never looking at the code.

For context, it’s just a reimplementation of a tool I built.

Let’s just say it’s going a lot slower than the first time I built it by hand :)

hatefulmoron 10 hours ago [-]
It depends on what you're doing. If it's a simple task, or you're making something that won't grow into something larger, eyeballing the code and testing it is usually perfect. These types of tasks feel great with Claude Code.

If you're trying to build something larger, it's not good enough. Even with careful planning and spec building, Claude Code will still paint you into a corner when it comes to architecture. In my experience, it requires a lot of guidance to write code that can be built upon later.

The difference between the AI code and the open source libraries in this case is that you don't expect to be responsible for the third-party code later. Whether you or Claude ends up working on your code later, you'll need it to be in good shape. So, it's important to give Claude good guidance to build something that can be worked on later.

vidarh 5 hours ago [-]
If you let it paint you into a corner, why are you doing so?

I don't know what you mean by "a lot of guidance". Maybe I just naturally do that, but to me there's not been much change in the level of guidance I need to give Claude Code or my own agent vs. what I'd give developers working for me.

Another issue is that as long as you ensure it builds good enough tests, the cost of telling it to just throw out the code it builds later and redo it with additional architectural guidance keeps dropping.

The code is increasingly becoming throwaway.

hatefulmoron 4 hours ago [-]
> If you let it paint you into a corner, why are you doing so?

What do you mean? If it were as simple as not letting it do so, I would do as you suggest. I may as well stop letting it be incorrect in general. Lots of guidance helps avoid it.

> Maybe I just naturally do that, but to me there's not been much change in the level of guidance I need to give Claude Code or my own agent vs. what I'd give developers working for me.

Well yeah. You need to give it lots of guidance, like someone who works for you.

> the cost of telling it to just throw out the code it builds later and redo it with additional architectural guidance keeps dropping.

It's a moving target for sure. My confidence with this in more complex scenarios is much smaller.

vidarh 4 hours ago [-]
> What do you mean? If it were as simple as not letting it do so, I would do as you suggest.

I'm arguing it is as simple as that. Don't accept changes that muddle up the architecture. Take attempts to do so as evidence that you need to add direction. Same as you presumably would - at least I would - with a developer.

hatefulmoron 3 hours ago [-]
My concern isn't that it's messing up my architecture as I scream in protest from the other room, powerless to stop it. I agree with you and I think I'm being quite clear. Without relatively close guidance, it will paint you into a corner in terms of architecture. Guide it, direct it, whatever you want to call it.
cbsmith 11 hours ago [-]
There's an implied assumption here that code you write yourself doesn't need to be reviewed from a context different from the author's.

There's an old expression: "code as if your work will be read by a psychopath who knows where you live" followed by the joke "they know where you live because it is future you".

Generative AI coding just forces the mindset you should have had all along: start with acceptance criteria, figure out how you're going to rigorously validate correctness (ideally through regression tests more than code reviews), and use the review process to come up with consistent practices (which you then document so that the LLM can refer to it).

It's definitely not always faster, but waking up in the morning to a well documented PR, that's already been reviewed by multiple LLMs, with successfully passing test runs attached to it sure seems like I'm spending more of my time focused on what I should have been focused on all along.

Terr_ 9 hours ago [-]
There's an implied assumption here that developers who end up spending all their time reviewing LLM code won't lose their skills or become homicidal. :p
cbsmith 8 hours ago [-]
Fair enough. ;-)

I'm actually curious about the "lose their skills" angle though. In the open source community it's well understood that if anything reviewing a lot of code tends to sharpen your skills.

Terr_ 8 hours ago [-]
I expect that comes from the contrast and synthesis between how the author is anticipating things will develop or be explained, versus what the other person actually provided and trying to understand their thought process.

What happens if the reader no longer has enough of that authorial instinct, their own (opinionated) independent understanding?

I think the average experience would drift away from "I thought X was the obvious way but now I see that by doing Y you avoid that other problem, cool" and towards "I don't see the LLM doing anything too unusual compared to when I ask it for things, LGTM."

cbsmith 7 hours ago [-]
It seems counter intuitive that the reader would no longer have that authorial instinct due to lack of writing. Like, maybe they never had it, in which case, yes. But being exposed to a lot of different "writing opinions" tends to hone your own.

Let's say you're right though, and you lose that authorial instinct. If you've got five different proposals/PRs from five different models, each one critiqued by the other four, the needs for authorial instinct diminish significantly.

layer8 4 hours ago [-]
I don’t find this convincing. People generally don’t learn how to write a good novel just by reading a lot of them.
sagarpatil 9 hours ago [-]
I always use Claude Code to debug issues; there’s no point in trying to do this yourself when AI can fix it in minutes (easy to verify if you write tests first). o3 with the new search can do things in 5 minutes that would take me at least 30 minutes if I’m very efficient. Say what you want, but the time savings are real.
layer8 4 hours ago [-]
Tests can never verify the correctness of code; they only spot-check for incorrectness.
susshshshah 9 hours ago [-]
How do you know what tests to write if you don’t understand the code?
9rx 8 hours ago [-]
Same way you normally would? Tests are concerned with behaviour. The code that implements the behaviour is immaterial.
wiseowise 8 hours ago [-]
How do you do TDD without having code in the first place? How does QA verify without reading the source?
adastra22 7 hours ago [-]
I’m not sure I understand this statement. You give your program parameters X and expect result Y, but instead get Z. There is your test, embedded in the problem statement.
mleonhard 11 hours ago [-]
I solved my RSI symptoms by keeping my arms warm all the time, while awake or asleep. Maybe that will work for you, too?
jumploops 10 hours ago [-]
My issue is actually due to ulnar nerve compression related to a plate on my right clavicle.

Years of PT have enabled me to work quite effectively and minimize the flare ups :)

hooverd 11 hours ago [-]
Is anybody doing cool hybrid interfaces? I don't actually want to do everything in conversational English, believe it or not.
jumploops 11 hours ago [-]
My workflow is to have spec files (markdown) for any changes I’m making, and then use those to keep Claude on track/pull out of the trees.

Not super necessary for small changes, but basically a must have for any larger refactors or feature additions.

I usually use o3 for generating the specs; also helpful for avoiding context pollution with just Claude Code.

adastra22 7 hours ago [-]
I do something similar and find that it's the best compromise I have tried. But I still find myself nodding along with the OP: I am more and more finding that this is not actually faster, even though it certainly seems so.
bdamm 11 hours ago [-]
Isn't that what Windsurf or Cursor are?
marssaxman 9 hours ago [-]
So far as I can tell, generative AI coding tools make the easy part of the job go faster, without helping with the hard part of the job - in fact, possibly making it harder. Coding just doesn't take that much time, and I don't need help doing it. You could make my coding output 100x faster without materially changing my overall productivity, so I simply don't bother to optimize there.
resource_waste 1 hours ago [-]
I have it write algorithms, explain why my code isn't working, write API calls, or make specific functions.

The entire code? Not there, but with debuggers, I've even started doing that a bit.

Jonovono 9 hours ago [-]
Are you a plumber perhaps?
kevinventullo 6 hours ago [-]
I’m not sure I follow the question. I think of plumbing as being the exact kind of verbose boilerplate that LLM’s are quite good at automating.

In contrast, when I’m trying to do something truly novel, I might spend days with a pen and paper working out exactly what I want to do and maybe under an hour coding up the core logic.

On the latter type of work, I find LLM’s to be high variance with mostly negative ROI. I could probably improve the ROI by developing a better sense of what they are and aren’t good at, but of course that itself is rapidly changing!

worik 8 hours ago [-]
I am.

That is the mental model I have for the work (computer programming) I like to do and am good at.

Plumbing

tptacek 8 hours ago [-]
I'm fine with anybody saying AI agents don't work for their work-style and am not looking to rebut this piece, but I'm going to take this opportunity to call something out.

The author writes "reviewing code is actually harder than most people think. It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself". That sounds within an SD of true for me, too, and I had a full-time job close-reading code (for security vulnerabilities) for many years.

But it's important to know that when you're dealing with AI-generated code for simple, tedious, or rote tasks --- what they're currently best at --- you're not on the hook for reading the code that carefully, or at least, not on the same hook. Hold on before you jump on me.

Modern Linux kernels allow almost-arbitrary code to be injected at runtime, via eBPF (which is just a C program compiled to an imaginary virtual RISC). The kernel can mostly reliably keep these programs from crashing the kernel. The reason for that isn't that we've solved the halting problem; it's that eBPF doesn't allow most programs at all --- for instance, it must be easily statically determined that any backwards branch in the program runs for a finite and small number of iterations. eBPF isn't even good at determining that condition holds; it just knows a bunch of patterns in the CFG that it's sure about and rejects anything that doesn't fit.

That's how you should be reviewing agent-generated code, at least at first; not like a human security auditor, but like the eBPF verifier. If I so much as need to blink when reviewing agent output, I just kill the PR.

If you want to tell me that every kind of code you've ever had to review is equally tricky to review, I'll stipulate to that. But that's not true for me. It is in fact very easy for me to look at a rote recitation of an idiomatic Go function and say "yep, that's what that's supposed to be".

sensanaty 4 hours ago [-]
But how is this a more efficient way of working? What if you have to have it open 30 PRs before 1 of them is acceptable enough to not outright ignore? It sounds absolutely miserable, I'd rather review my human colleague's work because in 95% of cases I can trust that it's not garbage.

The alternative where I boil a few small lakes + a few bucks in return for a PR that maybe sometimes hopefully kinda solves the ticket sounds miserable. I simply do not want to work like that, and it doesn't sound even close to efficient or speedier or anything like that, we're just creating extra work and extra waste for literally no reason other than vague marketing promises about efficiency.

kasey_junk 2 hours ago [-]
If you get to 2 or 3 and it hasn’t done what you want you fall back to writing it yourself.

But in my experience this is _signal_. If the AI can't get to it with minor back and forth, then something needs work: your understanding, the specification, the tests, your code factoring, etc.

The best case scenario is that your agent one-shots the problem. But close behind that is that your agent finds a place where a little cleanup makes everybody’s life easier: you, your colleagues, and the bot. And your company is now incentivized to invest in that.

The worst case is that you took the time to write two prompts that didn’t work.

smaudet 8 hours ago [-]
I guess my challenge is that "if it was a rote recitation of an idiomatic go function", was it worth writing?

There is a certain style, let's say, of programming that encourages highly non-reusable code that is at once boring and tedious, and impossible to maintain, and thus not especially worthwhile.

The "rote code" could probably have been expressed, succinctly, in terms that border on "plain text", but with more rigueur de jour, with less overpriced, wasteful, potentially dangerous models in-between.

And yes, machines like the eBPF verifier must follow strict rules to cut out the chaff, of which there is quite a lot, but it neither follows that we should write everything in eBPF, nor does it follow that because something can throw out the proverbial "garbage", that makes it a good model to follow...

Put another way, if it was that rote, you likely didn't need nor benefit from the AI to begin with, a couple well tested library calls probably sufficed.

sesm 4 hours ago [-]
I would put it differently: when you already have a mental model of what the code is supposed to do and how, then reviewing is easy: just check that the code conforms to that model.

With an arbitrary PR from a colleague or security audit, you have to come up with mental model first, which is the hardest part.

tptacek 8 hours ago [-]
Yes. More things should be rote recitations. Rote code is easy to follow and maintain. We get in trouble trying to be clever (or DRY) --- especially when we do it too early.

Important tangential note: the eBPF verifier doesn't "cut out the chaff". It rejects good, valid programs. It does not care that the programs are valid or good; it cares that it is not smart enough to understand them; that's all that matters. That's the point I'm making about reviewing LLM code: you are not on the hook for making it work. If it looks even faintly off, you can't hurt the LLM's feelings by killing it.

smaudet 8 hours ago [-]
> We get in trouble trying to be clever (or DRY)

Certainly, however:

> That's the point I'm making about reviewing LLM code: you are not on the hook for making it work

The second portion of your statement is either confusing (something unsaid) or untrue (you are still ultimately on the hook).

Agentic AI is just yet another, as you put it way to "get in trouble trying to be clever".

My previous point stands - if it was that cut and dry, then a (free) script/library could generate the same code. If your only real use of AI is to replace template systems, congratulations on perpetuating the most over-engineered template system ever. I'll stick with a provable, free template system, or just not write the code at all.

vidarh 5 hours ago [-]
> The second portion of your statement is either confusing (something unsaid) or untrue (you are still ultimately on the hook).

You're missing the point.

tptacek is saying he isn't the one who needs to fix the issue because he can just reject the PR and either have the AI agent refine it or start over. Or ultimately resort to writing the code himself.

He doesn't need to make the AI written code work, and so he doesn't need to spend a lot of time reading the AI written code - he can skim it for any sign it looks even faintly off and just kill it if that's the case instead of spending more time on it.

> My previous point stands - if it was that cut and dry, then a (free) script/library could generate the same code.

There's a vast chasm between simple enough that a non-AI code generator can generate it using templates and simple enough that a fast read-through is enough to show that it's okay to run.

As an example, the other day I had my own agent generate a 1kloc API client for an API. The worst case scenario other than failing to work would be that it would do something really stupid, like deleting all my files. Since it passes its tests, skimming it was enough for me to have confidence that nowhere does it do any file manipulation other than reading the files passed in. For that use, that's sufficient since it otherwise passes the tests and I'll be the only user for some time during development of the server it's a client for.

But no template based generator could write that code, even though it's fairly trivial - it involved reading the backend API implementation and rote-implementation of a client that matched the server.

smaudet 4 hours ago [-]
> But no template based generator could write that code, even though it's fairly trivial

Not true at all; in fact this sort of thing used to happen all the time 10 years ago: tools reading APIs and generating clients...

> He doesn't need to make the AI written code work, and so he doesn't need to spend a lot of time reading the AI written code - he can skim it for any sign it looks even faintly off and just kill it if that's the case instead of spending more time on it.

I think you are missing the point as well, that's still review, that's still being on the hook.

Words like "skim" and "kill" are the problem here, not a solution. They point to a broken process that looks like its working...until it doesn't.

But I hear you say "all software works like that", well, yes, to some degree. The difference being, one you hopefully actually wrote and have some idea what's going wrong, the other one?

Well, you just have to sort of hope it works and when it doesn't, well you said it yourself. Your code was garbage anyways, time to "kill" it and generate some new slop...

vidarh 4 hours ago [-]
> Not true at all, in fact this sort of thing used to happen all the time 10 years ago, code reading APIs and generating clients...

Where is this template-based code generator that can read my code, understand it, and generate a full client, including a CLI, that knows how to format the data and implements the required protocols?

In 30 years of development, I've seen nothing like it.

> I think you are missing the point as well, that's still review, that's still being on the hook.

I don't know if you're being intentionally obtuse, or what, but while, yes, you're on the hook for the final deliverable, you're not on the hook for fixing a specific instance of code, because you can just throw it away and have the AI do it all over.

The point you seem intent on missing is that the cost of throwing out the work of another developer is high, while the cost of throwing out the work of an AI assistant is next to nothing, and so where you need to carefully review a co-workers code because throwing it away and starting over from scratch is rarely an option, with AI generated code you can do that at the slightest whiff of an issue.

> Words like "skim" and "kill" are the problem here, not a solution. They point to a broken process that looks like its working...until it doesn't.

No, they are not a problem at all. They point to a difference in opportunity cost. If the rate at which you kill code is too high, it's a problem irrespective of source. But the point is that this rate can be much higher for AI code than for co-workers before it becomes a problem, because the cost of starting over is orders of magnitude different, and this allows for a very different way of treating code.

> Well, you just have to sort of hope it works and when it doesn't

No, I don't "hope it works" - I have tests.

kenjackson 8 hours ago [-]
I can read code much faster than I can write it.

This might be the defining line for Gen AI: people who can read code faster will find it useful, and those that write faster than they can read won’t use it.

globnomulous 6 hours ago [-]
> I can read code much faster than I can write it.

I have known and worked with many, many engineers across a wide range of skill levels. Not a single one has ever said or implied this, and in not one case have I ever found it to be true, least of all in my own case.

I don't think it's humanly possible to read and understand code faster than you can write and understand it to the same degree of depth. The brain just doesn't work that way. We learn by doing.

kenjackson 26 minutes ago [-]
You definitely can. For example I know x86. I can read it and understand it quite well. But if you asked me to write even a basic program in it, it would take me a considerable amount of time.

The same goes with shell scripting.

But more importantly, you don’t have to understand code to the same degree and depth. When I read code I understand what the code is doing and whether it looks correct. I’m not going over other design decisions or implementation strategies (unless they’re obvious). If I did that then I’d agree. I’d also stop doing code reviews and just write everything myself.

autobodie 7 hours ago [-]
I think that's wrong. I only have to write code once, maybe twice. But when using AI agents, I have to read many (5? 10? I will always give up before 15) PRs before finding one close enough that I won't have to rewrite all of it. This nonsense has not saved me any time, and the process is miserable.

I also haven't found any benefit in aiming for smaller or larger PRs. The aggregate efficiency seems to even out, because smaller PRs are easier to weed through but they are not less likely to be trash.

kenjackson 7 hours ago [-]
I only generate the code once with GenAI and typically fix a bug or two - or at worst use its structure. Rarely do I toss a full PR.

It’s interesting some folks can use them to build functioning systems and others can’t get a PR out of them.

dagw 6 hours ago [-]
It’s interesting some folks can use them to build functioning systems and others can’t get a PR out of them.

It is 100% a function of what you are trying to build, what language and libraries you are building it in, and how sensitive that thing is to factors like performance and getting the architecture just right. I've experienced building functioning systems with hardly any intervention, and repeatedly failing to get code that even compiles after over an hour of effort. There exists small, but popular, subset of programming tasks where gen AI excels, and a massive tail of tasks where it is much less useful.

omnicognate 5 hours ago [-]
The problem is that at this stage we mostly just have people's estimates of their own success to go on, and nobody thinks they're incompetent. Nobody's going to say "AI works really well for, me but I just pump out dross my colleagues have to fix" or "AI doesn't work for me but I'm an unproductive, burnt out hack pretending I'm some sort of craftsman as the world leaves me behind".

This will only be resolved out there in the real world. If AI turns a bad developer, or even a non-developer, into somebody that can replace a good developer, the workplace will transform extremely quickly.

So I'll wait for the world to prove me wrong but my expectation, and observation so far, is that AI multiplies the "productivity" of the worst sort of developer: the ones that think they are factory workers who produce a product called "code". I expect that to increase, not decrease, the value of the best sort of developer: the ones who spend the week thinking, then on Friday write 100 lines of code, delete 2000 and leave a system that solves more problems than it did the week before.

autobodie 45 minutes ago [-]
My experiences so far suggest that you might be right.
stitched2gethr 7 hours ago [-]
Why would you review agent generated code any differently than human generated code?
tptacek 6 hours ago [-]
Because you don't care about the effort the agent took and can just ask for a do-over.
greybox 4 hours ago [-]
For simple, tedious, or rote tasks, I have templates bound to hotkeys in my IDE. They even come with configurable variable sections that you can fill in afterwards, or base on some highlighted code before hitting the hotkey. Also, it's free.
112233 8 hours ago [-]
This is a radical and healthy way to do it. Obviously wrong — reject. Obviously right — accept. In any other case — also reject, as non-obvious.

I guess it is far removed from the advertized use case. Also, I feel one would be better off having auto-complete powered by LLM in this case.

vidarh 5 hours ago [-]
Auto-complete means having to babysit it.

The more I use this, the longer the LLM works before I even look at the output, beyond maybe having it chug along on another screen and glancing over occasionally.

My shortest runs now usually take minutes, with the LLM expanding my prompt into a plan, writing the tests, writing the code, linting its code, fixing any issues, and writing a commit message before I even review things.

bluefirebrand 8 hours ago [-]
> Obviously right — accept.

I don't think code is ever "obviously right" unless it is trivially simple

tptacek 8 hours ago [-]
I don't find this to be the case. I've used (and hate) autocomplete-style LLM code generation. But I can feed 10 different tasks to Codex in the morning and come back and pick out the 3-4 I think might be worth pursuing, and just re-prompt the 7 I kill. That's nothing like interactive autocomplete, and drastically faster than I could work without LLM assistance.
monero-xmr 8 hours ago [-]
I mostly just approve PRs because I trust my engineers. I have developed a 6th sense for thousand-line PRs and knowing which 100-300 lines need careful study.

Yes I have been burned. But 99% of the time, with proper test coverage it is not an issue, and the time (money) savings have been enormous.

"Ship it!" - me

theK 7 hours ago [-]
I think this points out the crux of the difference between collaborating with other devs and collaborating with an AI. The article correctly states that the AI will never learn your preferences or the idiosyncrasies of the specific projects/company etc. because it is effectively amnesiac. You cannot trust the AI the same way you trust other known collaborators, because you don't have a real relationship with it.
loandbehold 6 hours ago [-]
Most AI coding tools are working on this problem. E.g. with Claude Code you can add your preferences to a claude.md file. When I notice I'm repeatedly correcting the AI's mistake, I add an instruction to claude.md to avoid it in the future. claude.md is exactly that: a memory of your preferences, idiosyncrasies, and other project-related info.
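
The entries themselves are just plain instructions. Something like this (these examples are made up, but it's the general shape):

    # claude.md (hypothetical example entries)
    - Use the shared error helper in src/errors.ts; don't throw raw strings.
    - Every new endpoint needs an integration test under tests/api/.
    - Prefer plain functions over classes unless state is genuinely needed.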
vidarh 5 hours ago [-]
I do something to the effect of "Update LLM.md with what you've learned" at the end of every session, coupled with telling it what is wrong when I reject a change. It works. It could work better, but it works.
autobodie 7 hours ago [-]
Haha, doing this with AI will bury you in a very deep hole.
cwoolfe 21 minutes ago [-]
I have found AI-generated code to be overly verbose and complex. It usually generates 100 lines and I take a few of them and adapt them to what I want. The best cases I've found for using it are asking specific technical questions, helping me learn a new programming language, and having it generate ideas for brainstorming how to solve a problem. It also does well with bounded algorithmic problems that are well specified, i.e. write a function that takes inputs and produces outputs according to xyz. I've found it's usually sorely lacking in domain knowledge (e.g. it is not an expert on the iOS SDK APIs, not an expert in my industry, etc.)
mettamage 19 minutes ago [-]
My heuristic: the more you're solving an already-solved problem that is just tedious, memory-intensive work, the more it's worth taking a crack at using AI. It will probably one-shot your solution with minimal tweaks required.

The more you deviate from that, the more you have to step in.

But given that I constantly forget how to open a file in Python, I still have a use for it. It basically supplanted Stackoverflow.

roxolotl 11 hours ago [-]
> But interns learn and get better over time. The time that you spend reviewing code or providing feedback to an intern is not wasted, it is an investment in the future. The intern absorbs the knowledge you share and uses it for new tasks you assign to them later on.

This is the piece that confuses me about the comparison to a junior or an intern. Humans learn about the business, the code, the history of the system. And then they get better. Of course there’s a world where agents can do that, and some of the readme/doc solutions do that but the limitations are still massive and so much time is spent reexplaining the business context.

viraptor 9 hours ago [-]
You don't have to reexplain the business context. Save it to the mdc file if it's important. The added benefit is that the next real person looking at the code can also use that to learn; it's actually cool that having good, up-to-date documentation is now an asset.
adastra22 7 hours ago [-]
Do you find your agent actually respecting the mdc file? I don’t.
viraptor 7 hours ago [-]
There should be no difference between the mdc and the text in the prompt. Try something drastic like "All responses should be in Chinese". If it doesn't happen, they're not being included correctly. Otherwise, yeah, they work, modulo the usual issues of prompt adherence.
adastra22 6 hours ago [-]
I suspect that Cursor is summarizing the context window, and the .mdc directives are the first thing left on the cutting room floor.
xarope 10 hours ago [-]
I think this is how certain LLMs end up with 14k worth of system prompts
Terr_ 9 hours ago [-]
"Be fast", "Be Cheap", "Be Good".

*dusts off hands* Problem solved! Man, am I great at management or what?

freeone3000 10 hours ago [-]
Put the business context in the system prompt.
pSYoniK 6 hours ago [-]
I've been reading these posts for the past few months and the comments too. I've tried Junie a bit and I've used ChatGPT in the past for some bash scripts (which, for the most part, did what they were supposed to do), but I can't seem to find the use case.

Using them for larger bits of code feels silly as I find subtle bugs or subtle issues in places, so I don't necessarily feel comfortable passing in more things. Also, large bits of code I work with are very business logic specific and well abstracted, so it's hard to try and get ALL that context into the agent.

I guess what I'm trying to ask here is: what exactly do you use agents for? I've seen YouTube videos, but a good chunk of those are people getting a bunch of TypeScript generated for some front-end, or some cobbled-together front end with Stripe added in, and everyone celebrating as if this is some massive breakthrough.

So when people say "regular tasks" or "rote tasks" what do you mean? You can't be bothered to write a db access method/function using some DB access library? You are writing the same regex testing method for the 50th time? You keep running into the same problem and you're still writing the same bit of code over and over again? You can't write some basic sql queries?

Also, not sure about others, but I really dislike having to do code reviews when I am unable to really gauge the skill of the dev I'm reviewing. If I know I have a junior with 1-2 years, maybe, then I know to focus a lot on logic issues (people can end up cobbling together the previous simple bits of code), and if it's later down the road at 2-5 years then I know that I might focus on patterns or look to ensure that the code meets the standards, and look for more discreet or hidden bugs. With agent output it could oscillate wildly between those. It could be a solidly written, well-optimized search function, or it could be a nightmarish SQL query that's impossible to untangle.

Thoughts?

I do have to say I found it good when working on my own to get another set of "eyes" and ask things like "are there more efficient ways to do X" or "can you split this larger method into multiple ones" etc

danieltanfh95 10 hours ago [-]
AI models are fundamentally trained on patterns from existing data - they learn to recognize and reproduce successful solution templates rather than derive solutions from foundational principles. When faced with a problem, the model searches for the closest match in its training experience rather than building up from basic assumptions and logical steps.

Human experts excel at first-principles thinking precisely because they can strip away assumptions, identify core constraints, and reason forward from fundamental truths. They might recognize that a novel problem requires abandoning conventional approaches entirely. AI, by contrast, often gets anchored to what "looks similar" and applies familiar frameworks even when they're not optimal.

Even when explicitly prompted to use first-principles analysis, AI models can struggle because:

- They lack the intuitive understanding of when to discard prior assumptions

- They don't naturally distinguish between surface-level similarity and deep structural similarity

- They're optimized for confident responses based on pattern recognition rather than uncertain exploration from basics

This is particularly problematic in domains requiring genuine innovation or when dealing with edge cases where conventional wisdom doesn't apply.

Context poisoning, intended or not, is a real problem that humans are able to solve relatively easily while current SotA models struggle.

adastra22 7 hours ago [-]
So are people. People are trained on existing data and learn to reproduce known solutions. They also take this to the meta level—a scientist or engineer is trained on methods for approaching new problems which have yielded success in the past. AI does this too. I’m not sure there is actually a distinction here..
danieltanfh95 2 hours ago [-]
Of course there is. Humans can pattern match as a means to save time. LLMs pattern match as their only mode of communication and “thought”.

Humans are also not as susceptible to context poisoning, unlike llms.

esailija 2 hours ago [-]
There is a difference between extrapolating from just a few examples and interpolating between a trillion examples.
dvt 9 hours ago [-]
I'm actually quite bearish on AI in the generative space, but even I have to admit that writing boilerplate is "N" times faster using AI (use your favorite N). I hate when people claim this without any proof, so literally today this is what I asked ChatGPT:

    write a stub for a react context based on this section (which will function as a modal):
    ```
        <section>
         // a bunch of stuff
        </section>
    ```
Worked great, it created a few files (the hook, the provider component, etc.), and I then added them to my project. I've done this a zillion times, but I don't want to do it again, it's not interesting to me, and I'd have to look up stuff if I messed it up from memory (which I likely would, because provider/context boilerplate sucks).

Now, I can just do `const myModal = useModal(...)` in all my components. Cool. This saved me at least 30 minutes, and 30 minutes of my time is worth way more than 20 bucks a month. (N.B.: All this boilerplate might be a side effect of React being terrible, but that's beside the point.)
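
For reference, the boilerplate is roughly this shape (a hand-written sketch, not the actual ChatGPT output; the provider/hook names just follow my usage above):

    // ModalContext.tsx -- sketch of the context/provider/hook boilerplate
    import { createContext, useContext, useState, ReactNode } from "react";

    type ModalContextValue = {
      open: (content: ReactNode) => void;
      close: () => void;
    };

    const ModalContext = createContext<ModalContextValue | null>(null);

    export function ModalProvider({ children }: { children: ReactNode }) {
      const [content, setContent] = useState<ReactNode>(null);
      return (
        <ModalContext.Provider
          value={{ open: setContent, close: () => setContent(null) }}
        >
          {children}
          {content && <section>{content}</section>}
        </ModalContext.Provider>
      );
    }

    export function useModal() {
      const ctx = useContext(ModalContext);
      if (!ctx) throw new Error("useModal must be used inside a ModalProvider");
      return ctx;
    }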

skydhash 1 hours ago [-]
For this case, I would probably lift the example from the library docs. Or spend 5 minutes writing a bare implementation, as that would be all I need at the time.

That’s an issue I have with generated code. More often, I start with a basic design that evolves based on the project’s needs. It’s an iterative process that can span the whole timeline. But generated code arrives as a whole solution that fits the current needs, and it’s a pain to refactor.

Winsaucerer 9 hours ago [-]
This kind of thing is my main use: boilerplate stuff, and scripts that I don't care about -- e.g., if I need a quick bash script to do a once-off task.

For harder problems, my experience is that it falls over, although I haven't been refining my LLM skills as much as some do. It seems that the bigger the project, the more it integrates with other things, the worse AI is. And moreover, for those tasks it's important for me or a human to do it because (a) we think about edge cases while we work through the problem intellectually, and (b) it gives us a deep understanding of the system.

ritz_labringue 3 hours ago [-]
AI is really useful when you already know what code needs to be written. If you can explain it properly, the AI will write it faster than you can and you'll save time because it is quick to check that this is actually the code you wanted to write. So "programming with AI" means programming in your mind and then using the AI to materialize it in the codebase.
Tzt 58 minutes ago [-]
Well, kinda? I often know what chunks / functions I need, but I'm too lazy to think through how to implement them exactly, how they should work inside. Yeah, you need to have an overall idea of what you are trying to make.
frankc 11 hours ago [-]
I just don't agree with this. I am generally telling the model how to do the work according to an architecture I specify using technology I understand. The hardest part for me in reviewing someone else's code is understanding their overall solution and how everything fits together, as it's not likely to be exactly the way I would have structured the code or solved the problem. However, with an LLM that generally isn't an issue, since we have pre-agreed upon a solution path. If that is not what is happening, then likely you are letting the model get too far ahead.

There are other times when I am building a stand-alone tool and am fine with whatever it wants to do, because it's not something I plan to maintain and its functional correctness is self-evident. In that case I don't even review what it's doing unless it's stuck. This is closer to actual vibe coding. It isn't something I would do for something I am integrating into a larger system, but I will for something like a CLI tool that I use to enhance my workflow.

ken47 11 hours ago [-]
You can pre-agree on a solution path with human engineers too, with a similar effect.
bigbuppo 9 hours ago [-]
Don't try to argue with those using AI coding tools. They don't interact well with actual humans, which is why they've been relegated to talking to the computer. We'll eventually have them all working on some busy projects to help with "marketing" to keep them distracted while the decent programmers that can actually work in a team environment can get back to useful work free of the terrible programmers and marketing departments.
wiseowise 8 hours ago [-]
> that can actually work in a team environment can get back to useful work free of the terrible programmers

Is that what you and your buddies talk about at two hour long coffee/smoke breaks while “terrible” programmers work?

bawana 45 minutes ago [-]
Is AI's relationship to knowledge the same as an index fund is to equities? Does the fact that larger and larger groups of people use AI result in more homogeneous and 'blindered' thinking?
kachapopopow 9 hours ago [-]
AI is a tool like any other, you have to learn to use it.

I had AI create me a k8s device plugin for supporting SR-IOV-only vGPUs. Something NVIDIA calls "vendor specific" and basically offers little to no support for in their public repositories for Linux KVM.

I loaded up a new go project in goland, opened up Junie, typed what I needed and what I have, went to make tea, came back, looked over the code to make sure it wasn't going to destroy my cluster (thankfully most operations were read-only), deployed it with the generated helm chart and it worked (nearly) first try.

Before this I really had no idea how to create device plugins other than knowing what they are and even if I did, it would have easily taken me an hour or more to have something working.

The only thing AI got wrong is that the virtual functions were symlinks and not directories.

The entire project is good enough that I would consider opensourcing it. With 2 more prompts I had configmap parsing to initialize virtual functions on-demand.

ukprogrammer 3 hours ago [-]
> “It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself, if not more.”

There’s your issue: the skill of programming has changed.

Typing gets fast; so does review once robust tests already prove X, Y, Z correctness properties.

With the invariants green, you get faster at grokking the diff, feed style nits back into the system prompt, and keep tuning the infinite tap to your taste.

zmmmmm 11 hours ago [-]
I think there's a key difference of context at play here: AI tools aren't better than an expert on the language and code base being written. But the problem is that most software isn't written by such experts. It's written by people with very hazy knowledge of the domain and only partial knowledge of the languages and frameworks they are using. Getting it to be stylistically consistent or 100% optimal is far from the main problem. In these contexts AI is a huge help, I find.
jpcrs 5 hours ago [-]
I use AI daily, currently paying for Claude Code, Gemini and Cursor. It really helps me on my personal toy projects, it’s amazing at getting a POC running and validate my ideas.

My company just had internal models that were mediocre at best, but at the beginning of this year they finally enabled Copilot for everyone.

At the beginning I was really excited for it, but it’s absolutely useless for work. It just doesn’t work on big old enterprise projects. In an enterprise environment everything is composed of so many moving pieces, knowledge scattered across places, internal terminology, etc. Maybe in the future, with better MCP servers or whatever, it’ll be possible to feed all the context into it to make it spit out something useful, but right now, at work, I just use AI as a search engine (and it’s pretty good at it, when you have the knowledge to detect when it has subtle problems).

HPsquared 5 hours ago [-]
I think a first step for these big enterprise codebases (also applicable to documentation) is to collect it into a big ball and finetune on it.
didibus 8 hours ago [-]
You could argue that AI-generated code is a black box, but let's adjust our perspective here. When was the last time you thoroughly reviewed the source code of a library you imported? We already work with black boxes daily as we evaluate libraries by their interfaces and behaviors, not by reading every line.

The distinction isn't whether code comes from AI or humans, but how we integrate and take responsibility for it. If you're encapsulating AI-generated code behind a well-defined interface and treating it like any third party dependency, then testing that interface for correctness is a reasonable approach.
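
A rough sketch of that shape, with made-up names and a Jest-style test runner assumed:

    // slugger.ts -- the human-owned contract
    export interface Slugger {
      slugify(title: string): string;
    }

    // slugger.test.ts -- tests exercise only the interface, not the generated internals
    import { aiSlugger } from "./ai-slugger"; // AI-generated module, treated as a black box

    test("lowercases and hyphenates", () => {
      expect(aiSlugger.slugify("Hello World")).toBe("hello-world");
    });

    test("emits only URL-safe characters", () => {
      expect(aiSlugger.slugify("A/B? C!")).toMatch(/^[a-z0-9-]+$/);
    });

If the generated module passes the interface-level tests, it gets treated like any other dependency; if it fails, it gets regenerated rather than hand-patched.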

The real complexity arises when you have AI help write code you'll commit under your name. In this scenario, code review absolutely matters because you're assuming direct responsibility.

I'm also questioning whether AI truly increases productivity or just reduces cognitive load. Sometimes "easier" feels faster but doesn't translate to actual time savings. And when we do move quicker with AI, we should ask if it's because we've unconsciously lowered our quality bar. Are we accepting verbose, oddly structured code from AI that we'd reject from colleagues? Are we giving AI-generated code a pass on the same rigorous review process we expect for human written code? If so, would we see the same velocity increases from relaxing our code review process amongst ourselves (between human reviewers)?

materielle 7 hours ago [-]
I’m not sure that the library comparison really works.

Libraries are maintained by other humans, who stake their reputation on the quality of the library. If a library gets a reputation of having a lax maintainer, the community will react.

Essentially, a chain of responsibility, where each link in the chain has an incentive to behave well else they be replaced.

Who is accountable for the code that AI writes?

layer8 3 hours ago [-]
Would you use a library that was written by AI without anyone having supervised it and thoroughly reviewed the code? We are using libraries without checking its source code because of the human thought process and quality control that has gone into it, and existing reputation. Nobody would use a library that no one else has ever seen and whose source code no human has ever laid their eyes on. (Excluding code generated by deterministic vetted tools here, like transpilers or parser generators.)
bluefirebrand 8 hours ago [-]
> When was the last time you thoroughly reviewed the source code of a library you imported?

Doesn't matter, I'm not responsible for maintaining that particular code

The code in my PRs has my name attached, and I'm not trusting any LLM with my name

didibus 7 hours ago [-]
Exactly, that's what I'm saying. Commit AI code under its own name. Then the code under your name can use the AI code as a black box. If your code that uses AI code works as expected, it is similar to when using libraries.

If you consider that AI code is not code any human needs to read or later modify by hand, then AI code is modified by AI. All you want to do is fully test it; if it all works, it's good. Now you can call into it from your own code.

benediktwerner 6 hours ago [-]
I don't see what that does. The AI hardly cares about its reputation, and I also can't really blame the AI when my boss or a customer asks me why something failed, so what does committing under its name do?

I'm ultimately still responsible for the code. And unlike AI, library authors put their own and their libraries' reputation on the line.

adastra22 7 hours ago [-]
These days, I review external dependencies pretty thoroughly. I did not use to. This is because of AI slop though.
zacksiri 3 hours ago [-]
LLMs are relatively new technology. I think it's important to recognize the tool for what it is and how it works for you. Everyone is going to get different usage from these tools.

What I personally find is: it's great for helping me solve mundane things. For example, I'm currently working on an agentic system and I'm using LLMs to help me generate Elasticsearch mappings.

There is no part of me that enjoys making JSON mappings; it's not fun, nor does it engage my curiosity as a programmer, and I'm also not going to learn much from generating Elasticsearch mappings over and over again. For problems like this, I'm happy to just let the LLM do the job. I throw some JSON at it and I've got a prompt that's good enough that it will spit out results deterministically and reliably.
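
The output is the usual sort of mapping document, something in this ballpark (the field names here are made up; the structure is just standard Elasticsearch):

    {
      "mappings": {
        "properties": {
          "title":      { "type": "text" },
          "sku":        { "type": "keyword" },
          "price":      { "type": "float" },
          "created_at": { "type": "date" }
        }
      }
    }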

However if I'm exploring / coding something new, I may try letting the LLM generate something. Most of the time though in these cases I end up hitting 'Reject All' after I've seen what the LLM produces, then I go about it in my own way, because I can do better.

It all really depends on what the problem you are trying to solve. I think for mundane tasks LLMs are just wonderful and helps get out of the way.

If I put myself into the shoes of a beginner programmer, LLMs are amazing. There is so much I could learn from them. Ultimately, what I find is that LLMs will help lower the barrier of entry to programming but do not eliminate the need to learn to read / understand / reason about the code. Beginners will be able to go much further on their own before seeking out help.

If you are more experienced you will probably also get some benefits, but ultimately you'd probably want to do it your own way, since there is no way LLMs will replace an experienced programmer (not yet, anyway).

I don't think it's wise to completely dismiss LLMs in your workflow, at the same time I would not rely on it 100% either, any code generated needs to be reviewed and understood like the post mentioned.

mentalgear 1 hours ago [-]
From all my experience over several years, the best stance towards AI-assisted development is: "Trust, but verify" (each change). Which is in stark contrast to brittle "vibe coding" (which might work for demos but nothing else).
Tzt 1 hours ago [-]
What do you mean several years, it became feasible like 6 months ago lol. No, gpt3.5 doesn't count, it's a completely useless thing.
ed_mercer 11 hours ago [-]
> It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself, if not more.

Hard disagree. It's still way faster to review code than to manually write it. Also the speed at which agents can find files and the right places to add/edit stuff alone is a game changer.

Winsaucerer 8 hours ago [-]
There's a difference between reviewing code by developers you trust, and reviewing code by developers you don't trust or AI you don't trust.

Although tbh, even in the worst case I think I am still faster at reviewing than writing. The only difference is, though, those reviews will never have had the same depth of thought and consideration as when I write the code myself. So reviews are quicker, but also less thorough/robust than writing, for me.

bluefirebrand 8 hours ago [-]
> also less thorough/robust than writing for me.

This strikes me as a tradeoff I'm absolutely not willing to make, not when my name is on the PR

sensanaty 4 hours ago [-]
I'm fast at reviewing PRs because I know the person on the other end and can trust that they got things correctly. I'll focus on the meaty, tricky parts of their PR, but I can rest assured that they matched the design, for example, and not have to verify every line of CSS they wrote.

This is a recipe for disaster with AI agents. You have to read every single line carefully, and this is much more difficult for the large majority of people out there than if you had written it yourself. It's like reviewing a Junior's work, except I don't mind reviewing my Junior colleague's work because I know they'll at least learn from the mistakes and they're not a black box that just spews bullshit.

__loam 10 hours ago [-]
You are probably not being thorough enough.
aryehof 6 hours ago [-]
These days, many programmers and projects are happy to leave testing and defect discovery to end users, under the guise of “but we have unit tests and CI”. That’s exacerbated when using LLM driven code with abandon.

The author is one who appears unwilling to do so.

Kiro 5 hours ago [-]
> I believe people who claim that it makes them faster or more productive are making a conscious decision to relax their quality standards to achieve those gains.

Yep, this is pretty much it. However, I honestly feel that AI writes so much better code than me that I seldom need to actually fix much in the review, so it doesn't need to be as thorough. AI always takes more tedious edge-cases into account and applies best practices where I'm much sloppier and take more shortcuts.

redhale 3 hours ago [-]
This line by the author, in response to one of the comments, betrays the core of the article imo:

> The quality of the code these tools produce is not the problem.

So even if an AI could produce code of a quality equal to or surpassing the author's own code quality, they would still be uninterested in using it.

To each their own, but it's hard for me to accept an argument that such an AI would provide no benefit, even if one put priority on maintaining high quality standards. I take the point that the human author is ultimately responsible, but still.

animex 7 hours ago [-]
I write mostly boilerplate and I'd rather have the AI do it. The AI is also slow, which is great, because it allows me to run 2 or 3 AI workspaces working on different tickets/problems at the same time.

Where AI especially excels is helping me do maintenance tickets on software I rarely touch (or sometimes have never touched). It can quickly read the codebase, and together we can arrive at the place where the patch/problem lies and correct it.

I haven't written anything "new" in terms of code in years, so I'm not really learning anything from coding manually but I do love solving problems for my customers.

royal__ 11 hours ago [-]
I get confused when I see stances like this, because it gives me the sense that maybe people just aren't using coding tools efficiently.

90% of my usage of Copilot is just fancy autocomplete: I know exactly what I want, and as I'm typing out the line of code it finishes it off for me. Or, I have a rough idea of the syntax I need to use a specific package that I use once every few months, and it helps remind me what the syntax is, because once I see it I know it's right. This usage isn't really glamorous, but it does save me tiny bits of time in terms of literal typing, or a simple search I might need to do. Articles like this make me wonder if people who don't like coding tools are trying to copy and paste huge blocks of code; of course it's slower.
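To make that concrete with a made-up example (the function name and log format below are hypothetical, not from any real project): I type the signature and the comment, and the completion fills in the strptime format string, which is exactly the kind of syntax I'd otherwise have to look up and can verify at a glance once I see it.

    from datetime import datetime

    def parse_log_timestamp(raw: str) -> datetime:
        # e.g. "2024-06-17 14:32:05,123" -- the tool suggests the format
        # string; I only have to confirm it matches the sample.
        return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S,%f")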

kibibu 11 hours ago [-]
My experience is that the "fancy autocomplete" is a focus destroyer.

I know what function I want to write, start writing it, and then bam! The screen fills with ghost text that may partly be what I want, but probably not quite.

Focus shifts from writing to code review. I wrest my attention back to the task at hand, type some more, and bam! New ghost text to distract me.

Ever had the misfortune of having a conversation with a sentence-finisher? Feels like that.

Perhaps I need to bind to a hot key instead of using the default always-on setting.

---

I suspect people using the agentic approaches skip this entirely and therefore have a more pleasant experience overall.

atq2119 9 hours ago [-]
It's fascinating how differently people's brains work.

Autocomplete is a total focus destroyer for me when it comes to text, e.g. when writing a design document. When I'm editing code, it sometimes trips me up (hitting tab to indent but ending up accepting a suggestion instead), but without destroying my focus.

I believe your reported experience, but mine (and presumably many others') is different.

skydhash 11 hours ago [-]
That usage is the most disruptive for me. With normal intellisense and a library you're familiar with, you can predict the completion and just type normally with minimal interruption. With no completion, I can just touch type and fix the errors after the short burst. But having whole lines pop up breaks that flow state.

With unfamiliar syntax, I only need a few minutes and a cheatsheet to get back in the groove. Then typing goes back to that flow state.

Typing code is always semi-unconscious. Just like you don't pay that much attention to every character when you're writing notes on paper.

Editing code is where I focus on it, but I'm also reading docs, running tests,...

nottorp 5 hours ago [-]
> The problem is that I'm going to be responsible for that code, so I cannot blindly add it to my project and hope for the best.

Responsibility and "AI" marketing are two non-intersecting sets.

karl11 9 hours ago [-]
There is an important concept alluded to here around skin in the game: "the AI is not going to assume any liability if this code ever malfunctions" -- it is one of the issues I see w/ self-driving cars, planes, etc. If it malfunctions, there is no consequence for the 'AI' (no skin in the game) but there are definitely consequences for any humans involved.
zengyue 3 hours ago [-]
I think it is more suitable for creation than modification, so when repeated attempts still don't work, I delete the code and let it rewrite from scratch, which often solves the problem.
edg5000 7 hours ago [-]
It's a bit like going from assembly to C++, except we don't have good rigid rules for high-level program specification. If we had a rigid "high-level language" to express programs, orders of magnitude more high-level than C++ and the rest, then we could maybe evaluate it for correctness and get 100% output reliability. All the languages I picked up, I picked up when they were at least 10 years old. I'm trying to use AI a bit these days for programming, but it feels like what it must have felt like using C++ when it first became available: promising, but not usable (yet?) for most programming situations.
noiv 5 hours ago [-]
I've started to finish some abandoned half-ready side projects with Claude Pro on Desktop with the filesystem MCP. Being used to high-quality code, it took me some time to teach Claude to follow my conventions. Now it works like a charm: we work on a requirements.md until all questions are answered, and then I let Claude go. The only thing left is convincing clients to embrace code assistants.
handfuloflight 11 hours ago [-]
Will we be having these conversations for the next decade?
wiseowise 7 hours ago [-]
It’s the new “I use Vim/Emacs/Ed over IDE”.
ken47 11 hours ago [-]
Longer.
adventured 11 hours ago [-]
The conversations will climb the ladder and narrow.

Eventually: well, but, the AI coding agent isn't better than a top 10%/5%/1% software developer.

And it'll be that the coding agents can't do narrow X thing better than a top tier specialist at that thing.

The skeptics will forever move the goal posts.

jdbernard 11 hours ago [-]
If the AI actually outperforms humans in the full context of the work, then no, we won't. It will be so much cheaper and faster that businesses won't have to argue at all. Those that adopt them will massively outcompete those that don't.

However, assuming we are still having this conversation, that alone is proof to me that the AI is not that capable. We're several years into "replace all devs in six months." We will have to continue to wait and see whether it can deliver.

ukprogrammer 3 hours ago [-]
> If the AI actually outperforms humans in the full context of the work, then no, we won't. It will be so much cheaper and faster that businesses won't have to argue at all. Those that adopt them will massively outcompete those that don't.

This. The devs outcompeting by using AI today are too busy shipping, rather than wasting time writing blog posts about what, ultimately, is a skill issue.

wiseowise 7 hours ago [-]
> If the AI actually outperforms humans in the full context of the work, then no, we won't.

IDEs outperform any “dumb” editor in the full context of work. You don’t see any fewer posts about “I use Vim, btw” (and I say this as a Vim user).

afarviral 8 hours ago [-]
This has been my experience as well, but there are plenty of assertions here that are not always true, e.g. "AI coding tools are sophisticated enough (they are not) to fix issues in my projects" … but how do you know this if you are not constantly checking whether the tooling has improved? I think AI can tackle a certain level of issue and improve things, but only a subset of the available models, and of a multitude of workflows, will work well; unfortunately we are drowning in many that are mediocre at best, and many people like me give up before finding the winning combination.
layer8 3 hours ago [-]
You omitted “with little or no supervision”, which I think is crucial to that quote. It’s pretty undisputed that having an AI fix issues in your code requires some amount of supervision that isn’t negligible. I.e. you have to review the fixes, and possibly make some adjustments.
Zaylan 6 hours ago [-]
I've had a similar experience. These tools are pretty helpful for small scripts or quick utility code, but once you're working on something with a more complex structure and lots of dependencies, they tend to slow down. Sometimes it takes more effort to fix what they generate than to just write it myself.

I still use them, but more as a support tool than a real assistant.

fshafique 11 hours ago [-]
"do not work for me", I believe, is the key message here. I think a lot of AI companies have crafted their tools such that adoption has increased as the tools and the output got better. But there will always be a few stragglers, non-normative types, or situations where the AI agent is just not suitable.
lexandstuff 9 hours ago [-]
Maybe, but there's also some evidence that AI coding tools aren't making anyone more productive. One study from last year found that there was no increase in developer velocity but a dramatic increase in bugs.[1] Granted, the technology has advanced since this study, but many of the fundamental issues of LLM unreliability remain. Additionally, a recent study has highlighted the significant cognitive costs associated with offloading problem-solving onto LLMs, revealing that individuals who do so develop significantly weaker neural connectivity than those who don't [2].

It's very possible that AI is literally making us less productive and dumber. Yet they are being pushed by subscription-peddling companies as if it is impossible to operate without them. I'm glad some people are calling it out.

[1] https://devops.com/study-finds-no-devops-productivity-gains-...

[2] https://arxiv.org/abs/2506.08872

fshafique 8 hours ago [-]
One year ago I probably would've said the same. But I started dabbling with it recently, and I'm awed by it.
b0a04gl 9 hours ago [-]
clarity is exactly why ai tools could work well for anyone. they're not confused users, they know what they want and that makes them ideal operators of these systems. if anything, the friction they're seeing isn't proof the tools are broken, it's proof the interface is still too blunt. you can't hand off intent without structure. but when someone like that uses ai with clean prompts, tight scope, and review discipline, the tools usually align. it's not either-or. the tools aren't failing them, they're underutilised.
Aeolun 6 hours ago [-]
> The part that I enjoy the most about working as a software engineer is learning new things, so not knowing something has never been a barrier for me.

To me the part I enjoy most is making things. Typing all that nonsense out is completely incidental to what I enjoy about it.

block_dagger 10 hours ago [-]
> For every new task this "AI intern" resets back to square one without having learned a thing!

I guess the author is not aware of Cursor rules, AGENTS.md, CLAUDE.md, etc. Task-list oriented rules specifically help with long term context.

adastra22 7 hours ago [-]
Do they? I have found that with Cursor at least, the model very quickly starts ignoring rules.
stray 10 hours ago [-]
You can lead a horse to the documentation, but you can't make him think.
wiseowise 7 hours ago [-]
Thinking is a means to an end, not the end goal.

Or are you talking about OP not knowing AI tools enough?

edg5000 7 hours ago [-]
A huge bottleneck seems to be the lack of memory between sessions, at least with Claude Code. Sure, I can write things into a text file, but it's not the same as having an AI that actually remembers the work done earlier.

Is this possible in any way today? Does one need to use Llama or DeepSeek, and do we have to run it on our own hardware to get persistence?

s_ting765 5 hours ago [-]
Author makes very good points. Someone has to be responsible for the AI generated code, and if it's not going to be you then no one should feel obligated to pull the auto-generated PR.
lvl155 3 hours ago [-]
Analogous to assembly, we need standardized AI language/styles.
joelthelion 7 hours ago [-]
I think it's becoming clear that, at the current stage, AI coding agents are mostly useful for people working either on small projects or on isolated new features. People who maintain a large framework find them less useful.
sagarpatil 9 hours ago [-]
What really baffles me are the claims from Anthropic (80% of the code is generated by AI), OpenAI (70-80%), and Google/Microsoft (30%).
root_axis 8 hours ago [-]
The use of various AI coding tools is so diffuse that there isn't even a practical way to measure this. You can be assured those numbers are more or less napkin math based on some arbitrary AI performance factor applied to the total code writing population of the company.
layer8 3 hours ago [-]
Microsoft and Google have the much larger and older code bases.
nojs 9 hours ago [-]
This does not contradict the article - it may be true, and yet not significantly more productive, because of the increased review burden.
euleriancon 9 hours ago [-]
> The truth that may be shocking to some is that open source contributions submitted by users do not really save me time either, because I also feel I have to do a rigorous review of them.

This truly is shocking. If you are reviewing every single line of every package you intend to use, how do you ever write any code?

adastra22 7 hours ago [-]
That’s not what he said. He said he reviews every line of every pull request he receives to his own projects. Wouldn’t you?
abenga 9 hours ago [-]
You do not need to review every line of every package you use, just the subset of the interface you import/link and use. You have to review every line of code you commit into your project. I think attempting to equate the two is dishonest dissembling.
euleriancon 9 hours ago [-]
To me, the point the friend is making is, just like you said, that you don't need to review every line of code in a package, just the interface. The author misses the point that there truly is code that you trust without seeing it. At the moment AI code isn't as trustworthy as a well tested package but that isn't intrinsic to the technology, just a byproduct of the current state. As AI code becomes more reliable, it will likely become the case that you only need to read the subset of the interface you import/link and use.
bluefirebrand 8 hours ago [-]
This absolutely is intrinsic to the workflow

Using a package that hundreds of thousands of other people use is low risk, it is battle tested

It doesn't matter how good AI code gets, a unique solution that no one else has ever touched is always going to be more brittle and risky than an open source package with tons of deployments

And yes, if you are using an Open Source package that has low usage, you should be reviewing it very carefully before you embrace it

Treat AI code as if you were importing from a git repo with 5 installs, not a huge package with Mozilla funding

root_axis 8 hours ago [-]
> At the moment AI code isn't as trustworthy as a well tested package but that isn't intrinsic to the technology, just a byproduct of the current state

This remains to be seen. It's still early days, but self-attention scales quadratically. This is a major red flag for the future potential of these systems.
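To make the scaling concrete, here is a minimal sketch in plain numpy (not any particular model's implementation): the attention score matrix alone is n_tokens x n_tokens, so doubling the context length quadruples that part of the cost.

    import numpy as np

    def naive_attention(q, k, v):
        # q, k, v: (n_tokens, d) arrays for a single attention head
        scores = q @ k.T / np.sqrt(k.shape[-1])   # (n_tokens, n_tokens)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v                        # (n_tokens, d)

    n, d = 1024, 64
    q = k = v = np.random.randn(n, d)
    out = naive_attention(q, k, v)  # materializes a 1024 x 1024 score matrix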

varispeed 1 hours ago [-]
What is the purpose of this article?
nilirl 4 hours ago [-]
The main claim made: when there's money or reputation to be lost, code requires the same amount of cognition, irrespective of who wrote it, AI or not.

Best counter claim: Not all code has the same risk. Some code is low risk, so the risk of error does not detract from the speed gained. For example, for proof of concepts or hobby code.

The real problem: disinformation. Needless extrapolation, poor analogies, overvaluing anecdotes.

But there's money to be made. What can we do, sometimes the invisible hand slaps us silly.

freehorse 4 hours ago [-]
> Best counter claim: Not all code has the same risk. Some code is low risk, so the risk of error does not detract from the speed gained. For example, for proof of concepts or hobby code.

Counter counter claim for these use cases: when I do a proof of concept, I actually want to increase my understanding of said concept at the same time, learn the challenges involved, and in general get a better idea of how feasible things are. An AI can be useful for asking questions, asking for reviews, alternative solutions, inspiration etc (it may have something interesting to add or not), but if we are still in "this matters" territory, I would rather not substitute the actual learning experience and deeper understanding with having an AI generate code faster. Similarly for hobby projects: do I need the thing to just work, or do I actually care to learn how it is done? If the learning/understanding is not important in a given context, then using AI to generate the code is a great time-saver. Otherwise, I may still use AI, but not in the same way.

nilirl 4 hours ago [-]
Fair. I rescind those examples and revise my counter: When you gain much more from speed than you lose with errors, AI makes sense.

Revised example: Software where the goal is design experimentation; like with trying out variations of UX ideas.

cinbun8 6 hours ago [-]
As someone who heavily utilizes AI for writing code, I disagree with all the points listed. AI is faster, a multiplier, and in many instances, the equivalent of an intern. Perhaps the code it writes is not like the code written by humans, but it serves as a force multiplier. Cursor makes $500 million for a reason.
skydhash 11 hours ago [-]
I do agree with these points in my situation. I don't actually care for speed or having generated snippets for unfamiliar domains. Coding for me has always been about learning. Whether I'm building out a new feature or solving a bug, programming is always a learning experience. The goal is to bring forth a solution that a computer can then perform, but in the process you learn about how, and more importantly why, you should solve a problem.

The concept of why can get nebulous in a corporate setting, but it's nevertheless fun to explore. At the end of the day, someone has a problem and you're the one getting the computer to solve it. The process of getting there is fun in that you learn about what irks someone else (or yourself).

Thinking about the problem and its solution can be augmented with computers (I'm not remembering the Go standard library by heart). But computers are simple machines with very complex abstractions built on top of them. The thrill is in thinking in terms of two worlds: the real one where the problem occurs and the computing one where the solution will come forth. The analogy may be more understandable to someone who has learned two or more languages and thinks about the nuances of using them to depict the same reality.

Same as TFA, I'm spending most of my time manipulating a mental model of the solution. When I get to code, it's just a translation. But the mental model is diffuse, so getting it written down gives it a firmer existence. LLM generation mostly disrupts that process. The only way they really help is as a more pliable form of Stack Overflow, but I've only ever used Stack Overflow as human-authored annotations of the official docs.

p1dda 10 hours ago [-]
It would be interesting to see which is faster/better in competitive coding, the human coder or the human using AI to assist in coding.
Snuggly73 7 hours ago [-]
New benchmark for competitive coding dropped yesterday - https://livecodebenchpro.com/

Apparently models are not doing great for problems out of distribution.

p1dda 5 hours ago [-]
It goes to show that the LLMs aren't intelligent in the way humans are. LLMs are a really great replacement for googling though
asciimov 9 hours ago [-]
It would only be interesting if the problem was truly novel. If the AI has already been trained on the problem it’ll just push out a solution.
wiseowise 7 hours ago [-]
It already happened. Last year AI submissions completely destroyed AoC, as far as I remember.
dpcan 9 hours ago [-]
This article is simply not true for most people who have figured out how to use AI properly when coding. Since switching to Cursor, my coding speed and efficiency have probably increased 10x, conservatively. When I'm using it to code in languages I've used for 25+ years, it's a breeze to look over the function it just saved me time on by pre-thinking and typing it out for me. Could I have done it myself? Yeah, but it would have taken longer if I even had to go look up one tiny thing in the documentation, like the order of parameters for a function, or that little syntax thing I never use...

Also, the auto-complete with tools like Cursor is mind-blowing. When I can press tab to have it finish the next 4 lines of a prepared statement, or it just knows the next 5 variables I need to define because I just set up a function that will use them... that's a huge time saver when you add it all up.
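A made-up example of the kind of prepared-statement boilerplate I mean (the table and column names are invented for illustration): I type the function signature and the start of the INSERT, and the rest of the statement plus the parameter tuple is exactly what gets tab-completed.

    import sqlite3

    def save_order(conn: sqlite3.Connection, order) -> None:
        # Everything after "INSERT INTO orders" is the sort of thing
        # the autocomplete fills in from context.
        conn.execute(
            "INSERT INTO orders (customer_id, total, currency, created_at) "
            "VALUES (?, ?, ?, ?)",
            (order.customer_id, order.total, order.currency, order.created_at),
        )
        conn.commit()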

My policy is simple, don't put anything AI creates into production if you don't understand what it's doing. Essentially, I use it for speed and efficiency, not to fill in where I don't know at all what I'm doing.

amlib 8 hours ago [-]
What do you even mean by a 10x increase in efficiency? Does that mean you commit 10x more code every day? Or that "you" essentially "type" code 10x faster? In the latter case, all the other tasks surrounding the code would still take around the same time, netting you much less than a 10x increase in overall productivity, probably less than 2x?
asciimov 9 hours ago [-]
Out of curiosity how much are you spending on AI?

How much do you believe a programmer needs to lay out to “get good”?

epiccoleman 8 hours ago [-]
I am currently subscribed to Claude Pro, which is $20/mo and gives you plenty to experiment with: access to Projects and MCP in Claude Desktop, and also Claude Code, for a flat monthly fee. (I think there are usage limits but I haven't hit them).

I've probably fed $100 in API tokens into the OpenAI and Anthropic consoles over the last two years or so.

I was subscribed to Cursor for a while too, though I'm kinda souring on it and looking at other options.

At one point I had a ChatGPT pro sub, I have found Claude more valuable lately. Same goes for Gemini, I think it's pretty good but I haven't felt compelled to pay for it.

I guess my overall point is you don't have to break the bank to try this stuff out. Shell out the $20 for a month, cancel immediately, and if you miss it when it expires, resub. $20 is frankly a very low bar to clear - if it's making me even 1% more productive, $20 is an easy win.

hooverd 10 hours ago [-]
The moat is that juniors, never having worked without these tools, provide revenue to AI middlemen. Ideally they're blasting their focus to hell on short form video and stimulants, and are mentally unable to do the job without them.
Terr_ 9 hours ago [-]
Given the creeping appeal of LLMs as cheating tools in education, some of them may be arriving in the labor market with their brains already cooked.
bilalq 8 hours ago [-]
I've found "agents" to be an utter disappointment in their current state. You can never trust what they've done and need to spend so much time verifying their solution that you may as well have just done it yourself in the first place.

However, AI code reviewers have been really impressive. We run three separate AI reviewers right now and are considering adding more. One of these reviewers is kind of noisy, so we may drop it, but the others have been great. Sure, they have false positives sometimes and they don't catch everything. But they do catch real issues and prevent customer impact.

The Copilot style inline suggestions are also decent. You can't rely on it for things you don't know about, but it's great at predicting what you were going to type anyway.

nreece 8 hours ago [-]
Heard someone say the other day that "AI coding is just advanced scaffolding right now." Made me wonder if we're expecting too much out of it, at least for now.
nurettin 7 hours ago [-]
I simply don't like the code it writes. Whenever I try using LLMs, it is like wrestling for conciseness: terrible code, almost certainly with a 1/10 rate of errors or "extras" I don't need. At this point I am simply using them to motivate me to move forward.

Writing a bunch of ORM code feels boring? I make it generate the code and edit it. Importing data? I just make it generate the inserts. New models are good at reformatting data.

Using a third-party library? I force it to look up the docs for every function online, and it still has errors.

Adding transforms and pivots to sql while keeping to my style? It is a mess. Forget it. I do that by hand.

bdamm 11 hours ago [-]
No offense intended, but this is written by a guy who has the spare time to write the blog. I can only assume his problem space is pretty narrow. I'm not sure what his workflow is like, but personally I am interacting with so many different tools, in so many different environments, with so many unique problem sets, that being able to use AIs for error evaluation, and yes, for writing code, has indeed been a game changer. In my experience it doesn't replace people at all, but they sure are powerful tools. Can they write unsupervised code? No. Do you need to read the code they write? Yes, absolutely. Can the AIs produce bugs that take time to find? Yes.

But despite all that, the tools can find problems, get information, and propose solutions so much faster and across such a vast set of challenges that I simply cannot imagine going back to working without them.

This fellow should keep on working without AIs. All the more power to him. And he can ride that horse all the way into retirement, most likely. But it's like ignoring the rise of IDEs, or Google search, or AWS.

ken47 11 hours ago [-]
> rise of IDEs, or Google search, or AWS.

None of these things introduced the risk of directly breaking your codebase without very close oversight. If LLMs can surpass that hurdle, then we’ll all be having a different conversation.

stray 10 hours ago [-]
A human deftly wielding an LLM can surpass that hurdle. I laugh at the idea of telling Claude Code to do the needful and then blindly pushing to prod.
bdamm 11 hours ago [-]
This is not the right way to look at it. You don't have to have the LLMs directly coding your work unsupervised to see the enormous power that is there.

And besides, not all LLMs are the same when it comes to breaking existing functions. I've noticed that Claude 3.7 is far better at not breaking things that already work than whatever it is that comes with Cursor by default, for example.

wiseowise 7 hours ago [-]
Literally everything in this list, except AWS, introduces the risk of breaking your code base without close oversight. The same people who copy-paste LLM code into IDEs are yesterday’s copy-pasters from SO and random Google searches.
satisfice 11 hours ago [-]
You think he's not using the tools correctly. I think you aren't doing your job responsibly. You must think he isn't trying very hard. I think you are not trying very hard...

That is the two sides of the argument. It could only be settled, in principle, if both sides were directly observing each other's work in real-time.

But, I've tried that, too. 20 years ago in a debate between dedicated testers and a group of Agilists who believed all testing should be automated. We worked together for a week on a project, and the last day broke down in chaos. Each side interpreted the events and evidence differently. To this day the same debate continues.

worik 8 hours ago [-]
There are tasks I find AI (I use DeepSeek) useful for

I have not found it useful for large programming tasks. But for small tasks, a sort of personalised boilerplate, I find it useful.

satisfice 11 hours ago [-]
Thank you for writing what I feel and experience, so that I don't have to.

Which is kind of like if AI wrote it: except someone is standing behind those words.

globnomulous 7 hours ago [-]
Decided to post my comment here rather than on the author's blog. Dang and tonhow, if the tone is too personal or polemical, I apologize. I don't think I'm breaking any HN rules.

Commenter Doug asks:

> > what AI coding tools have you utilized

Miguel replies:

> I don't use any AI coding tools. Isn't that pretty clear after reading this blog post?

Doug didn't ask what tools you use, Miguel. He asked which tools you have used. And the answer to that question isn't clear. Your post doesn't name the ones you've tried, despite using language that makes clear that you have in fact used them (e.g. "my personal experience with these tools"). Doug's question isn't just reasonable. It's exactly the question an interested, engaged reader will ask, because it's the question your entire post begs.

I can't help but point out the irony here: you write a great deal on the meticulousness and care with which you review other people's code, and criticize users of AI tools for relaxing standards, but the AI-tool user in your comments section has clearly read your lengthy post more carefully and thoughtfully than you read his generous, friendly question.

And I think it's worth pointing out that this isn't the blog post's only head scratcher. Take the opening:

> People keep asking me If I use Generative AI tools for coding and what I think of them, so this is my effort to put my thoughts in writing, so that I can send people here instead of having to repeat myself every time I get the question.

Your post never directly answers either question. Can I infer that you don't use the tools? Sure. But how hard would it be to add a "no?" And as your next paragraph makes clear, your post isn't "anti" or "pro." It's personal -- which means it also doesn't say much of anything about what you actually think of the tools themselves. This post won't help the people who are asking you whether you use the tools or what you think of them, so I don't see why you'd send them here.

> my personal experience with these tools, from a strictly technical point of view

> I hope with this article I've made the technical issues with applying GenAI coding tools to my work clear.

Again, that word: "clear." No, the post not only doesn't make clear the technical issues; it doesn't raise a single concern that I think can properly be described as technical. You even say in your reply to Doug, in essence, that your resistance isn't technical, because for you the quality of an AI assistant's output doesn't matter. Your concerns, rather, are practical, methodological, and to some extent social. These are all perfectly valid reasons for eschewing AI coding assistants. They just aren't technical -- let alone strictly technical.

I write all of this as a programmer who would rather blow his own brains out, or retire, than cede intellectual labor, the thing I love most, to a robot -- let alone line the pockets of some charlatan 'thought leader' who's promising to make a reality of upper management's dirtiest wet dream: in essence, to proletarianize skilled work and finally liberate the owners of capital from the tyranny of labor costs.

I also write all of this, I guess, as someone who thinks commenter Doug seems like a way cool guy, a decent chap who asked a reasonable question in a gracious, open way and got a weirdly dismissive, obtuse reply that belies the smug, sanctimonious hypocrisy of the blog post itself.

Oh, and one more thing: AI tools are poison. I see them as incompatible with love of programming, engineering quality, and the creation of safe, maintainable systems, and I think they should be regarded as a threat to the health and safety of everybody whose lives depend on software (all of us), not because of the dangers of machine super intelligence but because of the dangers of the complete absence of machine intelligence paired with the seductive illusion of understanding.

sneak 9 hours ago [-]
It’s harder to read code than it is to write it, that’s true.

But it’s also faster to read code than to write it. And it’s faster to loop a prompt back to fixed code to re-review than to write it.

AlotOfReading 9 hours ago [-]
I've written plenty of code that's much faster to write than to read. Most dense, concise code will require a lot more time building a mental model to read than it took to turn that mental model into code in the first place.
andrewstuart 8 hours ago [-]
He’s saying it’s not faster because he needs to impose his human analysis on it which is slow.

That’s fine, but it’s an arbitrary constraint he chooses, and it’s wrong to say AI is not faster. It is. He just won’t let it be faster.

Some won’t like to hear this, but no-one reviews the machine code that a compiler outputs. That’s the future, like it or not.

You can’t say compilers are slow because you add on the time you take to analyse the machine code. That’s you being slow.

bluefirebrand 7 hours ago [-]
> no-one reviews the machine code that a compiler outputs

That's because compilers are generally pretty trustworthy. They aren't necessarily bug free, and when you do encounter compiler bugs it can be extremely nasty, but mostly they just work

If compilers were wrong as often as LLMs are, we would be reviewing machine code constantly

purerandomness 4 hours ago [-]
A compiler produces the same deterministic output every single time.

A stochastic parrot can never be trusted, let alone one that tweaks its model every other night.

I totally get that not all code ever written needs to be correct.

Some throw-away experiments can totally be one-shot by AI, nothing wrong with that. Depending on the industry one works in, people might be on different points of the expectation spectrum for correctness, and so their experience with LLMs vary.

It's the RAD tool discussion of the 2000s, or the "No-Code" tools debate of the last decade, all over again.

blueboo 5 hours ago [-]
Skeptics find that talking themselves out of trying them is marvellously effective for convincing themselves they’re right.
strangescript 11 hours ago [-]
Everyone is still thinking about this problem the wrong way. If you are still running one agent, on one project at a time, yes, it's not going to be all that helpful if you are already a fast, solid coder.

Run three, run five. Prompt with voice annotation. Run them when you'd normally need a cognitive break. Run them while you watch Netflix on another screen. Have them do TDD. Use an orchestrator. So many more options.

I feel like another problem is that deep down most developers hate debugging other people's code, and that's effectively what this is at times. It doesn't matter if your associate ran off and saved you 50k lines of typing, you would still rather do it yourself than debug the code.

I would give you grave warnings, telling you the time is nigh, adapt or die, etc, but it doesn't matter. Eventually these agents will be good enough that the results will surpass you even in simple one task at a time mode.

kibibu 11 hours ago [-]
I have never seen people work harder to dismantle their own industry than software engineers are right now.
marssaxman 9 hours ago [-]
We've been automating ourselves out of our jobs as long as we've had them; somehow, despite it all, we never run out of work to do.
kibibu 3 hours ago [-]
We've automated bullshit tedium work, like building and deploying, but this is the first time in my memory that people are actively trying to automate all the fun parts away.

Closest parallel I can think of is the code-generation-from-UML era, but that explicitly kept the design decisions on the human side, and never really took over the world.

strangescript 10 hours ago [-]
What exactly is the alternative? Wish it away? Developers have been automating away jobs for decades; it seems hypocritical to complain about it now.
hooverd 10 hours ago [-]
who gets the spoils?
sponnath 10 hours ago [-]
Can you actually demonstrate this workflow producing good software?
hooverd 11 hours ago [-]
Sounds like a way to blast your focus into a thousand pieces