Why Your AI Agent Keeps Making the Same Mistakes

Tuesday: I spent twenty minutes explaining to Claude Code that our API routes use kebab-case, not camelCase. It understood. Fixed every endpoint. Clean diff, correct result.

Wednesday: Same agent. Same codebase. Brand new session. First task: add a new API route. In camelCase.

I didn’t even react. I just fixed it. Again. Because this wasn’t the first time. Or the fifth. It was the hundredth.

“I saw agents make the same mistakes again and again, in Golang, assembly, JavaScript and Python.” That’s not me — that’s a Hacker News commenter who clearly had the same dead-eyed stare I was developing. The model doesn’t remember. It can’t. Every session starts from zero.

The Amnesia Problem

Here’s the thing nobody tells you when you start using AI coding agents: there is no learning curve. Not for the agent, anyway.

You learn. You learn the agent’s quirks, its failure modes, which prompts work and which don’t. You accumulate knowledge session after session. The agent doesn’t. Every single session, it starts fresh. No memory of the bugs it introduced yesterday. No memory of the patterns you corrected. No memory of the architectural decisions you explained in painstaking detail.

“It got a lot dumber over time,” wrote one HN commenter. But it didn’t get dumber. It was always the same. What changed was your expectation — you assumed it was learning because you were investing in teaching it. It wasn’t. You were pouring water into a bucket with no bottom.

“Four back-and-forth messages just to get code that follows my project’s patterns.” That’s from a Dev.to article. Four messages. Every. Single. Session. Multiply that by every developer on your team, every day, and you start to see the real cost of amnesia.

Why This Is Different From the Regression Problem

In my last piece on the regression loop, I talked about agents breaking things they shouldn’t touch. The amnesia problem is the inverse — agents not knowing things they should know. The regression loop is about missing traceability. Amnesia is about missing memory.

They compound each other. An agent with no memory of past corrections will re-introduce the same regression patterns. An agent with no traceability will break things for the same structural reasons, session after session. Together, they create a frustration spiral that I’ve watched drain the enthusiasm out of developers who were genuinely excited about AI-assisted coding.

“We’re back to square one.” Five words from a Dev.to article that capture the entire experience.

CLAUDE.md Isn’t the Answer (But It’s Not Nothing)

Before you say it — yes, I know about CLAUDE.md. And .cursorrules. And system prompts. And every other mechanism for injecting persistent instructions into an agent session.

They help. I use them. But let’s be honest about what they are: a text file that the agent might follow. We covered this in the last piece — prompting works against the gradient. You’re fighting the model’s training objective with text instructions. The agent reads your CLAUDE.md, says “got it,” and then proceeds to use camelCase anyway because the training data is full of camelCase examples.

More fundamentally, CLAUDE.md scales poorly. You can’t encode every correction, every edge case, every “we tried that and it didn’t work” in a static instructions file. The file grows until the agent ignores half of it due to context window pressure. And it’s manual — you have to notice the mistake, you have to write the rule, you have to maintain the file. That’s not the agent learning. That’s you doing the learning and transcribing it for an audience that forgets between readings.

What If Corrections Compounded?

Here’s the question that changed how I think about this: what if every mistake taught the system something permanent?

Not the model — I’m not talking about fine-tuning. I mean the system around the model. What if, when you correct the agent (“use kebab-case, not camelCase”), that correction is captured as a structured artefact? Not a line in a text file. A first-class object that gets fed back into guidance the next time a relevant task comes up.

The difference is subtle but crucial. A CLAUDE.md rule says “use kebab-case.” A captured correction says “on March 4, the agent used camelCase for the /user-settings route, was corrected, and the fix involved renaming to kebab-case.” The correction carries context — when it happened, what the mistake was, what the fix looked like.
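To make the difference concrete, here is a minimal sketch of what such a first-class correction object might look like. This is an illustration, not Ceetrix's actual schema; the field names and the `Correction` class are hypothetical, populated with the kebab-case incident described above.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Correction:
    """A captured correction: not just the rule, but the context it came from."""
    occurred_on: date   # when the mistake happened
    task: str           # what the agent was doing
    mistake: str        # what it did wrong
    fix: str            # what the fix looked like
    rule: str           # the generalised lesson
    tags: list[str] = field(default_factory=list)  # used later for relevance matching

# The kebab-case incident, captured as a structured object rather than a text rule
kebab = Correction(
    occurred_on=date(2025, 3, 4),
    task="add API route /user-settings",
    mistake="named the route in camelCase",
    fix="renamed the route to kebab-case (/user-settings)",
    rule="API routes use kebab-case, not camelCase",
    tags=["api-routes", "naming"],
)
```

A CLAUDE.md line can only hold the `rule` field. The structured version keeps the date, the task, and the fix, which is what lets the system later judge when the lesson is relevant again.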

Over time, these corrections compound. Session 1: agent makes the kebab-case mistake. Session 2: the correction fires, agent gets it right. Session 50: the system has accumulated hundreds of corrections covering naming conventions, error handling patterns, test structure, deployment procedures — all learned from actual mistakes, not hypothetical rules.

Quality goes up over time instead of staying flat. That’s the curve you want.

How We Built This

Ceetrix captures corrections at the protocol level. When you give the agent corrective feedback — “that’s wrong,” “use X instead,” “you broke Y” — the system detects the correction pattern and stores it as a structured object with the original context, the mistake, and the fix.
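The detection side can be sketched as simple pattern matching over user messages. Ceetrix's actual detector isn't public, so treat this as a stand-in that shows the idea: the phrasings listed below are assumptions drawn from the examples in the paragraph above.

```python
import re

# Hypothetical phrasings that signal corrective feedback,
# taken from the examples above ("that's wrong", "use X instead", "you broke Y")
CORRECTION_PATTERNS = [
    re.compile(r"\bthat'?s wrong\b", re.IGNORECASE),
    re.compile(r"\buse .+? instead\b", re.IGNORECASE),
    re.compile(r"\byou broke\b", re.IGNORECASE),
]

def looks_like_correction(message: str) -> bool:
    """Return True if a user message matches a known corrective pattern."""
    return any(p.search(message) for p in CORRECTION_PATTERNS)

print(looks_like_correction("use kebab-case instead of camelCase"))  # True
print(looks_like_correction("looks good, ship it"))                  # False
```

A real detector would do better than regexes (classifying the message, then extracting the mistake and fix from the surrounding diff), but the trigger point is the same: corrective feedback, caught at the moment it happens.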

These corrections aren’t dumped into a text file. They’re indexed by relevance and surfaced when the agent is working on related tasks. The agent adding a new API route? The kebab-case correction surfaces automatically. The agent writing tests? Past corrections about test structure and coverage patterns appear in the guidance.
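The surfacing step is a retrieval problem. As a toy illustration (assuming the tag-based records from earlier, not Ceetrix's real index, which would more plausibly use embeddings or full-text search), relevance can be as simple as tag overlap with the current task:

```python
def relevant_corrections(task_tags: list[str], corrections: list[dict]) -> list[dict]:
    """Surface stored corrections whose tags overlap the current task's tags,
    most-overlapping first. A toy relevance index for illustration only."""
    scored = [(len(set(c["tags"]) & set(task_tags)), c) for c in corrections]
    return [c for score, c in sorted(scored, key=lambda s: -s[0]) if score > 0]

store = [
    {"rule": "API routes use kebab-case", "tags": ["api-routes", "naming"]},
    {"rule": "tests must assert on status codes", "tags": ["tests"]},
]

# Agent is about to add a new API route: only the naming correction surfaces.
for c in relevant_corrections(["api-routes"], store):
    print(c["rule"])  # API routes use kebab-case
```

The point of retrieval over a raw dump is exactly the CLAUDE.md failure mode above: instead of pushing every rule into the context window every session, only the handful of corrections relevant to the task at hand reach the agent.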

This isn’t magic. It’s the same principle behind every bug tracker and post-mortem process in software engineering. You record what went wrong and why, so it doesn’t happen again. We’ve just applied it to AI agent workflows where, frankly, it’s needed more than anywhere else — because the agent literally cannot remember on its own.

The result, after running this for months: the same categories of mistakes that used to repeat every session now get caught before they happen. The agent still doesn’t have memory. But the system around it does. And that turns out to be enough.

The Bigger Picture

Here’s what I think the industry is slowly realising: the models are powerful enough. They’ve been powerful enough for a while. What’s missing isn’t intelligence — it’s infrastructure.

Context persistence. Correction capture. Traceability. Verification. These are boring, unsexy, engineering-discipline problems. Nobody’s writing breathless blog posts about correction capture systems. But they’re the difference between an AI coding experience that degrades over time and one that improves.

The amnesia problem isn’t a temporary limitation waiting for the next model release to fix. It’s a design characteristic of session-based AI. You can wait for models that somehow maintain perfect memory across infinite sessions. Or you can build systems that make memory the responsibility of the infrastructure, not the model.

I know which bet I’m making.


Have your say: How much time do you spend re-teaching your AI agent things it should already know? I’m genuinely curious — reply with your estimate. And if you want to see correction capture in action, try Ceetrix.