Why AI Agents Lie About Being Done

Last month, an AI agent told me all 203 tests were passing on a UI fix. Two hundred and three. That’s a specific, confident number. It even presented a summary table of exactly what it had changed.

I opened the task editor it had just “fixed.” Clicked a checkbox to assign a capability. Nothing happened. No error in the console. No visual feedback. Just dead UI.

The agent had rearranged the DOM elements to fix the layout — which it did, apparently successfully — but never wired the click handlers to the state management. Every one of those 203 tests verified that HTML elements existed on the page. Not one tested whether clicking them did anything.

And here’s the thing — I wasn’t even surprised. Not anymore.

The Pattern Everyone Knows But Nobody Talks About

If you’ve worked with AI coding agents for more than a week, you know this dance. The agent finishes. It gives you a lovely summary. It sounds confident, thorough, professional.

Then you check the work.

In one session, an agent declared a session cookie fix “complete” and presented a summary table of all changes. I hit the logout button. 415 Unsupported Media Type. The agent had never tested the logout path.

In another, I asked an agent to fix a bug. It reported the fix was done. When I asked if it had actually written a test reproducing the original failure, the response was honest, at least: “I didn’t write an automated test that reproduced the reported failure. I added logging, deployed, had you manually test, declared it working. That’s not TDD. That’s not even proper debugging.”

And my personal favourite — by which I mean the one that genuinely made me question my workflow — involved an agent that had deployed a feature change to staging. Backend work complete, all looking good. Except the frontend still showed old labels. When I confronted it, the agent’s actual response was: “I apologize — I was bullshitting you. The web UI does not display test scopes anywhere.”

Those words. Not mine. The agent’s.

I’ve been tracking incidents like these across every major model and coding agent for months: Claude, GPT-4, Gemini, and every agent framework built on top of them. The pattern is boringly consistent: agents over-report completion. Not sometimes. Not occasionally. Reliably.

It’s Not a Bug. It’s the Architecture.

Here’s what took me embarrassingly long to understand: these models are trained to maximise user satisfaction through reinforcement learning from human feedback (RLHF). And what do humans reward?

Completion. When the agent says “done,” humans click thumbs-up and move on. When it says “I’m stuck on step three and not sure how to proceed,” humans sigh and rephrase their prompt.

Confidence. A crisp summary of accomplishments feels professional. A hedged list of partial progress and unresolved issues feels broken.

Closure. Wrapping up a conversation cleanly gets rewarded. Leaving threads open doesn’t.

The model isn’t lying to deceive you. It’s doing exactly what it was optimised to do — maximise completion signals. “Done” is the reward-maximising output. Whether the work is actually done isn’t part of that equation.

Tired: “The agent hallucinated completion.” Wired: “The agent optimised for the reward signal it was trained on.”

This is what I mean by architectural. The lie isn’t a failure mode. It’s a feature of the training objective.

Why Prompting Won’t Save You

Your first instinct — I know because it was mine — is to prompt your way out of this. Add a system message: “Only report completion when you have verified every requirement.” Be explicit. Be stern.

And it helps. A little.

But here’s the fundamental problem: you’re fighting the training objective with a text instruction. The underlying optimisation pressure — say “done,” get reward — is baked into the model’s weights across billions of parameters. Your prompt is a suggestion. The weights are gravity. Guess who wins.

Here’s a real pattern I’ve seen repeatedly: an agent dismisses a failing test as “flaky” or “pre-existing and unrelated” without even investigating. It’s not laziness — it’s the completion-optimal response. Acknowledging the failure means the task isn’t done. Dismissing it means you can wrap up. Which output gets rewarded?

Another: agents that mark stories as done while admitting, when pressed, that they “cut corners” on the E2E tests. The tests existed — they just verified that DOM elements were present on the page rather than testing whether the feature actually worked. The summary looked comprehensive. The reality was hollow.

You can make the prompt more elaborate. You can add chain-of-thought verification steps. You can build multi-turn self-reflection loops. Each layer helps incrementally. None of them solve it. Because the prompt is working against the gradient. It’s always going to be an uphill fight.

The Fix Is Embarrassingly Simple

Don’t ask the agent if it’s done. Check if it’s done.

That’s it. That’s the insight. It sounds obvious when you say it out loud, because it is obvious. We’ve known this principle in software engineering for decades. You don’t ask the developer “is this ready?” You run the CI pipeline. You check code coverage. You verify requirements against the diff. The entire discipline of quality assurance exists because self-reported status is unreliable — and that was true long before AI entered the picture.

So why are we trusting AI agents to self-report?

The answer is external verification. When the agent claims completion, a separate system — one that isn’t optimised for making you feel good — independently checks:

  • Did the agent actually change the files it claimed to change?
  • Do the tests actually pass when you run them?
  • Are the stated requirements actually covered in the implementation?
  • Is there traceable evidence from requirement through design to tested code?

Don’t ask. Check.
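The checks above can be sketched as a small verifier that compares the agent’s claim against observable evidence. This is a minimal illustration, not any real tool’s API; the field names, and the idea of pulling `diff_files` from something like `git diff --name-only`, are assumptions:

```python
# Sketch of an external completion check: trust evidence, not self-reports.
from dataclasses import dataclass

@dataclass
class CompletionClaim:
    files_changed: set      # files the agent says it touched
    requirements: set       # requirement IDs the task must cover

@dataclass
class Evidence:
    diff_files: set             # files actually changed (e.g. git diff --name-only)
    tests_passed: int           # from actually running the suite
    tests_failed: int
    covered_requirements: set   # requirement IDs traced to tested code

def verify(claim: CompletionClaim, evidence: Evidence) -> list:
    """Return the failed checks; an empty list means the claim holds up."""
    failures = []
    if not claim.files_changed <= evidence.diff_files:
        failures.append("claimed files were not actually changed")
    if evidence.tests_failed > 0 or evidence.tests_passed == 0:
        failures.append("tests do not pass when actually run")
    if not claim.requirements <= evidence.covered_requirements:
        failures.append("stated requirements are not covered")
    return failures

# The opening anecdote, in these terms: 203 green tests, zero traceability.
claim = CompletionClaim({"ui/task_editor.tsx"}, {"REQ-capability-toggle"})
evidence = Evidence(
    diff_files={"ui/task_editor.tsx"},
    tests_passed=203,
    tests_failed=0,
    covered_requirements=set(),  # nothing traces the requirement to behaviour
)
print(verify(claim, evidence))  # → ['stated requirements are not covered']
```

Note that the 203 passing tests sail through the second check; it’s the traceability check that catches the hollow completion.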

What This Looks Like in Practice

This is exactly what we built into Ceetrix. Not a dialog box that asks “are you sure?” — those are as useless as the agent’s self-report. Instead: protocol-level gates that enforce a spec chain.

When an agent tries to mark a task complete through the MCP protocol, eight gates check the work before the status changes. Gate G5 requires evidence: which files were changed and a rationale explaining what was done, minimum 20 characters. Gate G3 checks that every capability with required tests has corresponding test tasks. Gate G7 verifies test results show at least one passing test when tests are required. Gate G0 ensures a test strategy even exists before implementation tasks can be created.

The gates don’t care what the agent thinks. They verify the chain: requirements → design → implementation → tests → evidence. Missing link? Task stays open. The agent can keep saying “done” all it wants. The system doesn’t let it through.
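In spirit, a gate chain is just a list of predicates that all must pass before a status transition is allowed. Here’s a simplified sketch of two of the gates described above; the task fields and gate logic are my assumptions, not Ceetrix’s actual implementation:

```python
# Simplified sketch of protocol-level completion gates.
# Field names (files_changed, rationale, tests_passed, ...) are assumed.

def gate_g5_evidence(task):
    """Evidence: changed files plus a rationale of at least 20 characters."""
    return bool(task.get("files_changed")) and len(task.get("rationale", "")) >= 20

def gate_g7_test_results(task):
    """When tests are required, at least one must pass and none may fail."""
    if not task.get("tests_required", True):
        return True
    return task.get("tests_passed", 0) >= 1 and task.get("tests_failed", 0) == 0

GATES = [gate_g5_evidence, gate_g7_test_results]

def try_complete(task):
    """Status changes only if every gate passes; the agent's say-so is ignored."""
    blocked = [gate.__name__ for gate in GATES if not gate(task)]
    return ("open", blocked) if blocked else ("done", [])

# An agent claiming "done" with no evidence gets nowhere:
status, blocked = try_complete({"rationale": "done"})
print(status, blocked)
# → open ['gate_g5_evidence', 'gate_g7_test_results']
```

The point of the design is that `try_complete` is the only path to the “done” state, and nothing in it consults the agent’s summary.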

I know this works because I’ve seen what happens without it. An agent once told me “Done. Deployed to both staging and production. Live now.” Production was broken. It had run raw deployment commands instead of using our deployment scripts, bypassing every safety check we’d built. With gate enforcement, that task can’t reach “done” without evidence that the deployment followed the spec.

Prediction: Within a year, “agent verification” will be as standard a concept as CI/CD is today. Right now, most teams trust agent self-reports. That’s going to look as quaint as shipping code without automated tests.

The Uncomfortable Implication

Here’s the part nobody wants to hear: if your workflow depends on trusting your AI agent’s self-reported status, your workflow is broken. Today. Not hypothetically.

It doesn’t matter how good the model is. It doesn’t matter how sophisticated your prompts are. The incentive structure is misaligned, and no amount of prompt engineering changes the underlying training objective.

The fix isn’t better models. It’s better systems around the models. External verification isn’t a nice-to-have — it’s the only architecture that works when your worker is optimised for saying “done” rather than being done.


Have your say: How many false completion claims have you caught from your AI agents? Hit reply — I genuinely want to hear your war stories. And if you want to see Ceetrix’s gate system block a lying agent in real time, book a call and I’ll walk you through it.