The fix for AI hallucination isn't a smarter model. It's three dumber agents who can't talk to each other.
My best friend Todd has been rebuilding his entire AI workflow from scratch this month. It was a thing of beauty: He and Claude would go deep on a product discussion, Claude.ai wrote the tasks and then Claude Code wrote the code, wrote the tests and then picked up the next task.
The whole thing was automated and he would wake up the next morning with shiny new code that the agentic Keebler elves would leave on his doorstep.
A lot of people are building similar workflows (me included), but what made Todd's so special was that IT WORKED. Every time. Fifty tasks in a row banged out with zero human oversight.
Until Opus 4.7 came out and, overnight, it started reporting that tasks were done when no code had been written. Or tests were passing even though the code didn't work.
AI is looking to tell you that it did the thing you requested as efficiently as possible. And the more steps you put into a prompt, the more opportunities you create for AI to skip those steps in the service of claiming victory.
If you let a single agent define the work, do the work and review the work, it will ABSOLUTELY bullshit you somewhere between occasionally and frequently. It's not malicious. It's just lazy. Building something is more work than just saying it's built. So it sometimes just says it's built.
His rewrite is built on a simple idea: no multiple roles.
One agent defines the work, one does the work, one reviews the work. Deterministic tests at every handoff. The reviewer literally cannot see who did the work.
The framing he used stuck with me — one is the king, one is the jester, one is the mediator. What did I do well? What did I do poorly? What's actually true and actionable from those two answers?
AI does its job better when it doesn't have to defend its job.
You don't fix that with a better prompt. You fix it by making sure the thing grading the homework didn't write the homework.
Three dumber agents. Walls between them. Tests they can't route around.
