From Blind Generation to an AI Team: How to Take Back Control with Agents

Most developers use AI the same way: type a prompt, skim the output, decide it looks right, commit.

That third step — "decide it looks right" — is where things quietly fall apart. You're evaluating AI output against a vague mental model of what you asked for. The model is equally confident whether it got it right or not. And you're the only reviewer of something you didn't write.

The fix isn't more careful prompting. It's a different process entirely.

Roles, not prompts

The idea is simple. Instead of asking AI what to build, you ask it to think about what to build — from multiple conflicting angles before any code exists.

You define three roles. Each gets a distinct mandate. Each has a different reason to push back.

The Architect designs the solution before a single line is written. Its output is not code — it's a map. Components, interfaces, invariants, trade-offs, risks. A structure to argue about.

The Fact Checker interrogates that map. Does the library referenced actually exist? Is the performance claim realistic at scale? Are there known vulnerabilities in this approach? It doesn't fix anything — it reports what it found.

The Devil's Advocate reads the design and the Fact Checker's notes, then tries to break it. Failure modes, edge cases, adversarial inputs, maintenance nightmares eighteen months from now. Its job is to make the strongest possible case that the design will fail.

Three perspectives. One requirement. Zero lines of production code written yet.

One prompt to run them all

You don't need three separate sessions or an orchestration layer. A single prompt handles this. Instruct the model to play all three roles in sequence — it will, producing each section in turn.

You will analyse the following requirements by playing three distinct roles in sequence.
Complete each role fully before moving to the next. Do not write implementation code at any point.

---

ROLE 1 — THE ARCHITECT
Produce a design document covering:
1. Components and their responsibilities
2. Interfaces between them (inputs, outputs, contracts)
3. Invariants that must hold under all conditions
4. Trade-offs and what is explicitly being accepted
5. The top three risks and how you would mitigate them

---

ROLE 2 — THE FACT CHECKER
Review the design above and verify every factual claim:
- Do referenced libraries and APIs actually exist and behave as described?
- Are performance characteristics realistic?
- Are there known vulnerabilities in any approach described?
- Are there implicit assumptions stated as facts?
Report each issue with: the claim, why it's suspect, what would confirm or refute it.

---

ROLE 3 — THE DEVIL'S ADVOCATE
Now argue against the design as forcefully as possible:
- Every failure mode you can imagine, including unlikely ones
- What happens when assumptions are violated (load spikes, malformed inputs, third-party outages)
- How an attacker would approach this design
- What this looks like to a new engineer 18 months from now
- Conditions under which this design would need to be completely replaced

---

Requirements:
[YOUR REQUIREMENTS HERE]

One response. Three structured perspectives. For the vast majority of features, this is all you need.

Going parallel with Claude

When stakes are higher — a new auth flow, a payment integration, anything security-critical — it's worth splitting the roles into separate agent calls.

The key insight: the Fact Checker and the Devil's Advocate don't depend on each other. Both need the Architect's output, but neither needs to wait on the other. With Claude's API, you fire them simultaneously the moment the Architect is done. Two parallel requests, two reports back at the same time.

          Architect
         /         \
        ▼           ▼
  Fact Checker  Devil's Advocate
  (parallel)    (parallel)
        \         /
         ▼       ▼
        You review all three

This is also where different models per role starts to matter. If both agents share the same context window, they carry the same biases from role to role. The Fact Checker implicitly knows what the Architect was thinking. The Devil's Advocate might pull its punches on a design it just produced. Genuine independence requires genuine separation.

What you do with the output

All three documents land in front of you. You read them. You make decisions.

That's it. That's the human's role — and it's non-negotiable.

What changes is the quality of information you're deciding on. Instead of reviewing generated code against a vague memory of what you asked for, you're looking at a structured argument: a design, its verification, and a rigorous attack on it. The decision is still yours. But you're making it with better inputs than you've ever had before.

The agents don't answer your question. They make sure you're asking the right one.

When to use this

Not for everything. A utility function, a standard pattern you've implemented twenty times, a well-understood CRUD endpoint — just prompt and review it yourself. The overhead isn't worth it.

But for anything where being wrong has real consequences: use the roles. A few minutes of structured analysis before you write a line of code will save you hours of untangling the wrong implementation later.

The single-prompt version costs almost nothing. The parallel Claude setup costs a few extra seconds of latency. Neither of those is a reason not to use it when the stakes justify it.

Match the process to the risk. That's the only rule.

Roles, not prompts ​

One prompt to run them all ​

Going parallel with Claude ​

What you do with the output ​

When to use this ​

The bureaucracy of bots: Why we are checking the checker

The Lava Layer: Why AI Code is Slowly Petrifying Your Codebase

The Brilliant Parrot Problem: What AI Actually Does When It 'Thinks'

Roles, not prompts

One prompt to run them all

Going parallel with Claude

What you do with the output

When to use this