Anthropic shipped Dynamic Workflows for Claude Code. From a single natural-language request, Claude writes a JavaScript orchestration script on the fly, and a runtime executes it in the background, spinning up dozens to hundreds of subagents in parallel. The hard cap is a thousand.
The number everyone quotes is a thousand. The interesting number is how few you actually need.
The case for it is real
I want to be fair to the tool before I take it apart, because the engineering underneath is good.
A single agent handed a big, vague task flails. It re-reads files it already read, re-plans work it already planned, and runs out of context halfway through. Dynamic Workflows fix this by making the orchestration deterministic. Claude writes the control flow once, in plain JavaScript, and the runtime runs it: loops, fan-out, verification, synthesis. The model decides the strategy. The script decides the execution.
The patterns it ships with are the good ones. pipeline() runs each item through every stage with no barrier between stages, so item A can be in verification while item B is still being found. Adversarial verify spawns independent skeptics whose job is to refute a finding, and kills it if the majority do. Judge panels generate competing attempts and score them.
If that sounds familiar, it should. It is the Architect, Fact-Checker, Devil's-Advocate trio made executable. The thing you used to do by hand, role-playing three perspectives in one chat, is now a script that runs them as separate agents that genuinely cannot see each other's reasoning. That independence is worth something. Anchoring is real, and a verifier that shares context with the thing it's verifying is not much of a verifier.
This is not a parrot trick. It is structure. Credit where it's due.
A minimal workflow, so you know what we're talking about
Here is the shape of one. Review a diff across two dimensions, then adversarially verify every finding before it survives.
export const meta = {
name: 'review-changes',
description: 'Review the diff across dimensions, verify each finding',
phases: [{ title: 'Review' }, { title: 'Verify' }],
}
const DIMENSIONS = [
{ key: 'bugs', prompt: 'Find correctness bugs in the current diff.' },
{ key: 'perf', prompt: 'Find performance regressions in the current diff.' },
]
const results = await pipeline(
DIMENSIONS,
d => agent(d.prompt, { phase: 'Review', schema: FINDINGS }),
review => parallel(review.findings.map(f => () =>
agent(`Try to refute this finding. Default to refuted if unsure: ${f.title}`,
{ phase: 'Verify', schema: VERDICT })
.then(v => ({ ...f, verdict: v })))),
)
const confirmed = results.flat().filter(Boolean).filter(f => f.verdict?.real)
return { confirmed }That's a real workflow. pipeline() is the default because it has no barrier. parallel() is a barrier: it waits for everything before returning, and you only want it when a later stage genuinely needs all of the earlier results at once, like deduplicating across the full set. The schema option forces each agent to return validated structured data instead of prose you have to parse. The whole thing runs in the background and hands you back confirmed.
Two dimensions and a verifier each. That's maybe six agents. It's useful, it's bounded, and it would cost you a few cents.
Now watch what happens when the bound comes off.
The part nobody adds up
Here is Anthropic's own warning, in the docs, in plain text: Dynamic Workflows "can consume substantially more tokens than a typical Claude Code session." Start small, they say. Well-scoped tasks first.
They are not being modest. The mechanism is brutal and it is not a bug. Every subagent runs in its own fresh context window. It does not inherit the parent's tokens. So the context gets resubmitted, per agent, every time. Three parallel workers roughly quadruple your spend versus doing the work serially. Now scale that to a hundred.
The horror stories are already in the wild. One developer ran a slash command that orchestrated 49 subagents in parallel for two and a half hours: estimated at eight to fifteen thousand dollars for a single session. A financial-services team left 23 subagents analysing code unattended and reported forty-seven thousand dollars over three days.
And here's the quiet default that turns a fan-out into a fire: every worker inherits the main session's model, which for most people is Opus. Opus is five dollars per million input tokens and twenty-five per million output. Haiku is a fifth of that. So you have a hundred workers paying Opus prices to do grep-shaped work that Haiku would finish for a rounding error. Nobody chose that. It's just what happens when you don't set the model per agent.
This is the token-saver tax inverted. There, the bill arrived at the door before any work happened. Here, the bill arrives during the work, scaling with how lazily you reached for the fan-out. Same lesson, opposite end: the cost you can see is rarely where the cost actually is.
Speed, efficiency, quality: pick the trade on purpose
The fan-out buys you wall-clock time. That is the whole pitch, and it is a fair one. The slowest single chain finishes and you're done, instead of waiting for a hundred tasks in series.
But wall-clock is not the only axis, and treating it as if it were is the new cargo cult.
| You optimise for | You get | You pay |
|---|---|---|
| Speed | Answer in minutes, not hours | Tokens scale with concurrency, not with work |
| Efficiency | Lowest bill for the result | One agent, slower, model matched to the task |
| Quality | Independent verification, less anchoring | More plausible output landing on your desk to review |
Notice the quality row, because it is the one that gets oversold. More agents do not give you more correctness. They give you more plausible output, faster, with a verification layer that catches some of the obvious wrongness. The independent skeptics are genuinely useful. But the survivors still land on your desk, and you still have to read every diff like you wrote it. A thousand agents do not change that. They just generate the reviewing backlog faster than you can clear it.
The bottleneck never left. It moved. It used to be "can the tool do this." Now the tool can do almost anything you can describe, and the binding constraint is whether you can still understand and review what it handed back. Fan-out is a throughput multiplier on a step that was never your throughput problem.
What to actually do
- Scope the fan-out before you reach for it. A two-dimension review needs six agents, not sixty. Anthropic's own agent-teams guidance says start with three to five and that "three focused teammates often outperform five scattered ones." Believe them. The diminishing returns are steep.
- Set the model per phase. Planner on Opus, workers on Haiku or Sonnet. This is one line per agent and it is the single biggest lever on your bill. Audit
CLAUDE_CODE_SUBAGENT_MODELwhile you're at it, because it silently overrides routing. - Pipeline over barrier. Use
parallel()only when a stage truly needs all prior results at once. Otherwise the barrier wastes the fast workers' time waiting on the slow one. - Verify adversarially, but proportionally. Three skeptics on a real architectural claim, fine. Five-vote consensus on a typo is theatre you're paying Opus for.
- Watch the meter, not the wall-clock. If a task is cheap to run serially and you don't need the answer in the next five minutes, run it serially. Tokenmaxxing is measuring the wrong thing when the thing you actually care about is the result.
Dynamic Workflows are a genuinely good tool, and I'll keep using them for the work they fit: wide research, broad audits, migrations too big to hold in one context. Those are real. The fan-out earns its bill there.
What changed is not the ceiling on what the tool can do. That ceiling was never the problem. What changed is that speed is now free and judgement still isn't. The tool will happily spend a thousand agents on a question that needed six. Deciding which one it was is the job. It always was.