Code Churn Is the Lava You Can Still Measure

A team I know just had their best month on the dashboard. Pull requests up 98%. Commits up by half. Burn-down chart looking like a ski slope.

Then someone ran git log on the files everyone was proudest of.

About forty percent of the lines shipped in the first week of the month had been deleted by the end of the second week. Same files. Same authors. Different code, sometimes solving the same problem twice. The team had not built faster. They had churned faster.

This is the part of the AI productivity story that nobody puts on the dashboard.

The lava layer's quieter twin

A while back I wrote about the lava layer: AI-generated code that nobody understands, cooling into rock that no one dares touch. That's the failure mode that survives.

Churn is the failure mode that doesn't.

Both come from the same author: an agent that writes confident code with no model of the system, only a model of what the prompt looked like.

The lava-layer code is the one that happened to compile, pass the test, and get merged before anyone asked the hard question. The churned code is everything else: the version that got rewritten on Tuesday, replaced on Thursday, and replaced again on the following Monday by someone who hadn't read either of the previous attempts.

Different fates. Same root cause. Nobody is home.

What the numbers actually say

GitClear's longitudinal analysis pegged the churn rate of AI-heavy repositories at roughly double the pre-AI baseline. The Pragmatic Engineer's 900-respondent survey puts a face to the same data: builders drowning in slop, review times up 91%, "shippers" generating tech debt faster than the team can metabolise it.

Cerbos' write-up of the productivity paradox lays out what happens in the security layer when that churn collides with real systems: 322% more privilege escalation paths, 153% more design flaws, AI-assisted commits merging four times faster than human ones because reviewers cannot keep up with the volume.

Note what those numbers describe. None of them are about code that's wrong. They are about code that's provisional. Code that ships, gets touched up, gets reverted, gets rewritten, and then ships again. The PR counter goes up every time. Nothing useful comes out the other side.

Why agents churn

A human writes a feature with a mental model of the system. The model is wrong in interesting ways, sure, but it's a model. When requirements shift, the model bends with them. Code written from a real understanding can usually be edited.

An agent writes a feature with a model of what the prompt looks like. There is no system in its head. No invariants. No "this is how we talk to billing." So the moment the prompt changes, even slightly, the only honest move is to throw the previous answer away and generate a new one.

The agent will do that without flinching. It has no skin in the version it wrote yesterday. It has no skin in anything.

This is why the 70% problem that Addy Osmani named shows up in the churn metric so cleanly. The agent gets you to a draft fast. The last 30%, the part that requires understanding what the code is for, never arrives. So the draft gets replaced. Then replaced again. The dashboard counts each replacement as work.

The dashboard is wrong.

The metric your team already has

You don't need a vendor for this. You don't need observability. You don't need to instrument anything. The data is in git log.

A working rule of thumb:

Pick a file changed in the last two weeks.
Count the lines added in the first week.
Count how many of those lines are still there at the end of the second week.

Divide the second count by the first. That's your retention rate. If it's under 60% for files that aren't being actively refactored on purpose, you're churning. Not iterating. Churning.

The difference matters. Iteration is when you ship version one, learn something, and replace it deliberately. Churn is when version one was already wrong on the day it shipped, and version two was wrong differently, and nobody can tell you what the difference was supposed to teach you.

Iteration has a story. Churn doesn't.
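
The rule of thumb above can be sketched in a few lines of Python. This is a rough sketch under assumptions, not a finished tool: the function names are made up for illustration, "lines added in week one" is taken from `git log --numstat`, and "still there" means lines that `git blame` at the end of week two attributes to a week-one commit. Both sides use commit dates, so rebases and squash merges will blur the picture, and the file has to exist at the end of week two.

```python
import subprocess
from datetime import datetime


def count_added(numstat: str) -> int:
    """Sum the 'added' column of `git log --numstat` output."""
    total = 0
    for line in numstat.split("\n"):
        parts = line.split("\t")
        # Binary files report "-" instead of a number; skip them.
        if len(parts) == 3 and parts[0].isdigit():
            total += int(parts[0])
    return total


def count_survivors(porcelain: str, start_ts: int, end_ts: int) -> int:
    """Count blamed lines whose commit time falls in [start_ts, end_ts)."""
    survivors = 0
    commit_ts = None
    for line in porcelain.split("\n"):
        if line.startswith("committer-time "):
            commit_ts = int(line.split()[1])
        elif line.startswith("\t") and commit_ts is not None:
            # In --line-porcelain output, a tab-prefixed line is file content.
            if start_ts <= commit_ts < end_ts:
                survivors += 1
    return survivors


def retention_rate(added: int, survivors: int):
    """Fraction of added lines still present; None if nothing was added."""
    return survivors / added if added else None


def git(repo: str, *args: str) -> str:
    """Run a git command in `repo` and return its stdout."""
    result = subprocess.run(["git", "-C", repo, *args], check=True,
                            capture_output=True, text=True, errors="replace")
    return result.stdout


def git_date(moment: datetime) -> str:
    """Format a timezone-aware datetime the way git date options accept."""
    return moment.strftime("%Y-%m-%d %H:%M:%S %z")


def file_retention(repo: str, path: str, week1_start: datetime,
                   week1_end: datetime, week2_end: datetime):
    """Retention for one file; all three datetimes must be timezone-aware."""
    log = git(repo, "log", f"--since={git_date(week1_start)}",
              f"--until={git_date(week1_end)}", "--numstat", "--format=",
              "--", path)
    # Last commit reachable from HEAD at or before the end of week two.
    rev = git(repo, "rev-list", "-1",
              f"--before={git_date(week2_end)}", "HEAD").strip()
    blame = git(repo, "blame", "--line-porcelain", rev, "--", path)
    survivors = count_survivors(blame, int(week1_start.timestamp()),
                                int(week1_end.timestamp()))
    return retention_rate(count_added(log), survivors)
```

Calling `file_retention(".", "src/billing.py", start, mid, end)` with three timezone-aware datetimes (the path here is a placeholder) returns a number between 0 and 1, or `None` if the file saw no additions in week one. Anything under 0.6 on a file nobody meant to refactor is worth a conversation.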

The cost stacks up three ways

Every churn cycle is paid for three times:

Generation. Tokens cost real money. The Pragmatic Engineer survey has companies spending $200 to $2,000 per engineer per month on Claude Code, Cursor, and Codex. Throwing away forty percent of the output means you're paying for the discard.
Review. Each generated version still goes through code review. Review time is up 91% on AI-heavy teams. Reviewers approve work that is going to be deleted next week, and they know it, and they approve it anyway because the queue won't drain otherwise.
Rewrite. When the rewrite finally happens, the original author often hasn't touched the file in days. The agent that wrote it has, of course, no memory of it. So the rewrite starts from scratch, and the cycle begins again.

Three payments. One outcome. Or no outcome.

This is the part the "AI made us 30% faster" surveys keep missing. The 30% is real, in the moment, at the keyboard. It's the dopamine that Cerbos called out. The cost shows up later in a different ledger.

What to actually do

Stop counting pull requests. They're not a unit of output anymore; they're a unit of activity, and activity is what your team has too much of.

Try these instead.

Net retained lines per week, per file. If a feature took 800 lines on Monday and was 320 lines by Friday, the team produced 320 lines of work that week, not 800. Bonus points if your dashboard can show this side-by-side with PR count. The gap will be unflattering.
The two-week churn budget. Pick a number. Five percent, ten, whatever. If a file blows past it without a recorded refactor, that's the signal to slow down and ask what changed in the team's understanding, not just in the code.
Treat the first draft as throwaway. Always. The agent's first pass is a conversation with the problem, not a solution to it. The actual value shows up in the second pass, after a human has read the first and learned what's wrong with the prompt. Save your reviews for that one.
Stop merging at 70%. This is the hard one. The agent gets you to 70% with a smile. The team will feel productive every single time you ship that 70%. They will feel less productive when you make them sit with the last 30%. Make them sit with it anyway. The last 30% is the part that determines whether the file gets rewritten next Thursday.
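
The first two ideas in that list are easy to prototype. What follows is a minimal sketch, assuming you already have per-file counts of lines added and lines surviving (from `git blame` or anything else). `FileWeek`, `net_retained`, and `over_budget` are hypothetical names, and the 10% default is just one of the numbers floated above.

```python
from dataclasses import dataclass


@dataclass
class FileWeek:
    """One file's numbers for one measurement window (hypothetical shape)."""
    path: str
    lines_added: int
    lines_surviving: int
    planned_refactor: bool = False  # True when the rewrite was recorded up front

    @property
    def churn(self) -> float:
        """Share of added lines that did not survive the window."""
        if self.lines_added == 0:
            return 0.0
        return 1 - self.lines_surviving / self.lines_added


def net_retained(files: list[FileWeek]) -> int:
    """Lines of work actually produced: what survived, not what was typed."""
    return sum(f.lines_surviving for f in files)


def over_budget(files: list[FileWeek], budget: float = 0.10) -> list[str]:
    """Paths that blew past the churn budget without a recorded refactor."""
    return [f.path for f in files
            if not f.planned_refactor and f.churn > budget]


week = [
    FileWeek("billing.py", lines_added=800, lines_surviving=320),
    FileWeek("auth.py", lines_added=100, lines_surviving=95),
    FileWeek("legacy.py", lines_added=200, lines_surviving=50,
             planned_refactor=True),
]
print(net_retained(week))  # lines that survived across all files
print(over_budget(week))   # files to ask questions about
```

The 800-to-320 feature from the example is the only file flagged here. The deliberate refactor is exempt even though it churned harder, which is the point of recording refactors: a rewrite with a story doesn't count against the budget.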

The two ways code goes nowhere

The lava layer is the code that hardens around your codebase until you can't refactor it. Churn is the code that never hardens at all. The first is a museum. The second is a treadmill.

Both look like productivity on the dashboard.

Neither one is.

The good news is that one of them shows up in git log before it has time to ossify. Read the log. Count the survivors. Pay attention to which files are quietly being rewritten by people who don't remember writing them in the first place.

Nobody's home. But at least the door's still open.
