Tokenmaxxing is what happens when you measure the wrong thing

Amazon built a leaderboard for who burns the most AI tokens. Employees gamed it. The bills exploded. Uber's CTO admitted there is no link yet between all that spending and actually shipping products. This was always going to happen.

Jensen Huang told Nvidia's engineers they should spend AI tokens worth at least half their annual salary every year. Not doing so, he said, means you are not being productive. He called managers who discouraged AI use "insane."

The predictable thing happened.

Employees started using AI for everything. Including the things they did not need AI for.

What tokenmaxxing actually is

Amazon built an internal leaderboard called "Claudeonomics" tracking which employees burned the most tokens. Meta had a similar dashboard. The implicit message: the person at the top of the leaderboard is the most productive person in the building.

Team members admitted to using AI for unnecessary tasks to inflate their usage scores.

That is tokenmaxxing. Not vibe coding. Not agentic automation. Just gaming the metric you were handed, in exactly the way humans have always gamed metrics.

This is not a character flaw. It is an incentive system doing exactly what incentive systems do.

Goodhart's Law, wearing a hoodie

Charles Goodhart was a British economist. In 1975 he observed something that should be obvious by now, usually paraphrased as: when a measure becomes a target, it ceases to be a good measure.

You have seen this before. You have been this before.

Velocity points that stopped meaning anything the moment the team learned to estimate large. Code coverage that climbed while test quality fell. Incident response times that hit SLA targets because engineers learned which alerts to close first.

"AI adoption rate" is just the latest iteration. You measure AI usage. Employees produce AI usage. You get the number you wanted and none of the outcome you needed.

The bill arrived

Microsoft cancelled most Claude Code licences six months after rolling them out to thousands of employees. Uber's CTO reported the firm had burned through its entire 2026 AI coding tools budget in four months. An unnamed company accidentally spent $500 million on Claude in a single month, having forgotten to set usage limits.

Then Uber's CTO said something worth writing down: there is "no link yet between AI tokenmaxxing and shipping successful products."

Read that again. A company that burned through an annual AI budget in four months cannot point to evidence that the spending made its products better.

That is not a cost problem. That is a measurement problem.

The Jevons trap

Here is the part where "tokens will get cheaper" does not rescue you.

William Stanley Jevons noticed in 1865 that more efficient steam engines led to more coal consumption, not less. Efficiency lowered the cost per unit. Consumption responded by growing faster than the cost fell.

AI is running the same playbook. Token prices are falling. Agentic AI tasks use up to 1000 times more tokens than a standard LLM query. Goldman Sachs forecasts a 24-fold increase in token demand by 2030, driven almost entirely by agent adoption.

Cheaper tokens plus more agents plus tokenmaxxing incentives equals a bill that surprises you every quarter.
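
A back-of-the-envelope sketch of that arithmetic, assuming the bill is simply price per token multiplied by tokens consumed. The 24x demand figure is the forecast quoted above. The 10x price drop is a made-up number for illustration, and `bill_multiplier` is a hypothetical helper, not anyone's real forecasting model.

```python
def bill_multiplier(price_drop: float, demand_growth: float) -> float:
    """Factor by which total spend changes when the price per token
    falls by `price_drop`x and token consumption grows by `demand_growth`x.

    Assumes spend = price per token * tokens consumed, nothing more.
    """
    if price_drop <= 0 or demand_growth <= 0:
        raise ValueError("both factors must be positive")
    return demand_growth / price_drop


# Hypothetical scenario: tokens become 10x cheaper (assumed figure),
# while demand grows 24x (the forecast quoted in the text).
print(bill_multiplier(price_drop=10, demand_growth=24))  # 2.4
```

Under those assumptions the bill still more than doubles. Prices would have to fall faster than usage grows for it to shrink at all.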

What you should be measuring

The organisations that will actually benefit from AI are not the ones that maximise AI usage. They are the ones that measure what AI changes about the work.

That means:

  • Deployment frequency: Are features shipping faster? By how much?
  • Defect escape rate: Are fewer bugs reaching production?
  • Review cycle time: How long from PR to merge?
  • Time on rework: Are engineers spending less time fixing things they already built?

None of these are tokens. None of these can be gamed by asking AI to summarise a Slack thread you already read.
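
One way to keep those four numbers honest is to compute them from delivery data you already have. A minimal sketch, assuming you can export deploy timestamps, defect counts and pull request records. Every name below (`PullRequest`, `is_rework`, the function names) is hypothetical, and the share of rework PRs is only a rough proxy for time spent on rework.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class PullRequest:
    opened: datetime
    merged: datetime
    is_rework: bool  # hypothetical flag: PR fixes something already shipped


def deployment_frequency(deploys: list[datetime], days: int) -> float:
    """Deployments per week over a window of `days` days."""
    return len(deploys) / (days / 7)


def defect_escape_rate(found_in_prod: int, found_before_release: int) -> float:
    """Share of all known defects that reached production."""
    total = found_in_prod + found_before_release
    return found_in_prod / total if total else 0.0


def median_review_hours(prs: list[PullRequest]) -> float:
    """Median hours from PR opened to PR merged (needs at least one PR)."""
    return median((pr.merged - pr.opened).total_seconds() / 3600 for pr in prs)


def rework_share(prs: list[PullRequest]) -> float:
    """Fraction of PRs flagged as rework; a proxy for time on rework."""
    return sum(pr.is_rework for pr in prs) / len(prs) if prs else 0.0


# Made-up sample data, purely for illustration.
start = datetime(2026, 1, 5, 9, 0)
prs = [
    PullRequest(start, start + timedelta(hours=4), False),
    PullRequest(start, start + timedelta(hours=10), True),
    PullRequest(start, start + timedelta(hours=6), False),
]
print(deployment_frequency([start] * 6, days=14))  # 3.0 deploys per week
print(defect_escape_rate(found_in_prod=2, found_before_release=8))  # 0.2
print(median_review_hours(prs))  # 6.0
print(round(rework_share(prs), 2))  # 0.33
```

Run the same functions over a window before the AI rollout and a window after it. The comparison is the point, not the absolute values.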

If your AI adoption programme cannot point to a single one of those numbers moving in the right direction, you do not have an AI adoption programme. You have an expensive vanity metric and a leaderboard.

The honest admission

The Uber CTO's statement is the most useful thing a senior technical leader has said about AI this year, precisely because it was not trying to be useful. They were not writing a think piece. They were explaining a pullback. And in doing so they said out loud what every honest engineer already suspects: we do not actually know if this is working.

That uncertainty is fine. Measuring a genuinely new tool properly takes time. What is not fine is building a target around usage volume and then acting surprised when usage volume is all you get.

You do not have an AI problem. You have a process problem, and the same logic applies to measurement. AI does not fix a broken measurement culture. It gives that culture something new to miscount.

The benchmark is not tokens. The benchmark is whether things ship, whether they hold, and whether the engineers behind them still understand what they built.

Everything else is Claudeonomics.

// series: The AI Skeptic (19 of 19)