The ceiling is made of concrete

Every rate limit, signup pause, and pricing shift of the last six months has one root cause. Not greed. Not unsustainable burn rates. Physics.

You hit a rate limit on Claude Code last week. Or GitHub Copilot paused new signups. Or Anthropic quietly started showing you a higher tier during checkout. Or Cursor started charging per token instead of per message.

The developer community called it greed. ThePrimeagen called it the end of the subsidy era. Both of those explanations feel true and both of them miss the point.

Theo put it more precisely: "It's not just money, it's about compute."

He's right. But even that undersells it.

What ThePrimeagen got right

The AI tools you've been using are subsidised products. Subscription prices were set to win users, not to cover costs. A $20/month Claude Code subscription was never a real price. It was a land-grab price. The bet was that once you built a workflow around the tool, you'd pay whatever the real price turned out to be.

That bet is now being called in.

Anthropic ran what looked like an upsell experiment: surfacing a higher tier to gauge whether Claude Code users would pay more. Theo's read on this is better than the obvious one. It wasn't consumer monetisation. It was compute preservation. Consumer subscriptions were draining GPU cycles Anthropic needed for enterprise clients, the contracts that actually cover their infrastructure costs.

Cursor abandoned message-based billing in mid-2025, because one long agentic run could cost more than the entire monthly subscription. GitHub Copilot paused new signups. OpenAI has done the same thing before. The pattern is always the same: demand hit capacity.

Theo's quote cuts to it: "They don't care about you. All they care about is their enterprise customers that they make actual money off of. And they don't have enough Nvidia graphics cards in their fucking server farms to afford those customers and to sell them what makes them actual money."

The flat subscription model was never going to survive agentic use. One message that triggers an hours-long tool chain can cost $60 in inference. A $40/month plan subsidising $46,000 worth of compute, which Theo has actually done, is not a business model. It's a promotion.
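To make the gap concrete, here is a rough back-of-the-envelope sketch using only the figures quoted above. The helper name is made up for illustration, and it deliberately avoids guessing over what period the $46,000 was consumed, since that isn't stated.

```python
def months_of_fees(compute_cost_usd: float, monthly_fee_usd: float) -> float:
    """Number of monthly subscription payments needed to cover a compute bill."""
    return compute_cost_usd / monthly_fee_usd


# One $60 agentic run against a $20 and a $40 monthly plan.
print(months_of_fees(60, 20))      # 3.0
print(months_of_fees(60, 40))      # 1.5

# $46,000 of compute against a $40 monthly plan.
print(months_of_fees(46_000, 40))  # 1150.0
```

On those numbers, a single heavy run eats one and a half months of a $40 plan, and $46,000 of compute is 1,150 monthly payments, or a little under 96 years of subscription fees.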

The layer beneath the GPU

Here's where it gets less abstract.

The discourse about AI costs treats GPUs as the constraint. But GPUs need to be plugged in. Data centres need electricity. Electricity comes from a grid. And the grid is the actual ceiling.

In January 2026, Google told Reuters that connecting to the US transmission grid had become their main obstacle to data centre expansion. Wait times in some regions: over ten years. One provider reported a twelve-year wait just to begin a grid connection study.

This is not a procurement problem. You cannot fix a twelve-year grid queue by raising a Series E.

In the United Kingdom, connection wait times reach twelve to fifteen years in some areas. Elsewhere in Europe, Amsterdam and Dublin have already paused new grid connections entirely. Globally, the JLL 2026 data centre outlook puts average connection wait times in major markets at over four years, with utilisation at 97%.

The capacity numbers are equally stark. For 2026, the industry planned 16 GW of new data centre capacity. Only 5 GW is actually under construction. Nearly half of announced US capacity, around 7 GW, has been cancelled or delayed. Not because investors lost interest. Because the physical infrastructure to power it doesn't exist on any near-term timeline.

It is not the chips that are scarce. It is the transformers, the switchgear, the substations. Lead times on high-voltage infrastructure: three to five years. Google is now siting data centres next to power plants rather than waiting for the grid to reach them.

The GPU runs your prompt. The grid powers the GPU. And the grid doesn't iterate on a six-week release cycle.

What this means for anyone building on AI

The tooling you depend on is downstream of physical infrastructure constrained by permitting queues, copper shortages, and electricity planning timescales measured in decades.

That's not a reason to stop using the tools. It's a reason to have a clear-eyed picture of what you're actually building on.

Flat subscription pricing for AI tools was always a fiction. The cost of an agentic workflow is genuinely variable: which model, how many steps, how much context, at what time of day. Pricing that obscures this is pricing that will change. It already is changing.
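A minimal sketch of why that cost is variable, assuming simple per-token pricing. Every number and name here is hypothetical: the prices are placeholders rather than any provider's real rates, and the peak multiplier is only a stand-in for "time of day", not a claim that anyone bills that way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical prices in USD per million tokens."""
    input_per_mtok: float
    output_per_mtok: float


def run_cost(
    price: ModelPrice,
    steps: int,
    context_tokens: int,
    output_tokens_per_step: int,
    peak_multiplier: float = 1.0,
) -> float:
    """Estimate the cost of one agentic run.

    Assumes every step re-sends the whole context, and that the context
    grows by that step's output before the next step.
    """
    total = 0.0
    context = context_tokens
    for _ in range(steps):
        total += context / 1e6 * price.input_per_mtok
        total += output_tokens_per_step / 1e6 * price.output_per_mtok
        context += output_tokens_per_step
    return total * peak_multiplier


# Placeholder prices for a small and a large model.
small = ModelPrice(input_per_mtok=1.0, output_per_mtok=5.0)
large = ModelPrice(input_per_mtok=10.0, output_per_mtok=50.0)

# Both calls count as "one message" under message-based billing.
quick_question = run_cost(small, steps=1, context_tokens=2_000,
                          output_tokens_per_step=500)
long_agent_run = run_cost(large, steps=200, context_tokens=50_000,
                          output_tokens_per_step=2_000)

print(f"quick question: ${quick_question:.4f}")
print(f"long agent run: ${long_agent_run:.2f}")
```

Because the context is re-sent on every step, cost grows faster than linearly with the number of steps. That is the part a per-message price cannot represent.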

Usage-based models aren't a betrayal. They're just the actual shape of the cost becoming visible.

The AI companies that win long-term are the ones closest to dedicated compute. That's why Anthropic's enterprise contracts matter more than your Max plan. That's why your 10x developer is gated by your pipeline, and now, behind that pipeline, by a substation.

What to actually do

Know what your tools actually cost to run. If you're on a flat plan and running heavy agentic workflows, you're on borrowed time. Either the plan will change or the service will degrade as the provider throttles to manage capacity.

Treat model providers like infrastructure, not software. Software can be updated overnight. Infrastructure has supply chains.

When a rate limit appears, don't read it as a policy choice. Read it as a signal about capacity. Something real is scarce. It might be GPUs. It might be grid allocation. It might be a twelve-year queue for a substation that someone ordered the day your lab was founded.

The honest version

The AI economy runs on hype and physics. The hype is what gets written up. The physics is what makes your request fail at 4 PM, during peak-inference hours.

ThePrimeagen saw the hype cracking. Theo saw the silicon underneath it. Both are right.

The concrete is beneath the silicon. And nobody in the press-release business is talking about that part.

// series: The AI Skeptic (15 of 15)