Your coding agent has no world model. You built it one.

Yann LeCun says the path to real intelligence runs through world models, not LLMs. He's probably right. And it explains exactly why your agent loop works.

In November 2025, Yann LeCun argued that the path to superintelligence probably doesn't run through large language models. Something fundamental, he says, is missing from the current approach. A few weeks later he left Meta after more than ten years. By March 2026 he had raised just over a billion dollars for a new company, AMI Labs, built around an idea that runs directly against almost everything the industry is currently spending money on.

It is tempting to file this under turf war. He left, and he's raising money against the thing his old employer bet the house on, so of course he says it's a dead end.

Don't file it there.

Who is actually saying this

LeCun is not a man shouting from the cheap seats.

He shared the 2018 Turing Award, the closest thing computing has to a Nobel, with Geoffrey Hinton and Yoshua Bengio. In the late 1980s and 1990s he did the foundational work on convolutional neural networks. That is the architecture that learned to read the handwritten digits on bank cheques, and it made the modern deep-learning revolution in computer vision possible. Virtually every image recognition system in use today builds on ideas that came out of that work, from the face unlock on your phone to the model that reads a tumour off a scan. He founded Facebook AI Research in 2013 and ran Meta's AI science for a decade.

When this person says LLMs on their own are probably not enough for real general intelligence, it is worth understanding what he means before deciding he's wrong.

The difference, explained like you're sixteen

Here is the whole disagreement, stripped down.

A large language model is trained by predicting the next word. That, LeCun argues, also explains its fundamental limitation. It has read an enormous amount of text and learned, with uncanny skill, which words tend to follow which other words. Ask it anything and it generates a reply one token at a time, each token chosen because it's the statistically likely thing to come next. I've written about this before: it's a brilliant parrot, a system that produces beautiful sentences without any picture of what the sentence is about.

Think of someone who has read every book ever written but has never once left the room. They can talk fluently about gravity. They have never watched anything fall.

A world model is the other kind of knowing.

A two-year-old has read zero books. But push a cup toward the edge of a table and that toddler already knows, somewhere in their body, that it's going to drop and smash. They didn't learn that from a sentence. They learned it by watching the world, thousands of times, until they had an internal sense of how things behave. Roll a ball behind a couch and they look for it to come out the other side. They have a model of the world running in their head, and they use it to predict what happens next and to plan what to do about it.

That is what LeCun wants to build. Not a system that predicts the next word, but a system that predicts what the world will do.

What JEPA actually does

His approach is called JEPA, Joint Embedding Predictive Architecture. The latest video version, V-JEPA 2, was trained on over a million hours of internet video.

The clever part is what it refuses to do. An obvious way to teach a model about the world is to make it predict the next video frame, pixel by pixel. That fails, because most of the fine detail is genuinely unpredictable. You cannot guess exactly where every leaf will blow or how the light will flicker, and trying to forces the model to waste everything on noise.

So JEPA predicts at the level of the gist instead. It watches a clip, hides part of it, and predicts an abstract summary of the missing part: "there's a cup, it's near the edge, it's moving." Not the exact pixels. The meaning. It learns to keep the part of the future that matters and throw away the part that's just static.
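Here is a toy sketch of why predicting the gist is easier than predicting pixels. It is not JEPA: the real system learns its encoders and its predictor, whereas the `encode` function below is a hypothetical hand-written average, and the signal and noise values are made up for illustration. It only shows that a loss measured on a summary ignores detail that no predictor could have guessed.

```python
import random

def encode(patch):
    """Hypothetical stand-in for a learned encoder: summarise a patch by its mean."""
    return sum(patch) / len(patch)

def mean_squared_error(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

rng = random.Random(0)  # fixed seed so the run is repeatable

# The hidden part of the clip: one predictable value (the "gist")
# plus per-pixel noise that nothing could predict in advance.
signal = 0.7            # assumed value, illustration only
noise_std = 0.2         # assumed value, illustration only
hidden_patch = [signal + rng.gauss(0.0, noise_std) for _ in range(1000)]

# Even a predictor that knows the signal exactly cannot know the noise,
# so its best pixel-level guess is the signal everywhere.
pixel_prediction = [signal] * len(hidden_patch)

# Loss in pixel space: stuck near the noise variance however good the predictor is.
pixel_loss = mean_squared_error(pixel_prediction, hidden_patch)

# Loss in summary space: compare the gist of the prediction to the gist of the truth.
embedding_loss = (encode(pixel_prediction) - encode(hidden_patch)) ** 2

print(f"pixel-space loss:   {pixel_loss:.5f}")
print(f"summary-space loss: {embedding_loss:.5f}")
```

The pixel-space loss never gets much below the noise level, so a model trained on it keeps chasing detail it cannot win. The summary-space loss is far smaller because averaging cancels most of the noise, which is the intuition behind predicting a representation instead of raw frames.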

That sounds like a small engineering choice. It isn't. A system that can predict the gist of what happens next, LeCun argues, gets you closer to something current LLMs struggle with: explicitly reasoning about possible future states of the world. It can run "if I do this, the world becomes that" forward in its head, compare the outcome to a goal, and choose an action. That's closer to planning. Modern LLMs plan surprisingly well within a task, but they still lack an explicit model of how the world behaves.

There's a hard mathematical edge to this. LeCun likes to point out that if a model generates blind, one step at a time, with even a 1% chance of going wrong at each step and no way to check itself, then a run of a hundred steps has only about a 37% chance of being entirely correct, because 0.99 multiplied by itself a hundred times is roughly 0.37. Errors compound. A system that can't look at where it's going will, often enough, walk off the road.
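The arithmetic is easy to check. The sketch below assumes the simplest reading of the argument: every step fails independently with the same probability, and a single failure ruins the whole run. The function name is mine, not LeCun's.

```python
def chance_all_correct(per_step_error: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    return (1.0 - per_step_error) ** steps

# A 1% error rate per step over a hundred unchecked steps.
print(f"{chance_all_correct(0.01, 100):.3f}")  # prints 0.366
```

Real models do not fail independently or at a fixed rate, so treat the number as an illustration of compounding rather than a measurement.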

This is the parrot problem written as arithmetic. No internal picture, no plan, no self-check.

The part nobody connects to coding

Here's where it gets interesting for anyone who spends their day in an agent loop.

LeCun is describing a hole: the LLM has no world model. And the entire practice of agentic coding is, whether we say it out loud or not, a way of filling that hole from the outside.

Think about what actually makes Claude Code or any decent coding agent work. It isn't that the model finally grew an internal understanding of your system. It's the harness around it. The compiler that rejects nonsense. The type checker that says "that function doesn't exist." The test suite that goes red. The file system it can read back. The loop that feeds every error message straight back into the next attempt.

You can think of that harness as a world model. It's just sitting outside the weights instead of inside them.

The model proposes an action, the world (your codebase, your compiler, your tests) reacts, and the model gets to see the reaction and adjust. That's the toddler-and-the-cup loop, reconstructed out of CI and stack traces. We didn't wait for the model to learn how our software behaves. We built an environment that shows it, every single step.
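That loop can be sketched in a few lines. Everything here is a stand-in: `world_feedback` plays the compiler and test suite using Python's built-in `compile` and `exec`, and the "model" is a hypothetical canned list of three attempts rather than a real LLM call. A real agent would use the feedback to write its next attempt; the canned one only records what it was shown.

```python
from typing import Callable, List, Optional, Tuple

def world_feedback(source: str) -> Optional[str]:
    """Stand-in for the harness: compile the candidate, then run one test.

    Returns an error message for the model to read, or None if all is green.
    """
    namespace: dict = {}
    try:
        exec(compile(source, "<candidate>", "exec"), namespace)
        result = namespace["add"](2, 3)
    except Exception as exc:  # syntax errors, missing names, runtime errors
        return f"{type(exc).__name__}: {exc}"
    if result != 5:
        return f"test failed: add(2, 3) returned {result!r}, expected 5"
    return None

def agent_loop(
    propose: Callable[[Optional[str]], str], max_attempts: int = 5
) -> Tuple[Optional[str], int]:
    """Propose, let the world react, feed the reaction into the next proposal."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)
        feedback = world_feedback(candidate)
        if feedback is None:
            return candidate, attempt
    return None, max_attempts

def make_canned_model() -> Tuple[Callable[[Optional[str]], str], List[Optional[str]]]:
    """Hypothetical model: replays fixed attempts and logs the feedback it saw."""
    attempts = iter([
        "def add(a, b) return a + b",        # does not compile
        "def add(a, b):\n    return a - b",  # compiles, fails the test
        "def add(a, b):\n    return a + b",  # passes
    ])
    seen_feedback: List[Optional[str]] = []

    def propose(feedback: Optional[str]) -> str:
        seen_feedback.append(feedback)
        return next(attempts)

    return propose, seen_feedback

propose, seen_feedback = make_canned_model()
solution, attempts_used = agent_loop(propose)
print(attempts_used)  # prints 3
for message in seen_feedback:
    print(message)
```

The interesting part is not the model. It is that `world_feedback` cannot be talked out of its answer, and that every rejection lands in front of the model before it tries again.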

This is why pasting code into a bare ChatGPT textbox in 2023 felt like a clever party trick, while an agent wired into a real toolchain in 2025 felt like a colleague. Same kind of model underneath. The difference is the prosthetic world model we wrapped around it.

Where the prosthetic ends

Once you see agentic coding this way, you can predict exactly where your agent is sharp and where it goes blind.

It is sharp wherever the world model is cheap to externalise. Strongly typed code, a fast test suite, a tight compile loop, clear runtime errors. In that environment the agent gets dense, honest feedback on every move, and it's genuinely good. The world tells it when it's wrong.

It goes blind wherever you can't externalise the world model. "Make this feel nicer." "Is this the right abstraction?" "Will users understand this?" Untested glue code where nothing goes red when it breaks. Product intent that lives only in your head. There's no compiler for taste. The agent walks off the road, confidently, because nothing in its environment pushed back. This is the same reason the prompt is not the spec: the structure has to come from somewhere, and right now that somewhere is you.

So the practical advice writes itself. Stop hoping the model will grow judgement and start building the external world model on purpose. Tests aren't just for catching regressions anymore; they're the sense organs your agent uses to feel the codebase. Types aren't just documentation; they're guardrails the agent can't drive through. A tight feedback loop isn't a nice-to-have; it's the difference between an agent that course-corrects and one that compounds its own errors toward that 37%.
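To put a rough number on what a feedback loop buys, here is the same compounding arithmetic with a checker added. The assumptions are generous and mine: the checker catches every bad step, each retry fails independently at the same 1% rate, and a step is only lost if the first try and every retry all go wrong.

```python
def chance_all_correct_with_retries(
    per_step_error: float, steps: int, retries: int
) -> float:
    """Chance a run of `steps` succeeds when a perfect checker allows retries.

    A step is lost only if the first try and every retry all fail.
    """
    step_lost = per_step_error ** (retries + 1)
    return (1.0 - step_lost) ** steps

# No checker, no retries: the blind run.
print(f"{chance_all_correct_with_retries(0.01, 100, retries=0):.3f}")  # prints 0.366

# One retry per step, triggered by a checker that never misses.
print(f"{chance_all_correct_with_retries(0.01, 100, retries=1):.3f}")  # prints 0.990
```

No real test suite catches everything, so the second figure is a ceiling, not a forecast. The gap between the two is still the point: the model is no smarter in the second case, and the only thing that changed is that the environment answered back.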

What happens if LeCun is right

If LeCun is right and world models turn out to be an essential missing ingredient, the thing we're currently bolting on from the outside starts moving inside. A model that can actually plan, imagine consequences, and check itself against an internal sense of how systems behave needs a lot less scaffolding. The harness gets thinner. The agent stops needing you to be its eyes.

If the bet doesn't pay off, or pays off slowly, nothing about your job changes. You stay the world model. You keep supplying the structure, the tests, the judgement, the picture of where the code is supposed to go that the model simply does not have.

Either way, the move today is the same, and it's the unglamorous one: build the environment that gives the model honest feedback, because that environment is doing the cognitive work the model can't.

LeCun isn't predicting the death of your coding agent. He's explaining, more precisely than anyone selling you the agent will, the scaffolding you're already standing on.

The parrot still can't see where it's going. You've just learned to build it a pair of eyes out of your test suite.

Sources: MIT Technology Review on AMI Labs, Meta AI on V-JEPA 2, The Decoder on LeCun's exit.