Six days ago I told you to install Caveman lite on top of context-mode, run caveman-compress over your CLAUDE.md, and call it a stack. I'm walking the Caveman half of that back.
The "run both" framing was wrong. The math is the reason. Let me show the math.
What the install actually costs
Every Claude Code session starts before you've typed anything. The system prompt loads. CLAUDE.md loads. MCP tool descriptions register. Skills auto-discover. Hook scripts wire themselves in. None of it has done any work for you yet. All of it is sitting in your context window, ticking against the budget.
Here is what the stacked install of Caveman plus context-mode plus a healthy CLAUDE.md looks like at session start, from one of my own sessions:
| Surface | ~Tokens |
|---|---|
| 13 SKILL.md files (auto-discovered) | ~7,100 |
| Routing-block / sessionstart injection | ~1,020 |
| MCP tool descriptions (11 tools) | ~3,000 |
| Pre-work overhead | ~11,000 |
Your numbers will differ. The shape won't. Eleven thousand tokens before you've asked the model to do anything. That is the entry fee.
The break-even no one wants to do
Caveman pitches 65% reduction on output tokens. Independent benchmarks land closer to 30-50%, with a one-line "be brief" prompt capturing most of the saving on its own. The previous post said all of that. That part still holds.
Now stack the costs.
- Output savings are realised per response, on the smallest token category in the session.
- The install tax is paid per session, before any response exists.
For short sessions, you never recoup the install. For long, chatty sessions, you might. For the typical day where you open Claude, ask three questions, close it, open it again two hours later for something unrelated, you eat the tax six times and save it back maybe twice. Net loss.
The 65% number hides this when it's framed as a saving. Of course the model emits fewer output tokens when you tell it to talk like a caveman. The model also has to read a 7K skill bundle to learn how to talk like a caveman in the first place. That bundle loads whether you use the skill or not.
context-mode survives this critique
I want to be precise about which half of the recommendation is staying.
context-mode also has install cost. Its MCP tool descriptions sit in the same auto-loaded surface. But the comparison goes:
- context-mode's tax: a flat fee for tool descriptions, paid once per session.
- context-mode's payback: a single 56KB Playwright snapshot, or 11KB git log, or 986KB repo-research subagent dump, gets routed into the sandbox and never enters your context window. One avoided dump pays the install fee several times over.
If you do any meaningful tool work in a session, context-mode's tokenomics close cleanly. The saving lives on the largest token category. The install fee is rounding error against the leak it prevents.
Caveman's tokenomics close only if you talk to the model a lot in a single session, never restart, and read every word of every reply. That is not most people's workflow.
The pattern: auto-discovery is a hidden cost
Here is what the Caveman case made me notice about the whole plugin ecosystem.
Anything that auto-loads into your system prompt pays itself first. Skill discovery, MCP tool registration, hook injection, sessionstart payloads, plugin manifests. All silent. None of it shows up in the model's response. All of it is in the context window, and it was there before you typed.
The cost is guaranteed and front-loaded. The savings are conditional and back-loaded.
This isn't an argument against tooling. It's an argument against stacking tooling. Each plugin you add looks free at install time and costs forever at runtime. The right question is never "is this plugin useful." The right question is "is this plugin useful enough to justify being in every session forever, including the ones where I won't touch it."
For Caveman, my answer changed. The output saving on the conversational layer is not worth being in the room for every session, including the short ones where I just want to ask one question and leave.
What I'd actually recommend now
- Drop Caveman. All of it. Lite, full, ultra, compress, shrink, the whole crew. The output savings do not survive the install fee for normal session shapes.
- Keep context-mode if you do tool-heavy work. Web fetches, Playwright runs, large grep dumps, big file reads, MCP-heavy days. If those show up regularly, the sandbox pays itself back fast. If they don't, drop it too.
- Trim CLAUDE.md ruthlessly. Every line in there is in every session. Treat it like the bill it is.
- Audit your skills. Count what auto-discovers. If you wouldn't pay for a skill at every session start, it shouldn't be auto-discoverable. Move it behind explicit invocation.
- Compress when you measure a leak, not pre-emptively. Reach for tooling to fix problems you can show, not problems you've read about.
The general principle: lean defaults beat clever plugins. The cheapest way to save tokens is to not load tokens you don't need.
This is the same lesson I keep running into. Asking your agent nicely doesn't scale. Adding a plugin to ask the agent nicely for you doesn't scale either, because the plugin itself has to be in the prompt to do its asking.
The central irony
The thing that was supposed to save context is the first thing eating it. That is the part I missed in the original post. I was so focused on what context-mode does well, and what Caveman saves at the output layer, that I never added up what the stack costs at the door.
Eleven thousand tokens of "I haven't done anything yet" is the kind of number that should make anyone selling a token saver pause before publishing.
The lesson generalises. The leak you can see is rarely where the cost actually is. With plugin-driven AI tooling, the bill arrives before the work does. Look at the receipt at session start, not at the savings on the output you read.
Pick the architecture, sure. But add up the architecture first.