One Claude Code plugin has 63,000 stars. Another has 15,000. The one with four times the audience asks you to talk like a caveman. The other one sandboxes your tool output behind FTS5 indexes. The internet picked the funny one.
That's fine. The internet is allowed to pick the funny one. The question is whether you should.
I've been running context-mode for weeks. I keep watching Caveman trend. They both promise to fix the same problem: agents burn through context windows faster than your patience. They go about it in completely different ways. And the gap between them is wider than the star counts suggest.
What each one actually does
Caveman is a system-prompt skill. It tells the agent to drop articles, contractions, filler and "I'd be happy to help" preamble. Three intensity levels (lite, full, ultra), one mode that writes in classical Chinese (Wenyan) for the truly token-pilled. The repo lives at JuliusBrussee/caveman. Vendor claim: 65% output token reduction across a ten-prompt benchmark.
context-mode is an interceptor at the MCP protocol layer. When the agent runs a tool that would normally dump 56KB of Playwright snapshot or 11KB of git log into context, context-mode routes it into a sandboxed subprocess. The raw data lives there. Only what you print (or what you search for via FTS5 with BM25 ranking) enters your conversation. The repo lives at mksglu/context-mode. Claim: 94–100% reduction on tool outputs.
So far, fine. Different angles, both real. Let's look at the angles more carefully.
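The "index it, then search it" half of that idea can be sketched in a few lines. This is a minimal illustration, not context-mode's actual implementation: the table name `chunks`, the helpers `index_output` and `search`, and the 20-line chunk size are all hypothetical, the fake log stands in for real tool output, and the subprocess isolation is left out entirely. It also assumes your Python's SQLite build ships with FTS5 enabled.

```python
import sqlite3

def index_output(conn, source, raw_text, chunk_lines=20):
    """Store raw tool output in an FTS5 table, split into fixed-size line chunks."""
    # `source` is stored but not searchable; only `body` is full-text indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS chunks USING fts5(source UNINDEXED, body)"
    )
    lines = raw_text.splitlines()
    for start in range(0, len(lines), chunk_lines):
        body = "\n".join(lines[start:start + chunk_lines])
        conn.execute("INSERT INTO chunks (source, body) VALUES (?, ?)", (source, body))
    conn.commit()

def search(conn, query, limit=3):
    """Return only the best-matching chunks. bm25() scores lower for better matches."""
    rows = conn.execute(
        "SELECT source, body FROM chunks WHERE chunks MATCH ? "
        "ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()

# Hypothetical noisy tool output: 2,000 boring lines and one that matters.
raw = "\n".join(f"line {i}: request ok" for i in range(2000))
raw += "\nline 2000: fatal ECONNRESET while fetching payload"

conn = sqlite3.connect(":memory:")
index_output(conn, "fake-tool", raw)

hits = search(conn, "ECONNRESET")
visible = "\n".join(body for _, body in hits)

print(f"raw output: {len(raw)} bytes, entering context: {len(visible)} bytes")
```

The raw log never has to be shown to the model. Only the chunk that matches the query does, which is where the large reduction figures come from.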
The math everyone is skipping
A Claude Code session has several token sources, and they are not equal in size.
| Source | Typical share |
|---|---|
| File reads, tool outputs, web fetches, snapshots | Largest |
| System prompt + CLAUDE.md + MCP tool descriptions | Medium |
| User prompts | Small |
| Model output | Smallest |
Caveman compresses the bottom row. It does it well. The vendor's own benchmark says 65%, and they're honest enough to flag in an [!IMPORTANT] callout that thinking tokens and input tokens are untouched. Independent community benchmarks land at 30–50%, with a one-line "be brief" prompt capturing most of the savings on its own.
context-mode compresses the top row. The thing that actually fills your context. The 986KB of repo research that becomes 62KB. The 56KB Playwright snapshot that becomes 299 bytes. That's not a benchmark trick. That's just what happens when the raw bytes never enter the conversation in the first place.
This is the part nobody is putting next to each other: output is grams, tool output is kilos. Compressing the smallest token source by 65% is genuinely useful. Compressing the largest token source by 98% is a different category of intervention.
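To make the grams-versus-kilos point concrete, here is a rough back-of-the-envelope sketch. The two before/after sizes come from the figures quoted above (treating 1KB as 1,000 bytes). The session shares are invented purely for illustration; the table only ranks the sources, it does not give percentages, so treat `HYPOTHETICAL_SHARES` as an assumption and substitute your own measurements.

```python
def reduction(before_bytes, after_bytes):
    """Fraction of bytes that never reach the context window."""
    return 1 - after_bytes / before_bytes

# Figures quoted in the text above.
repo_research = reduction(986_000, 62_000)  # 986KB of repo research -> 62KB
snapshot = reduction(56_000, 299)           # 56KB Playwright snapshot -> 299 bytes

# ASSUMPTION: made-up shares that merely respect the table's ordering
# (tool output largest, model output smallest). Not measured data.
HYPOTHETICAL_SHARES = {
    "tool_output": 0.75,
    "system_prompt": 0.15,
    "user_prompts": 0.06,
    "model_output": 0.04,
}

def session_saving(shares, cuts):
    """Whole-session saving when each source is cut by the given fraction."""
    return sum(share * cuts.get(name, 0.0) for name, share in shares.items())

caveman_total = session_saving(HYPOTHETICAL_SHARES, {"model_output": 0.65})
context_mode_total = session_saving(HYPOTHETICAL_SHARES, {"tool_output": 0.98})

print(f"repo research reduction: {repo_research:.1%}")
print(f"snapshot reduction:      {snapshot:.1%}")
print(f"65% off model output:    {caveman_total:.1%} of the session")
print(f"98% off tool output:     {context_mode_total:.1%} of the session")
```

Under these assumed shares, a 65% cut to model output trims about 2.6% of the session, while a 98% cut to tool output trims about 73.5%. The exact numbers will move with your own mix; the asymmetry is the point.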
The tell hiding inside Caveman
Here's the detail that should make you stop and look.
The serious, non-meme tools in the Caveman ecosystem aren't about output. They're about input.
- caveman-compress rewrites your CLAUDE.md and memory files into caveman-speak. It claims ~46% input token savings per session start. This is the runtime mode's quiet admission that input is where the money lives.
- caveman-shrink is an MCP middleware that compresses tool descriptions. Same admission, different layer.
- cavecrew-* subagents are tuned to emit ~60% fewer tokens. Output again, but at the subagent boundary, which is where main-context pressure actually leaks.
Nothing wrong with any of this. The point is: the moment Caveman gets serious about saving real money, it stops being about the model's mouth and starts being about the room the model lives in. Which is exactly the room context-mode was already worrying about.
Steelman, before I pick a side
Caveman has real virtues. The install is one line. It auto-detects 30+ agents. It works without runtime infrastructure: no Node 22.5, no FTS5, no MCP server, no hook lifecycle to debug. It compresses the part of the output you actually read, so it makes your session more pleasant, not just cheaper. And the meme spreads. Memes are how good ideas travel. The brilliant parrot doesn't need to be entertaining to be right, but it helps.
context-mode has real friction. Native sqlite, Node version constraints, hook support that varies across platforms (Antigravity and Zed get no hooks, Codex gets partial coverage). When it works, it works invisibly. When it doesn't, you're debugging an MCP transport layer at 9pm.
If you want the cheap, fun, drop-in win, Caveman lite is great. Honestly. Run it. Tell me it doesn't make your sessions feel lighter.
Where they collide
Once you stack Caveman's serious tools (caveman-compress + caveman-shrink + cavecrew), you've stopped doing the funny part. You're running input-side compression on memory files, tool descriptions, and subagent boundaries. At that point you're solving the same problem context-mode solves at the protocol layer, with less complete tools.
The MCP-layer interception in context-mode catches everything once, structurally. The Caveman approach catches things piecewise, behaviourally. Behavioural fixes ask the model to behave. Structural fixes build a wall so the model can't flood itself. One of these is something you trust. The other is something you hope.
This is the same pattern I keep writing about. Asking your agent nicely doesn't scale. Lint rules beat polite suggestions, as I argued in "your agent's suffering is your tech debt speaking". The discipline lives in the system, not the request.
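The difference between a request and a wall fits in a dozen lines. What follows is the simplest structural guard I can think of, not how context-mode works internally: the names `walled` and `noisy_tool` and the 512-byte budget are hypothetical. The behavioural equivalent would be a prompt line saying "please keep tool output short", which the model is free to ignore.

```python
from typing import Callable

def walled(tool: Callable[..., str], max_bytes: int = 512) -> Callable[..., str]:
    """Wrap a tool so only max_bytes of raw output (plus a short notice) gets through."""
    def wrapper(*args, **kwargs) -> str:
        raw = tool(*args, **kwargs)
        data = raw.encode("utf-8")
        if len(data) <= max_bytes:
            return raw
        # Cut on a byte budget; drop any multi-byte character split by the cut.
        head = data[:max_bytes].decode("utf-8", errors="ignore")
        hidden = len(data) - max_bytes
        return f"{head}\n[truncated: {hidden} more bytes kept out of context]"
    return wrapper

def noisy_tool() -> str:
    """Hypothetical stand-in for a tool that dumps a 56KB snapshot."""
    return "x" * 56_000

safe_tool = walled(noisy_tool)
result = safe_tool()

print(f"tool produced {len(noisy_tool())} bytes, context received {len(result)}")
```

The wrapper does not care what the model was told or how it feels about brevity. The budget holds because the code enforces it, which is the whole argument for structural fixes.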
What I'd actually do
Run both. They're not really competing.
- Use context-mode for the structural problem: tool outputs, file analysis, web fetches, MCP results. The raw data should never enter your context window. Period.
- Use Caveman lite (not full, not ultra) for the cosmetic problem: prose you actually have to read. Reading "Sure! I'd be happy to help you with that" twenty times a day is its own tax, and it's a tax on you, not on the API.
- Run caveman-compress once on your CLAUDE.md. One-time investment, pays every session.
- Skip caveman-shrink if you're running context-mode, since MCP tool descriptions are already a known leak that context-mode addresses more cleanly.
Or, if you have to pick: pick the wall, not the manners.
Closing
The internet rewarded the funny one. That's almost always how this goes. Caveman's 63k stars are a referendum on personality, not architecture. It's a great piece of software design and a brilliant piece of meme engineering, which is rare and deserves respect.
But star counts measure adoption velocity, not problem-solving depth. The plugin that fixes the smallest leak with a meme will always out-trend the plugin that fixes the biggest leak with a sandbox. That doesn't make the meme wrong. It means the room matters more than the mouth.
Pick the architecture. The vibes are extra.