Ecosystem & R&D Skills (medium)

Caveman Mode (Token Compression)

Caveman is a community Claude Code plugin that compresses model output by dropping articles, filler, and hedging while preserving technical content exactly. Activated via /caveman with intensity levels (lite, full, ultra, wenyan-*). Typical output token reduction: ~75% on prose-heavy responses.
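Invocation looks roughly like this. The level names come from the description above; the exact argument syntax is an assumption, so treat the transcript as illustrative rather than the plugin's documented CLI:

```
/caveman              activate at default intensity
/caveman lite         gentler compression
/caveman ultra        maximum compression
stop caveman          return to normal prose
```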

Memory anchor

Caveman is voice compression — same meaning, fewer syllables. Like a doctor's chart: 'pt c/o SOB, hx HTN' replaces three sentences. Lossless to the trained reader, gibberish to outsiders.

Expected depth

Two parts: (a) a persistent mode that rewrites the model's output style across turns until 'stop caveman' is said, and (b) a /caveman:compress skill that compresses static memory files (CLAUDE.md, todos) into caveman form, saving input tokens on every cached prefix turn. Auto-clarity rule: caveman speech is dropped for security warnings, irreversible operations, and multi-step sequences where ambiguity would cause misreads — full prose resumes for those, then caveman resumes after.
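To make the compress skill concrete, here is an invented before/after snippet (not taken from the plugin's docs) showing the kind of rewrite a CLAUDE.md line might get:

```
Before: Always run the full test suite before committing, and make sure
        that linting passes as well.
After:  run tests + lint before commit
```

Paths, commands, and code blocks would pass through unchanged; only the connective prose compresses.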

Deep — senior internals

The mode is a SessionStart/UserPromptSubmit hook that injects style instructions into the system prompt. The compress skill spawns Claude on prose files, validates the output with a round-trip check, retries up to 2x on failure, and leaves the original untouched if validation fails — backup saved as <file>.original.md. Pairs with prompt caching: a compressed CLAUDE.md sits in the cached prefix at lower token cost forever after. Trade-off: heavy levels (ultra/wenyan-ultra) sacrifice readability for non-domain readers; reserve for solo workflows where you want raw signal.
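The compress-validate-retry flow described above can be sketched in Python. The `compress` and `validates` callables are hypothetical stand-ins for the model call and the round-trip check; the real plugin's interfaces are not documented here, so this is a minimal sketch of the control flow, not its implementation:

```python
import shutil
from pathlib import Path

MAX_RETRIES = 2  # described behavior: retry up to 2x on validation failure

def compress_file(path: Path, compress, validates) -> bool:
    """Sketch of a /caveman:compress-style flow (hypothetical interfaces).

    compress(text) -> str       stand-in for the model rewrite call
    validates(orig, new) -> bool  stand-in for the round-trip check
    """
    original = path.read_text()
    # Backup saved as <file>.original.md before any rewrite.
    backup = path.with_suffix(".original.md")
    shutil.copy(path, backup)

    for _ in range(1 + MAX_RETRIES):  # first attempt + up to 2 retries
        candidate = compress(original)
        if validates(original, candidate):
            path.write_text(candidate)
            return True
    # Every attempt failed validation: leave the original untouched.
    return False
```

On success the compressed text replaces the file and the original survives as the backup; on repeated validation failure the file is left exactly as it was.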

🎤 Interview-ready answer

Caveman is a community Claude Code plugin I use to cut output tokens or compress static prompt context. It has two parts: a persistent mode that compresses replies in real time, and a /caveman:compress skill that rewrites memory files like CLAUDE.md into caveman form. Output reduction is around 75% on prose-heavy responses. The mode auto-disables for security warnings and irreversible operations to avoid ambiguity, then resumes after. I pair it with prompt caching: a compressed CLAUDE.md stays cheaper on every cached turn.

Common trap

Expecting 50%+ compression on already-terse, code-reference-heavy markdown. Caveman is conservative on technical content — it preserves paths, commands, and code blocks exactly — so on tight files the win is often <10%. Asymmetric returns: works best on prose-heavy text.