Token Economics & Cost
Tokens are the billing unit. Input tokens (prompt) and output tokens (response) are priced separately, with output usually 4–5x more expensive. Claude Sonnet 4-class models are around $3/M input, $15/M output; Opus has historically been ~5x Sonnet (verify current pricing on the Anthropic console — pricing has been adjusted multiple times). Costs add up fast in long agent loops.
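A back-of-envelope calculator makes the asymmetry concrete. This is a sketch using the list prices quoted above; the rates are assumptions, so verify them before relying on the numbers:

```python
# Rough cost of a single call at Sonnet-class list prices
# ($3/M input, $15/M output -- rates change, verify current pricing).
INPUT_PER_MTOK = 3.00
OUTPUT_PER_MTOK = 15.00

def call_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1e6) * INPUT_PER_MTOK + (output_tokens / 1e6) * OUTPUT_PER_MTOK

# 10K-token prompt, 2K-token response: output is a fifth of the tokens
# but half the dollars.
print(f"${call_cost(10_000, 2_000):.4f}")  # $0.0600 = $0.03 input + $0.03 output
```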
Tokens are taxi meter clicks. Input re-clicks for the whole route so far at every stoplight (each turn re-sends the conversation to date); output clicks only for new road. Caching is the monthly transit pass: 90% off the repeated stretch, but only if you board again before the pass expires (5 min).
Each turn re-sends the entire conversation as input, so a 50-turn agent run whose context has grown to 50K tokens pays up to 50K input tokens × 50 turns = 2.5M input tokens before counting output (somewhat less in early turns, while context is still small). Mitigations: prompt caching (90% discount on the cached prefix), batching, smaller models for routine work (Haiku for grep/file reads, Sonnet for design). Track cost per session; flag runaway loops where input grows without progress.
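A toy loop shows why the bill grows quadratically with turn count. The numbers are illustrative, not measured data:

```python
# Sketch: cumulative input tokens for an agent loop where each turn
# re-sends the whole conversation. All figures are illustrative.
TURNS = 50
TOKENS_PER_TURN = 1_000  # context added per turn (tool output, response)

total_input = 0
context = 0
for turn in range(TURNS):
    context += TOKENS_PER_TURN
    total_input += context  # the full history is re-sent as input each turn

print(total_input)      # 1,275,000 -- actual total; growth is quadratic in turns
print(TURNS * context)  # 2,500,000 -- the flat 50K-per-turn upper bound from above
```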
Prompt caching has two TTL options: 5-minute (default, ~25% write premium) and 1-hour (2x write premium); pick by your pause pattern. Cache-aware design: stable system prompt + CLAUDE.md at the front (cached), volatile user content at the end (not cached). Reads are 90% off, so the 5-minute tier breaks even on its very first cache hit: the 25% write premium is far smaller than the 90% read saving. For high-frequency agents (CI bots, scheduled tasks), cache hit rate is the dominant cost driver. The 1-hour TTL is worth it for long-running tasks (>5 min between turns) where you'd otherwise re-cache on every turn. Output tokens dominate when responses are long (code generation, explanations); choose terse output formatting in those cases.
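The break-even math, worked out per token of cached prefix. The multipliers come from the premiums quoted above; treat them as assumptions and check current pricing:

```python
# Sketch of the cache break-even math, per token of cached prefix,
# normalized to the uncached input price. Multipliers assumed from the
# premiums above -- verify against current Anthropic pricing.
BASE = 1.00        # uncached input, normalized
WRITE_5MIN = 1.25  # 5-minute tier: 25% write premium
WRITE_1HR = 2.00   # 1-hour tier: 2x write premium
READ = 0.10        # cached reads: 90% off

def cached_cost(write_premium: float, uses: int) -> float:
    """One cache write followed by (uses - 1) cached reads."""
    return write_premium + READ * (uses - 1)

for uses in (1, 2, 5, 10):
    print(uses,
          f"uncached={BASE * uses:.2f}",
          f"5min={cached_cost(WRITE_5MIN, uses):.2f}",
          f"1hr={cached_cost(WRITE_1HR, uses):.2f}")
# uses=2: uncached 2.00 vs 5-min tier 1.35 -- a single cache hit already wins.
# The 1-hour tier (2.10 at uses=2) needs a third use before it pays off.
```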
Tokens are how LLMs are priced — input and output separately, output usually 4–5x more expensive. The non-obvious cost in agents is that every turn re-sends the full conversation, so a long loop with verbose tool output gets expensive quickly. Prompt caching is the biggest lever: a stable prefix (system prompt + CLAUDE.md) gets cached for 5 minutes at a 90% discount on reads. I structure agent workflows to keep cache hits high — stable prefix at the front, volatile user content at the end, no sleeps longer than 5 minutes mid-loop.
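Here is roughly what that structure looks like with the Anthropic Python SDK's prompt caching (`cache_control` on the system block is the documented mechanism; the model id, file names, and prompt contents are placeholders):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder contents; in a real agent these come from your config/repo.
system_prompt = "You are a coding agent."
claude_md = open("CLAUDE.md").read()  # project context, stable across turns
user_request = "Rename foo() to bar() across the repo."

# Stable prefix first and marked cacheable; volatile user content last,
# so it never invalidates the cached prefix.
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": system_prompt + "\n\n" + claude_md,
            "cache_control": {"type": "ephemeral"},  # cache everything up to here
        }
    ],
    messages=[{"role": "user", "content": user_request}],
)
```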
Ignoring prompt caching. Without it, a typical Claude Code session costs 3–5x more than it needs to. The cache is mostly automatic, but it misses if you change the system prompt mid-session (the cached prefix no longer matches) or idle past the 5-minute TTL.
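Continuing the sketch above, the API's usage counters tell you whether the cache is actually hitting; the field names are from the Anthropic Messages API response:

```python
def report_cache_usage(response) -> None:
    """Log cache counters from a Messages API response."""
    u = response.usage
    print(f"uncached input: {u.input_tokens}")
    print(f"cache writes:   {u.cache_creation_input_tokens}")
    print(f"cache reads:    {u.cache_read_input_tokens}")

# Healthy loop: after turn 1, cache_read_input_tokens should dominate.
# If cache_creation_input_tokens keeps climbing instead, the prefix is
# churning (edited system prompt) or the 5-minute TTL expired between turns.
```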