How to Reduce Claude Code Token Usage — Skills That Cut Costs (2026)
Claude Code burns through tokens fast. These skills and techniques cut token usage by up to 65% without sacrificing output quality. Save money, code more.
Claude Code charges per token. A verbose agent that loves to explain itself burns through your usage limits fast. "Certainly! I'd be happy to help you refactor that function. Let me walk you through the changes step by step..." — that preamble costs money and adds nothing.
Token optimization is becoming the most practical skill in the Claude Code ecosystem. Here's what actually works.
The Caveman approach
The most talked-about token optimization skill right now is Caveman. It makes Claude communicate in terse, direct language. No filler, no pleasantries, no step-by-step explanations you didn't ask for.
The before-and-after difference is dramatic:
Without Caveman: "I've successfully completed the refactoring of the authentication module. The changes include updating the token validation logic to handle edge cases more gracefully, adding appropriate error handling, and ensuring backwards compatibility with the existing API contracts."
With Caveman: "Auth module refactored. Token validation handles edge cases. Error handling added. Backwards compatible."
Same information. Roughly 65% fewer tokens. Over a full work session, this adds up to significant savings — both in cost and in context window usage.
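You can measure the gap yourself using word count as a rough stand-in for token count (real tokenizers split text more finely, but the ratio comes out similar). This sketch compares the two example responses above:

```shell
# Word count as a rough proxy for token count.
verbose="I've successfully completed the refactoring of the authentication module. The changes include updating the token validation logic to handle edge cases more gracefully, adding appropriate error handling, and ensuring backwards compatibility with the existing API contracts."
terse="Auth module refactored. Token validation handles edge cases. Error handling added. Backwards compatible."

# tr -d ' ' strips the padding some wc implementations add.
v=$(echo "$verbose" | wc -w | tr -d ' ')
t=$(echo "$terse" | wc -w | tr -d ' ')

echo "verbose: $v words, terse: $t words"
echo "reduction: $(( (v - t) * 100 / v ))%"   # prints: reduction: 63%
```

By word count the terse version is about 63% smaller, which lines up with the ballpark savings claimed for this style.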
The concept is simple: a SKILL.md that tells Claude to be concise. You can write your own in 10 lines:
---
name: concise-output
description: Reduces token usage by producing concise responses. Always active.
---
# Output Rules
- No filler phrases. No "certainly", "I'd be happy to", "let me explain"
- No step-by-step narration unless explicitly asked
- Code changes: show the diff, not the explanation
- One sentence summaries, not paragraphs
- Skip confirmations. Just do the work.
Drop this in ~/.claude/skills/concise-output/SKILL.md and your token usage drops immediately.
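Installation is a one-off copy-paste. This sketch creates the directory and writes the SKILL.md shown above (the path follows the convention mentioned in this article; adjust it if your Claude Code config lives elsewhere):

```shell
# Create the skill directory and drop the SKILL.md in place.
mkdir -p ~/.claude/skills/concise-output

# Quoted 'EOF' so the shell doesn't expand anything inside the heredoc.
cat > ~/.claude/skills/concise-output/SKILL.md <<'EOF'
---
name: concise-output
description: Reduces token usage by producing concise responses. Always active.
---
# Output Rules
- No filler phrases. No "certainly", "I'd be happy to", "let me explain"
- No step-by-step narration unless explicitly asked
- Code changes: show the diff, not the explanation
- One sentence summaries, not paragraphs
- Skip confirmations. Just do the work.
EOF
```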
Context window management
Tokens aren't just about cost — they're about context window limits. Claude Code has a finite context window, and every token of fluff pushes out useful context. When your context window fills up, Claude loses track of earlier parts of the conversation.
Skills that reduce output verbosity keep more room for actual code context.
The /clear command. When Claude Code tells you "X tokens used," check whether you're approaching the limit. Use /clear to reset the conversation, or let automatic compaction summarize it for you. Claude Code now shows a hint when it's time to clear.
Incremental requests. Instead of "refactor the entire auth module," say "refactor the login function in auth.ts." Smaller scope means less context needed, fewer tokens consumed, and more focused output.
The /recap command. New in April 2026. When you return to a session, /recap gives you a summary of where you left off without replaying the entire conversation. This saves tokens on session resumption.
Skills that save tokens indirectly
Some skills reduce token usage not through output formatting, but by getting things right the first time:
Code review skills. A skill that catches bugs before you commit means fewer "fix the bug I just introduced" conversations. Each bug-fix round trip consumes tokens.
Testing skills. Tests that work on the first generation don't need "the test fails, fix it" follow-ups. A testing skill that knows your framework prevents false starts.
Architecture skills. A skill that knows your project conventions prevents Claude from generating code in the wrong pattern, which you then have to ask it to redo.
Browse skills that improve first-pass accuracy at Agensi.
The effort frontmatter trick
Claude Code recently added effort frontmatter support for skills. You can set the model's effort level when a skill is invoked:
---
name: quick-review
description: Fast code review for small changes
effort: low
---
Lower effort means fewer tokens spent on reasoning. For routine tasks like formatting, linting, or simple reviews, effort: low can cut token usage substantially without noticeable quality loss.
Use effort: high only for complex tasks where deep reasoning matters — architecture decisions, security audits, complex refactoring.
Practical token budget strategies
Set a daily target. Track your usage for a week, then set a target 30% lower. Install a concise output skill and measure the difference.
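The arithmetic is trivial but worth making concrete. A back-of-the-envelope sketch, using a made-up weekly figure (substitute your own measured usage):

```shell
# Hypothetical example numbers: replace weekly_tokens with the
# total you actually measured over your tracking week.
weekly_tokens=3500000

daily_avg=$(( weekly_tokens / 7 ))         # average daily usage
daily_target=$(( daily_avg * 70 / 100 ))   # 30% lower target

echo "daily average: $daily_avg tokens"
echo "daily target:  $daily_target tokens"
```

With these example numbers, a 500,000-token daily average becomes a 350,000-token target. If a concise output skill alone doesn't get you there, the other techniques in this article close the rest of the gap.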
Batch related tasks. Instead of five separate conversations about five endpoints, handle them in one session where Claude already has the context loaded. Context reuse saves input tokens.
Use MCP for skill loading. With Agensi Pro, your agent loads skills on demand via MCP. This means skills aren't loaded into context at session start — they're only loaded when needed, saving input tokens on sessions where specific skills aren't relevant.
Be specific in prompts. "Fix the bug" forces Claude to search your codebase and explain what it found. "In src/routes/users.ts line 42, the null check is wrong" gets straight to the fix. Fewer exploration tokens, more action tokens.
Monitoring your usage
Claude Code now shows rate limit usage in the status line. Check your 5-hour and 7-day windows to understand your consumption patterns. If you consistently hit limits in the afternoon, your morning sessions might be too verbose.
The /doctor command also shows diagnostic information about your setup, including whether prompt caching is enabled. Prompt caching (via ENABLE_PROMPT_CACHING_1H) can dramatically reduce input token costs for long sessions by caching repeated context.
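If you want caching on for every session rather than toggling it manually, a minimal sketch is to export the variable named above from your shell profile:

```shell
# Enable 1-hour prompt caching for Claude Code sessions.
# Add this line to ~/.bashrc or ~/.zshrc so it applies everywhere.
export ENABLE_PROMPT_CACHING_1H=1
```

Then run /doctor in a fresh session to confirm the setting was picked up.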
Find skills that improve output quality and reduce rework at Agensi.
Find the right skill for your workflow
Browse our marketplace of AI agent skills, ready to install in seconds.
Browse Skills