How to Reduce Claude Code Token Usage — Skills That Cut Costs (2026)
Claude Code burns through tokens fast. These skills and techniques cut token usage by up to 65% without sacrificing output quality. Save money, code more.
Claude Code charges per token. A verbose agent that loves to explain itself burns through your usage limits fast. "Certainly! I'd be happy to help you refactor that function. Let me walk you through the changes step by step..." — that preamble costs money and adds nothing.
Token optimization is becoming the most practical skill in the Claude Code ecosystem. Here's what actually works.
The Caveman approach
The most talked-about token optimization skill right now is Caveman. It makes Claude communicate in terse, direct language. No filler, no pleasantries, no step-by-step explanations you didn't ask for.
The before-and-after difference is dramatic:
Without Caveman: "I've successfully completed the refactoring of the authentication module. The changes include updating the token validation logic to handle edge cases more gracefully, adding appropriate error handling, and ensuring backwards compatibility with the existing API contracts."
With Caveman: "Auth module refactored. Token validation handles edge cases. Error handling added. Backwards compatible."
Same information. Roughly 65% fewer tokens. Over a full work session, this adds up to significant savings — both in cost and in context window usage.
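You can measure the gap yourself using word count as a rough stand-in for token count (real tokenizers split text more finely, but the ratio comes out similar). This sketch compares the two example responses above:

```shell
# Word count as a rough proxy for token count.
verbose="I've successfully completed the refactoring of the authentication module. The changes include updating the token validation logic to handle edge cases more gracefully, adding appropriate error handling, and ensuring backwards compatibility with the existing API contracts."
terse="Auth module refactored. Token validation handles edge cases. Error handling added. Backwards compatible."

# tr -d ' ' strips the padding some wc implementations add.
v=$(echo "$verbose" | wc -w | tr -d ' ')
t=$(echo "$terse" | wc -w | tr -d ' ')

echo "verbose: $v words, terse: $t words"
echo "reduction: $(( (v - t) * 100 / v ))%"   # prints: reduction: 63%
```

By word count the terse version is about 63% smaller, which lines up with the ballpark savings claimed for this style.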
The concept is simple: a SKILL.md that tells Claude to be concise. You can write your own in 10 lines:
---
name: concise-output
description: Reduces token usage by producing concise responses. Always active.
---
# Output Rules
- No filler phrases. No "certainly", "I'd be happy to", "let me explain"
- No step-by-step narration unless explicitly asked
- Code changes: show the diff, not the explanation
- One sentence summaries, not paragraphs
- Skip confirmations. Just do the work.
Drop this in ~/.claude/skills/concise-output/SKILL.md and your token usage drops immediately.
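Installation is a one-off copy-paste. This sketch creates the directory and writes the SKILL.md shown above (the path follows the convention mentioned in this article; adjust it if your Claude Code config lives elsewhere):

```shell
# Create the skill directory and drop the SKILL.md in place.
mkdir -p ~/.claude/skills/concise-output

# Quoted 'EOF' so the shell doesn't expand anything inside the heredoc.
cat > ~/.claude/skills/concise-output/SKILL.md <<'EOF'
---
name: concise-output
description: Reduces token usage by producing concise responses. Always active.
---
# Output Rules
- No filler phrases. No "certainly", "I'd be happy to", "let me explain"
- No step-by-step narration unless explicitly asked
- Code changes: show the diff, not the explanation
- One sentence summaries, not paragraphs
- Skip confirmations. Just do the work.
EOF
```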
Context window management
Tokens aren't just about cost — they're about context window limits. Claude Code has a finite context window, and every token of fluff pushes out useful context. When your context window fills up, Claude loses track of earlier parts of the conversation.
Skills that reduce output verbosity keep more room for actual code context.
The /clear command. When Claude Code tells you "X tokens used," check whether you're approaching the limit. Use /clear to reset the conversation, or let automatic compaction summarize it for you. Claude Code now shows a hint when it's time to clear.
Incremental requests. Instead of "refactor the entire auth module," say "refactor the login function in auth.ts." Smaller scope means less context needed, fewer tokens consumed, and more focused output.
The /recap command. New in April 2026. When you return to a session, /recap gives you a summary of where you left off without replaying the entire conversation. This saves tokens on session resumption.
Skills that save tokens indirectly
Some skills reduce token usage not through output formatting, but by getting things right the first time:
Code review skills. A skill that catches bugs before you commit means fewer "fix the bug I just introduced" conversations. Each bug-fix round trip consumes tokens.
Testing skills. Tests that work on the first generation don't need "the test fails, fix it" follow-ups. A testing skill that knows your framework prevents false starts.
Architecture skills. A skill that knows your project conventions prevents Claude from generating code in the wrong pattern, which you then have to ask it to redo.
Browse skills that improve first-pass accuracy at Agensi.
The effort frontmatter trick
Claude Code recently added effort frontmatter support for skills. You can set the model's effort level when a skill is invoked:
---
name: quick-review
description: Fast code review for small changes
effort: low
---
Lower effort means fewer tokens spent on reasoning. For routine tasks like formatting, linting, or simple reviews, effort: low can cut token usage substantially without noticeable quality loss.
Use effort: high only for complex tasks where deep reasoning matters — architecture decisions, security audits, complex refactoring.
Practical token budget strategies
Set a daily target. Track your usage for a week, then set a target 30% lower. Install a concise output skill and measure the difference.
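The arithmetic is trivial but worth making concrete. A back-of-the-envelope sketch, using a made-up weekly figure (substitute your own measured usage):

```shell
# Hypothetical example numbers: replace weekly_tokens with the
# total you actually measured over your tracking week.
weekly_tokens=3500000

daily_avg=$(( weekly_tokens / 7 ))         # average daily usage
daily_target=$(( daily_avg * 70 / 100 ))   # 30% lower target

echo "daily average: $daily_avg tokens"
echo "daily target:  $daily_target tokens"
```

With these example numbers, a 500,000-token daily average becomes a 350,000-token target. If a concise output skill alone doesn't get you there, the other techniques in this article close the rest of the gap.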
Batch related tasks. Instead of five separate conversations about five endpoints, handle them in one session where Claude already has the context loaded. Context reuse saves input tokens.
Use MCP for skill loading. With Agensi Pro, your agent loads skills on demand via MCP. This means skills aren't loaded into context at session start — they're only loaded when needed, saving input tokens on sessions where specific skills aren't relevant.
Be specific in prompts. "Fix the bug" forces Claude to search your codebase and explain what it found. "In src/routes/users.ts line 42, the null check is wrong" gets straight to the fix. Fewer exploration tokens, more action tokens.
Monitoring your usage
Claude Code now shows rate limit usage in the status line. Check your 5-hour and 7-day windows to understand your consumption patterns. If you consistently hit limits in the afternoon, your morning sessions might be too verbose.
The /doctor command also shows diagnostic information about your setup, including whether prompt caching is enabled. Prompt caching (via ENABLE_PROMPT_CACHING_1H) can dramatically reduce input token costs for long sessions by caching repeated context.
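If you want caching on for every session rather than toggling it manually, a minimal sketch is to export the variable named above from your shell profile:

```shell
# Enable 1-hour prompt caching for Claude Code sessions.
# Add this line to ~/.bashrc or ~/.zshrc so it applies everywhere.
export ENABLE_PROMPT_CACHING_1H=1
```

Then run /doctor in a fresh session to confirm the setting was picked up.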
Find skills that improve output quality and reduce rework at Agensi.
Find the right skill for your workflow
Browse our marketplace of AI agent skills, ready to install in seconds.
Browse Skills