How I Cut My Claude Code Token Usage by 60% and Got Better Output

This article details strategies for reducing token usage with Claude Code while improving output quality. The key is to treat token cost as a tangible factor and optimize prompts and sessions accordingly.

Key Strategies

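Aggressively Reset Sessions: Don't treat a session as a persistent notebook. Start each new session with a short brief: the objective, the relevant files, the constraints, what has already failed, and the expected output. Every turn in a long session carries the accumulated history with it, so a fresh session with a tight brief keeps stale context, and its token cost, out of the window.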

objective: implement JWT refresh logic
relevant files: src/auth/session.ts, src/middleware/verify.ts
constraints: no new libraries, preserve existing error handling
what already failed: tried storing refresh tokens in memory
expected output: working refresh endpoint with test coverage
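
The brief above can also be generated from a typed object, which makes it harder to forget a section. This is a minimal TypeScript sketch; the `SessionBrief` interface and `renderBrief` function are hypothetical names for illustration, not part of Claude Code.

```typescript
// Hypothetical shape for a session-opening brief. Field names mirror the
// template above; this is not a Claude Code API.
interface SessionBrief {
  objective: string;
  relevantFiles: string[];
  constraints: string[];
  alreadyFailed: string[];
  expectedOutput: string;
}

// Render the brief as the plain-text block pasted into a fresh session.
// List sections with no entries are omitted entirely.
function renderBrief(b: SessionBrief): string {
  const lines: string[] = [`objective: ${b.objective}`];
  if (b.relevantFiles.length > 0) {
    lines.push(`relevant files: ${b.relevantFiles.join(", ")}`);
  }
  if (b.constraints.length > 0) {
    lines.push(`constraints: ${b.constraints.join(", ")}`);
  }
  if (b.alreadyFailed.length > 0) {
    lines.push(`what already failed: ${b.alreadyFailed.join("; ")}`);
  }
  lines.push(`expected output: ${b.expectedOutput}`);
  return lines.join("\n");
}

// Reproduces the five-line template above.
const brief = renderBrief({
  objective: "implement JWT refresh logic",
  relevantFiles: ["src/auth/session.ts", "src/middleware/verify.ts"],
  constraints: ["no new libraries", "preserve existing error handling"],
  alreadyFailed: ["tried storing refresh tokens in memory"],
  expectedOutput: "working refresh endpoint with test coverage",
});
```

Because empty sections are dropped, a first session with nothing yet in "what already failed" pays nothing for that line.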

Use Specific Prompts: Avoid vague, conversational prompts. Write prompts the way you would write instructions for industrial equipment: name the action (the verb), the scope, and the constraints, including how long the answer may be.

# Before
what do you think about the auth flow?

# After
Identify the security issue in the token validation logic. Return the vulnerable line and explain why it's exploitable. Under 150 words.
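
One way to make vague prompts impossible is to require the verb, scope, deliverable, and word cap as separate fields. A minimal sketch; `TaskPrompt` and `buildTaskPrompt` are hypothetical helpers, and the result simply reassembles the "After" prompt above.

```typescript
// Hypothetical prompt shape: action, scope, deliverable, and length cap are
// all required, so an open-ended "what do you think?" cannot be expressed.
interface TaskPrompt {
  verb: string;     // the action, e.g. "Identify"
  scope: string;    // what to act on
  returns: string;  // the deliverable
  maxWords: number; // answer length cap
}

function buildTaskPrompt(p: TaskPrompt): string {
  if (p.verb.trim() === "" || p.scope.trim() === "") {
    throw new Error("a task prompt needs both a verb and a scope");
  }
  // Reject NaN, fractions, zero, and negatives.
  if (Math.floor(p.maxWords) !== p.maxWords || p.maxWords <= 0) {
    throw new Error("maxWords must be a positive integer");
  }
  return `${p.verb} ${p.scope}. ${p.returns} Under ${p.maxWords} words.`;
}

const securityPrompt = buildTaskPrompt({
  verb: "Identify",
  scope: "the security issue in the token validation logic",
  returns: "Return the vulnerable line and explain why it's exploitable.",
  maxWords: 150,
});
```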

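Separate Modes: Treat planning, debugging, and implementation as distinct phases. Hand off between them with short structured documents, and seed each new session with only the parts it needs rather than the full prior conversation. A debugging session, for example, might start from: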

expected: webhook processes in under 500ms
actual: times out after 30s on payloads over 10KB
reproduction: any request with body > 10KB to /api/webhooks
recent changes: added payload validation in middleware, PR #47
logs: [paste relevant lines only]
suspected scope: likely the base64 encoding step in validatePayload()
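
The "paste relevant lines only" step can be made mechanical with a keyword filter over the logs. A minimal sketch, assuming plain-text log lines and a caller-chosen keyword list (for instance `validatePayload`); `BugBrief`, `relevantLogLines`, and `renderBugBrief` are hypothetical names.

```typescript
// Hypothetical debugging brief mirroring the template above.
interface BugBrief {
  expected: string;
  actual: string;
  reproduction: string;
  recentChanges: string;
  logs: string[]; // raw log lines, filtered before rendering
  suspectedScope: string;
}

// Keep only lines mentioning any keyword (case-insensitive), capped at maxLines.
function relevantLogLines(logs: string[], keywords: string[], maxLines: number): string[] {
  const needles = keywords.map((k) => k.toLowerCase());
  return logs
    .filter((line) => needles.some((n) => line.toLowerCase().indexOf(n) !== -1))
    .slice(0, maxLines);
}

// Render the brief with filtered, indented log lines.
function renderBugBrief(b: BugBrief, keywords: string[], maxLogLines = 20): string {
  const logs = relevantLogLines(b.logs, keywords, maxLogLines);
  return [
    `expected: ${b.expected}`,
    `actual: ${b.actual}`,
    `reproduction: ${b.reproduction}`,
    `recent changes: ${b.recentChanges}`,
    "logs:",
    ...logs.map((l) => `  ${l}`),
    `suspected scope: ${b.suspectedScope}`,
  ].join("\n");
}
```

The line cap matters as much as the filter: a keyword that appears on every request would otherwise pull the whole log back in.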

Use Negative Prompts: Add explicit exclusions to the prompt so the model does not widen the scope of its response.

do not redesign the architecture
do not explain basics
do not add dependencies
do not touch code outside the function I specified
do not rewrite working tests
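
When the same exclusions recur across prompts, a small helper can append them after normalizing and de-duplicating. A minimal sketch; `withExclusions` is a hypothetical helper, not a Claude Code feature.

```typescript
// Hypothetical helper: append a block of "do not" exclusions to a prompt.
// Items are trimmed, a leading "do not" is normalized away, and duplicates
// (case-insensitive) are dropped so reused lists stay short.
function withExclusions(prompt: string, exclusions: string[]): string {
  const seen: string[] = [];
  const lines: string[] = [];
  for (const raw of exclusions) {
    const item = raw.trim().replace(/^do not\s+/i, "");
    const key = item.toLowerCase();
    if (item === "" || seen.indexOf(key) !== -1) continue;
    seen.push(key);
    lines.push(`do not ${item}`);
  }
  return lines.length === 0 ? prompt : `${prompt}\n\n${lines.join("\n")}`;
}

// The duplicate "add dependencies" entry collapses into one line.
const fixPrompt = withExclusions("Fix the timeout in validatePayload().", [
  "redesign the architecture",
  "do not add dependencies",
  "add dependencies",
]);
```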

Isolate Relevant Code: Don't paste entire files into the context. Include only the function, interface, or error message the task actually depends on.

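
Extracting a single function can be scripted instead of done by hand. The sketch below is deliberately naive: it assumes a `function name(...) {` declaration, treats the first `{` after the name as the start of the body, and ignores braces inside strings, comments, and template literals. `extractFunction` is a hypothetical helper; a real parser such as the TypeScript compiler API would be more robust.

```typescript
// Naive sketch: return the source text of one `function name(...) { ... }`
// declaration by counting braces from the first "{" after the name.
// Assumptions: no object-type annotations in the signature, and no braces
// inside strings, comments, or template literals within the body.
function extractFunction(source: string, name: string): string | null {
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const decl = new RegExp(`(?:export\\s+)?(?:async\\s+)?function\\s+${escaped}\\s*\\(`);
  const match = decl.exec(source);
  if (match === null) return null;
  const open = source.indexOf("{", match.index);
  if (open === -1) return null;
  let depth = 0;
  for (let i = open; i < source.length; i++) {
    const ch = source.charAt(i);
    if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) return source.slice(match.index, i + 1);
    }
  }
  return null; // braces never balanced
}
```

For example, `extractFunction(source, "validatePayload")` would return just that declaration, assuming it is written with the `function` keyword rather than as an arrow function.
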
Set Answer Budgets: Constrain the model’s output by capping the number of bullets, tokens, or paragraphs, or by requiring a specific format (e.g., patch only, no explanation).
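
A budget works best when it is stated in the prompt and then checked against the reply. A minimal sketch; `AnswerBudget`, `withBudget`, and `overBudget` are hypothetical names, and the checks are rough (words are whitespace-delimited tokens, bullets are lines starting with `-` or `*`).

```typescript
// Hypothetical budget shape; every field is optional.
interface AnswerBudget {
  maxWords?: number;
  maxBullets?: number;
  format?: string; // e.g. "patch only, no explanation"
}

// State the budget in the prompt itself.
function withBudget(prompt: string, budget: AnswerBudget): string {
  const rules: string[] = [];
  if (budget.format !== undefined) rules.push(`Format: ${budget.format}.`);
  if (budget.maxBullets !== undefined) rules.push(`At most ${budget.maxBullets} bullets.`);
  if (budget.maxWords !== undefined) rules.push(`Under ${budget.maxWords} words.`);
  return rules.length === 0 ? prompt : `${prompt}\n${rules.join(" ")}`;
}

// Rough check of a reply against the same budget; returns the violations found.
function overBudget(reply: string, budget: AnswerBudget): string[] {
  const problems: string[] = [];
  const words = reply.split(/\s+/).filter((w) => w.length > 0).length;
  if (budget.maxWords !== undefined && words > budget.maxWords) {
    problems.push(`${words} words, budget ${budget.maxWords}`);
  }
  const bullets = reply.split("\n").filter((l) => /^\s*[-*]\s/.test(l)).length;
  if (budget.maxBullets !== undefined && bullets > budget.maxBullets) {
    problems.push(`${bullets} bullets, budget ${budget.maxBullets}`);
  }
  return problems;
}

const summaryPrompt = withBudget("Summarize the changes in PR #47.", { maxBullets: 3, maxWords: 80 });
```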

Use Reusable Prompt Fragments: Systematize prompts by storing frequently used instructions as snippets. Reusing the same wording reduces output variance and keeps instructions consistent from one session to the next.

preserve all existing comments
minimal diff only — do not touch working code
TypeScript strict mode, no any
explain your reasoning before writing code
no new dependencies
mobile-first for any UI changes
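
A fragment library can be as simple as a constant map plus a composer. A minimal TypeScript sketch using some of the fragments above; `FRAGMENTS` and `compose` are hypothetical names. Typing the names as `keyof typeof FRAGMENTS` means a misspelled fragment fails at compile time rather than silently dropping an instruction.

```typescript
// Hypothetical fragment library: each instruction is written once and
// referenced by name. Strings are copied from the list above.
const FRAGMENTS = {
  preserveComments: "preserve all existing comments",
  strictTs: "TypeScript strict mode, no any",
  reasoningFirst: "explain your reasoning before writing code",
  noDeps: "no new dependencies",
  mobileFirst: "mobile-first for any UI changes",
} as const;

type FragmentName = keyof typeof FRAGMENTS;

// Task first, then each requested fragment once, in the order given.
function compose(task: string, names: FragmentName[]): string {
  const lines: string[] = [task];
  const used: FragmentName[] = [];
  for (const n of names) {
    if (used.indexOf(n) !== -1) continue;
    used.push(n);
    lines.push(FRAGMENTS[n]);
  }
  return lines.join("\n");
}

const refactorPrompt = compose("Implement JWT refresh logic in src/auth/session.ts.", [
  "strictTs",
  "noDeps",
  "preserveComments",
]);
```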

Know When Not to Use Claude: Sometimes fixing the problem yourself is faster and cheaper than describing it to the model.

The Importance of Attention

The core principle is to treat token cost as a visible factor. This reveals waste and encourages clarity in prompts and sessions. Good engineering is about compression: extracting signal, removing noise, and making decisions efficiently.
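
Making the cost visible can start with a rough meter that estimates tokens per prompt section. A minimal sketch, assuming about 4 characters per token, a common rule of thumb for English text; Claude's actual tokenizer will give different counts, so treat the numbers as relative rather than exact. `estimateTokens` and `tokenReport` are hypothetical helpers.

```typescript
// Assumption: ~4 characters per token, a common rule of thumb for English.
// Real tokenizer counts differ, so use these numbers for comparison only.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Estimate each labeled prompt section and sort largest first.
function tokenReport(parts: { [label: string]: string }): Array<[string, number]> {
  const rows: Array<[string, number]> = [];
  for (const label of Object.keys(parts)) {
    const text = parts[label];
    if (text !== undefined) rows.push([label, estimateTokens(text)]);
  }
  return rows.sort((a, b) => b[1] - a[1]);
}
```

Sorting largest first puts the section most worth trimming at the top of the report, which is usually where the waste the article describes becomes obvious.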