Skip to content

How I Cut My Claude Code Token Usage by 60% and Got Better Output

Published: at 08:53 AM
(Aeon Flex)

This article details strategies for reducing token usage with Claude Code while improving output quality. The key is to treat token cost as a tangible factor and optimize prompts and sessions accordingly.

Key Strategies

  • Aggressively Reset Sessions: Avoid treating sessions as persistent notebooks. Start each session with a clear objective, relevant files, constraints, and expected output. This prevents the model from carrying unnecessary context.

    objective: implement JWT refresh logic
    relevant files: src/auth/session.ts, src/middleware/verify.ts
    constraints: no new libraries, preserve existing error handling
    what already failed: tried storing refresh tokens in memory
    expected output: working refresh endpoint with test coverage
  • Use Specific Prompts: Avoid vague, conversational prompts. Treat prompts as instructions for industrial equipment. Specify the verb, scope, and constraints.

    # Before
    what do you think about the auth flow?
    
    # After
    Identify the security issue in the token validation logic. Return the vulnerable line and explain why it's exploitable. Under 150 words.
  • Separate Modes: Treat planning, debugging, and implementation as distinct phases. Use structured documents to seed implementation sessions selectively.

    expected: webhook processes in under 500ms
    actual: times out after 30s on payloads over 10KB
    reproduction: any request with body > 10KB to /api/webhooks
    recent changes: added payload validation in middleware, PR #47
    logs: [paste relevant lines only]
    suspected scope: likely the base64 encoding step in validatePayload()
  • Use Negative Prompts: Add explicit exclusion instructions to prompts to limit the scope of the model’s response.

    do not redesign the architecture
    do not explain basics
    do not add dependencies
    do not touch code outside the function I specified
    do not rewrite working tests
  • Isolate Relevant Code: Avoid pasting entire files into the context. Isolate the relevant function, interface, or error message.

  • Set Answer Budgets: Constrain the model’s output by specifying the number of bullets, tokens, paragraphs, or format (e.g., patch only, no explanation).

  • Use Reusable Prompt Fragments: Systematize prompts by storing frequently used instructions in snippets. This reduces output variance and ensures consistency.

    preserve all existing comments
    minimal diff only — do not touch working code
    TypeScript strict mode, no any
    explain your reasoning before writing code
    no new dependencies
    mobile-first for any UI changes
  • Know When Not to Use Claude: Sometimes, directly fixing the problem is faster and more efficient than using AI.

The Importance of Attention

The core principle is to treat token cost as a visible factor. This reveals waste and encourages clarity in prompts and sessions. Good engineering is about compression: extracting signal, removing noise, and making decisions efficiently.

Read the original article