This article details strategies for reducing token usage with Claude Code while improving output quality. The key is to treat token cost as a tangible factor and optimize prompts and sessions accordingly.
Key Strategies
-
Aggressively Reset Sessions: Avoid treating sessions as persistent notebooks. Start each session with a clear objective, relevant files, constraints, and expected output. This prevents the model from carrying unnecessary context.
objective: implement JWT refresh logic
relevant files: src/auth/session.ts, src/middleware/verify.ts
constraints: no new libraries, preserve existing error handling
what already failed: tried storing refresh tokens in memory
expected output: working refresh endpoint with test coverage
-
Use Specific Prompts: Avoid vague, conversational prompts. Treat prompts as instructions for industrial equipment. Specify the verb, scope, and constraints.
# Before
what do you think about the auth flow?
# After
Identify the security issue in the token validation logic. Return the vulnerable line and explain why it's exploitable. Under 150 words.
-
Separate Modes: Treat planning, debugging, and implementation as distinct phases, each with its own session. Carry only a short structured document (for example, a debugging brief) from one phase into the next, rather than the full conversation.
expected: webhook processes in under 500ms
actual: times out after 30s on payloads over 10KB
reproduction: any request with body > 10KB to /api/webhooks
recent changes: added payload validation in middleware, PR #47
logs: [paste relevant lines only]
suspected scope: likely the base64 encoding step in validatePayload()
-
Use Negative Prompts: Add explicit exclusion instructions to prompts to limit the scope of the model’s response.
do not redesign the architecture
do not explain basics
do not add dependencies
do not touch code outside the function I specified
do not rewrite working tests
-
Isolate Relevant Code: Avoid pasting entire files into the context. Isolate the relevant function, interface, or error message.
-
Set Answer Budgets: Constrain the model's output by specifying a maximum number of bullets, tokens, or paragraphs, or by fixing the format (e.g., patch only, no explanation).
-
Use Reusable Prompt Fragments: Systematize prompts by storing frequently used instructions in snippets. This reduces output variance and ensures consistency.
preserve all existing comments
minimal diff only; do not touch working code
TypeScript strict mode, no any
explain your reasoning before writing code
no new dependencies
mobile-first for any UI changes
-
Know When Not to Use Claude: Sometimes, directly fixing the problem is faster and more efficient than using AI.
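Several of the strategies above (a structured session brief, explicit exclusions, an answer budget) can be kept together as one reusable template. The following is a minimal Python sketch of that idea, not a feature of Claude Code: `SessionBrief`, its field names, and the `answer budget:` label are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SessionBrief:
    """Hypothetical container for the fields a fresh session starts with."""

    objective: str
    relevant_files: list[str]
    constraints: list[str] = field(default_factory=list)
    already_failed: list[str] = field(default_factory=list)
    expected_output: str = ""
    exclusions: list[str] = field(default_factory=list)  # negative prompts
    answer_budget: str = ""  # e.g. "patch only, no explanation"

    def render(self) -> str:
        """Render the brief as plain text, skipping fields left empty."""
        lines = [
            f"objective: {self.objective}",
            f"relevant files: {', '.join(self.relevant_files)}",
        ]
        if self.constraints:
            lines.append(f"constraints: {', '.join(self.constraints)}")
        if self.already_failed:
            lines.append(f"what already failed: {'; '.join(self.already_failed)}")
        if self.expected_output:
            lines.append(f"expected output: {self.expected_output}")
        # Each exclusion becomes its own "do not ..." line.
        lines.extend(f"do not {item}" for item in self.exclusions)
        if self.answer_budget:
            lines.append(f"answer budget: {self.answer_budget}")
        return "\n".join(lines)


brief = SessionBrief(
    objective="implement JWT refresh logic",
    relevant_files=["src/auth/session.ts", "src/middleware/verify.ts"],
    constraints=["no new libraries", "preserve existing error handling"],
    already_failed=["tried storing refresh tokens in memory"],
    expected_output="working refresh endpoint with test coverage",
    exclusions=["redesign the architecture", "add dependencies"],
    answer_budget="patch only, no explanation",
)
print(brief.render())
```

Keeping the brief as data rather than free text makes it easy to start a new session with the same constraints and to see which fields were left empty.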
The Importance of Attention
The core principle is to treat token cost as a visible factor. This reveals waste and encourages clarity in prompts and sessions. Good engineering is about compression: extracting signal, removing noise, and making decisions efficiently.
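One way to make token cost visible before sending a prompt is to estimate it. The sketch below is a rough illustration only: `estimate_tokens` and `report_savings` are hypothetical helpers, and the ratio of 4 characters per token is a crude assumption for English text, not the behavior of any real tokenizer.

```python
import math

# Assumption: a crude characters-per-token ratio; real tokenizers vary.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based on character count alone."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def report_savings(full_context: str, isolated_context: str) -> str:
    """Compare pasting a whole file against pasting only the relevant part."""
    full = estimate_tokens(full_context)
    isolated = estimate_tokens(isolated_context)
    saved = full - isolated
    return f"full: ~{full} tokens, isolated: ~{isolated} tokens, saved: ~{saved}"


whole_file = "x" * 4000  # stand-in for an entire source file
snippet = "x" * 400      # stand-in for the one relevant function
print(report_savings(whole_file, snippet))
# full: ~1000 tokens, isolated: ~100 tokens, saved: ~900
```

Even an imprecise estimate like this shows the relative cost of pasting a whole file versus an isolated function.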