Agent Token Profiler
Your agent re-sends the system prompt and every tool schema on every turn — a cost that's invisible until the bill arrives. Paste your setup below and see the per-turn breakdown, a projection across 100 turns, and where routing the easy turns to a cheaper model would save you money.
A typical Claude agent re-sends roughly 600–1,600 tokens of system prompt + tool schemas on every single turn — before the user even speaks — so cost scales with every message, not just with how much the user types.
New study → we measured what 13 real AI agents cost per turn
Token counts are estimates, computed locally with a GPT BPE tokenizer; Anthropic's tokenizer differs by a few percent. Prices are approximate; see the note by the table. The projection multiplies each turn's fixed overhead (system prompt + tool schemas + tool I/O) by N turns. It does not model a growing transcript, so a long, accumulating conversation will cost more. The point here is to surface that per-turn overhead and find what's bloating it.
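To make the gap between the fixed-overhead projection and a growing transcript concrete, here is a rough worked comparison. The symbols are assumptions introduced for illustration: $F$ is the fixed per-turn overhead in tokens, $N$ is the number of turns, and $m$ is a hypothetical number of tokens each turn adds to a transcript that is re-sent in full on later turns.

$$
T_{\text{fixed}} = N \cdot F,
\qquad
T_{\text{growing}} = \sum_{k=1}^{N} \bigl(F + (k-1)\,m\bigr) = N \cdot F + m \cdot \frac{N(N-1)}{2}
$$

With illustrative values $F = 1200$, $N = 100$, and $m = 200$, the fixed projection gives $T_{\text{fixed}} = 120{,}000$ tokens, while the growing-transcript total is $120{,}000 + 200 \cdot 4950 = 1{,}110{,}000$ tokens. The first term is what this page projects; the second is what it deliberately leaves out.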
Your agent setup
Per-turn token breakdown
Projected over 100 turns
| Model | Total tokens | Cost |
|---|---|---|
Prices are approximate public list prices as of mid-2026, blended across input and output, per 1M tokens. Verify with the provider, then edit the values in lib/token-profiler/pricing.ts to match your real rates.
Bloat flags
The math, in the open
FAQ
How do I build a Claude agent from scratch?
A Claude agent is just a loop: send the conversation to the model, run any tool it calls, append the result, and repeat until the model replies without calling a tool. That's about 150 lines on the official Anthropic SDK, with no framework required. AgentLoop is a free, MIT-licensed starter that does exactly this, readable top to bottom.
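The loop described above can be sketched in a few lines. This is a minimal illustration, not AgentLoop's code and not the Anthropic SDK's API: the `Model`, `Tools`, `Message`, and `ModelReply` types are hypothetical stand-ins, and the model is an injected synchronous function so the sketch runs without a network call. A real model call would be asynchronous.

```typescript
// Hypothetical shapes for illustration only; the real SDK types differ.
type ToolCall = { name: string; input: string };
type ModelReply = { text: string; toolCall?: ToolCall };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type Model = (messages: Message[]) => ModelReply;
type Tools = Record<string, (input: string) => string>;

function runAgent(
  model: Model,
  tools: Tools,
  userInput: string,
  maxTurns = 10,
): string {
  const messages: Message[] = [{ role: "user", content: userInput }];

  for (let turn = 0; turn < maxTurns; turn++) {
    // 1. Send the whole conversation to the model.
    const reply = model(messages);
    messages.push({ role: "assistant", content: reply.text });

    // 2. No tool requested: this is the final answer.
    if (!reply.toolCall) {
      return reply.text;
    }

    // 3. Run the requested tool and append its result, then loop again.
    const tool = tools[reply.toolCall.name];
    const result = tool
      ? tool(reply.toolCall.input)
      : `Unknown tool: ${reply.toolCall.name}`;
    messages.push({ role: "tool", content: result });
  }

  // Guard against a model that never stops calling tools.
  throw new Error(`No final reply after ${maxTurns} turns`);
}
```

The `maxTurns` cap is a design choice added here so that a misbehaving model cannot loop, and bill, forever.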
Why is my AI agent so expensive?
Most agent cost is invisible. You re-send the system prompt and every tool schema on every single turn, so one verbose tool definition is billed again on turns 1, 2, 3, and so on. A typical support agent quietly carries around 650 tokens of tool schemas per turn before the user even speaks.
How do I estimate Claude agent token cost?
Count what you re-send each turn — system prompt, all tool schemas, prior messages, and tool outputs — then multiply by your number of turns and the model's per-token price. The free Agent Token Profiler does this in your browser: paste your setup and see the per-turn breakdown and projected cost.
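As a rough sketch of that arithmetic, the function below sums the fixed per-turn tokens, multiplies by the number of turns, and applies a per-million-token price. The field names, the function name `projectCost`, and the example numbers (other than the 650 schema tokens quoted above) are assumptions for illustration, and the $3.00 price is a placeholder, not a real rate. Like the profiler's projection, it ignores prior messages accumulating in the transcript.

```typescript
// Fixed tokens re-sent on every turn. Field names are illustrative.
interface TurnOverhead {
  systemTokens: number;
  schemaTokens: number;
  toolIoTokens: number;
}

interface Projection {
  tokensPerTurn: number;
  totalTokens: number;
  costUsd: number;
}

function projectCost(
  overhead: TurnOverhead,
  turns: number,
  pricePerMillionUsd: number, // blended input/output price; placeholder value below
): Projection {
  const tokensPerTurn =
    overhead.systemTokens + overhead.schemaTokens + overhead.toolIoTokens;
  const totalTokens = tokensPerTurn * turns;
  const costUsd = (totalTokens * pricePerMillionUsd) / 1e6;
  return { tokensPerTurn, totalTokens, costUsd };
}

// Hypothetical setup: 400 system + 650 schema + 150 tool I/O tokens per turn.
const example = projectCost(
  { systemTokens: 400, schemaTokens: 650, toolIoTokens: 150 },
  100,
  3.0,
);
console.log(example);
```

With these placeholder inputs the result is 1,200 tokens per turn, 120,000 tokens over 100 turns, and $0.36 at the assumed price.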
How do I reduce my AI agent's token cost?
Trim verbose tool schemas (the biggest hidden cost, since they are re-sent every turn), summarize chatty tool outputs before feeding them back, cap conversation history, and route the easy turns to a cheaper model like Claude Haiku. Measure first — the Token Profiler flags which tool is inflating your context.
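To get a feel for what routing might save, the sketch below compares sending every turn to one model against sending the turns marked easy to a cheaper one. Everything here is an assumption for illustration: the `Turn` shape, the `easy` flag (how you classify a turn is up to you), and both prices, which are placeholders rather than real rates.

```typescript
// One turn's token count, plus a caller-supplied "easy" label (hypothetical).
type Turn = { tokens: number; easy: boolean };

interface RoutingEstimate {
  baselineUsd: number; // every turn on the stronger model
  routedUsd: number;   // easy turns on the cheaper model
  savedUsd: number;
}

function estimateRoutingSavings(
  turns: Turn[],
  strongPricePerMillionUsd: number,
  cheapPricePerMillionUsd: number,
): RoutingEstimate {
  let baselineUsd = 0;
  let routedUsd = 0;

  for (const turn of turns) {
    const strongCost = (turn.tokens * strongPricePerMillionUsd) / 1e6;
    const cheapCost = (turn.tokens * cheapPricePerMillionUsd) / 1e6;
    baselineUsd += strongCost;
    routedUsd += turn.easy ? cheapCost : strongCost;
  }

  return { baselineUsd, routedUsd, savedUsd: baselineUsd - routedUsd };
}

// Hypothetical workload: 100 turns of 1,200 tokens, 60 of them easy.
const workload: Turn[] = Array.from({ length: 100 }, (_, i) => ({
  tokens: 1200,
  easy: i < 60,
}));
console.log(estimateRoutingSavings(workload, 3.0, 1.0));
```

With these placeholder prices, the baseline comes to about $0.36 and the routed total to about $0.216. The saving depends entirely on how many turns really are easy and on the actual price gap, so measure before relying on it.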
Token metering is 1 of 8 production patterns in AgentLoop. The free MIT core is the readable agent loop you can build on; AgentLoop Pro wires up all eight — parallel tools, persistent memory, retries, rate limiting, approval gates, evals, token metering, and a multi-provider seam (so the routing above is one config change, not a rewrite).