The Hidden Token Tax: What Real AI Agents Actually Cost Per Turn
GitHub's MCP server re-sends 3,546 tokens of tool schemas on every single turn — that's $12.89 per 1,000 turns on Claude Sonnet, billed before the model reads a word of the user's question. We measured 13 real open-source AI agents (79 tools total): the median agent's invisible per-turn overhead is 9.6× larger than the user's actual request.
The thesis
An AI agent re-sends its entire system prompt plus every tool/function schema on every single turn. That fixed payload is billed as input tokens on each request — invisibly — until the bill arrives. We measured it across 13 real open-source projects (79 tools total).
Top findings
Six things the data makes hard to argue with.
- $12.89 per 1,000 turns: the GitHub MCP server on Claude Sonnet. Of that, $10.64 is pure schema input; the 150-token reply is the cheap part. 26 tools is all it takes to make the plumbing cost more than the work.
- 9.6×: the median agent's overhead relative to a realistic 57-token user request. Across all 13: min 101 / median 547 / max 3,546 tokens.
- 827 tokens: the fattest single tool. It's sequentialthinking from the official MCP server: a 565-token natural-language description wrapped around a tiny 9-field schema. A single verbose tool can outweigh a whole toolbox.
- 35.1×: the spread between the heaviest and lightest agent. GitHub's 3,546-token overhead is 35.1× the AWS-KB server's 101, or $12.89 vs $2.55 per 1k turns. Same pricing, same tokenizer. Bloat is a design choice, not a tax of nature.
- 123 tokens: the average tool (median 103). Ten tools ≈ a fixed ~1,000-token tax per turn before anything happens.
- 22%: the cost of pretty-printing. The identical fetch tool is 236 tokens compact vs 288 pretty-printed, a 22% swing from whitespace alone. Pydantic auto-adds a title to every field; zod-to-json-schema appends $schema and additionalProperties. These are invisible tokens nobody wrote, billed every turn.
Measure your own agent
Every number here is an estimate from the same method as the free Agent Token Profiler — measure YOUR agent's per-turn cost in your browser (no signup, no key):
Open the free Agent Token Profiler →
The dataset
13 real open-source agents, 79 tools. Overhead / turn is the fixed payload re-sent on every request; $/1k turns projects that across 1,000 turns plus a modeled 150-token reply. See the per-server MCP token cost reference for each server's exact numbers.
| Agent | Type | System tok | Schema tok | #Tools | Overhead / turn | Fattest single tool (tok) | $/1k turns Sonnet | $/1k turns Haiku |
|---|---|---|---|---|---|---|---|---|
| GitHub MCP server | MCP server (tool provider) | 0 | 3,546 | 26 | 3,546 | create_pull_request_review (360) | $12.89 | $4.30 |
| bolt.new | prompt-embedded coding agent | 3,118 | 0 | 0 (in-prose) | 3,118 | — (tools described in prompt) | $11.60 | $3.87 |
| GitLab MCP server | MCP server (tool provider) | 0 | 1,194 | 9 | 1,194 | push_files (175) | $5.83 | $1.94 |
| Git MCP server | MCP server (tool provider) | 0 | 1,117 | 12 | 1,117 | git_log (261) | $5.60 | $1.87 |
| Sequential Thinking MCP server | MCP server | 0 | 827 | 1 | 827 | sequentialthinking (827) | $4.73 | $1.58 |
| Slack MCP server | MCP server (tool provider) | 0 | 679 | 8 | 679 | slack_reply_to_thread (124) | $4.29 | $1.43 |
| Google Maps MCP server | MCP server (tool provider) | 0 | 547 | 7 | 547 | maps_distance_matrix (124) | $3.89 | $1.30 |
| Puppeteer MCP server | MCP server (tool provider) | 0 | 538 | 7 | 538 | puppeteer_screenshot (142) | $3.86 | $1.29 |
| Anthropic quickstarts agent | reference agent toolkit | 0 | 365 | 3 | 365 | file_write (152) | $3.35 | $1.12 |
| Brave Search MCP server | MCP server (tool provider) | 0 | 317 | 2 | 317 | brave_web_search (161) | $3.20 | $1.07 |
| Time MCP server | MCP server (tool provider) | 0 | 237 | 2 | 237 | convert_time (159) | $2.96 | $0.99 |
| Fetch MCP server | MCP server (tool provider) | 0 | 236 | 1 | 236 | fetch (236) | $2.96 | $0.99 |
| AWS KB Retrieval MCP server | MCP server (tool provider) | 0 | 101 | 1 | 101 | retrieve_from_aws_kb (101) | $2.55 | $0.85 |
Methodology
Every number reproducible.
- Tokenizer: o200k_base (GPT-4o BPE) via gpt-tokenizer. An ESTIMATE for Claude (Anthropic's differs a few %), applied identically to every agent so cross-agent comparisons are fair. Cross-validated against Python tiktoken (o200k_base): identical counts to the token.
- A "tool" = the object the model sees ({name, description, input_schema}) serialized to compact JSON, as the SDKs send the tools array.
- Schema fidelity: Pydantic models re-run through model_json_schema(); zod schemas through zod-to-json-schema; literal-JSON schemas extracted verbatim.
- Cost model: per-turn input = fixed overhead; per-turn output = a modeled 150-token reply. Sonnet $3/M in, $15/M out; Haiku $1/M in, $5/M out. Projected over 1,000 turns.
- Pinned sources: modelcontextprotocol/servers @ f5054df1fb86; servers-archived @ 9be4674d1ddf; anthropics/anthropic-quickstarts @ f37f1685e256; stackblitz/bolt.new @ eda10b121221.
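The cost model above reduces to one line of arithmetic. A minimal sketch in Python, using the prices and the 150-token reply stated in the methodology; the function and constant names are illustrative assumptions, not taken from the profiler's source.

```python
# Prices in USD per million tokens, as stated in the cost model above.
PRICES = {
    "sonnet": {"in": 3.00, "out": 15.00},
    "haiku": {"in": 1.00, "out": 5.00},
}
REPLY_TOKENS = 150  # modeled reply length per turn


def cost_per_1k_turns(overhead_tokens: int, model: str, turns: int = 1000) -> float:
    """Projected USD cost of `turns` turns: fixed input overhead plus a modeled reply."""
    price = PRICES[model]
    per_turn = (overhead_tokens * price["in"] + REPLY_TOKENS * price["out"]) / 1_000_000
    return round(per_turn * turns, 2)


print(cost_per_1k_turns(3546, "sonnet"))  # GitHub MCP server -> 12.89
print(cost_per_1k_turns(101, "sonnet"))   # AWS KB Retrieval MCP server -> 2.55
```

Note that the reply contributes a flat $2.25 per 1,000 turns on Sonnet (150 × $15/M × 1,000), which is why even the 101-token server does not drop below $2.55.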
Caveats
Stated honestly — the credibility lives here.
- Tokenizer is a Claude estimate (o200k_base is GPT's); absolute dollar figures carry a few-percent error bar. Applied uniformly, so relative comparisons and structural conclusions are robust.
- Provider-side tool-block scaffolding not modeled; real bills will be marginally higher.
- zod/Pydantic schemas are reconstructed via the same converters the servers use, not byte-captured; a few tokens may differ. Literal-JSON schemas are verbatim/exact.
- Schema-derived tools only — agents whose schemas are generated at runtime from code we couldn't reproduce exactly (e.g. Cline, gpt-researcher, OpenAI Swarm, Codex) were excluded rather than guessed. Cline in particular likely exceeds bolt.new but requires running its build to measure.
Cited repositories
Every agent measured, linked to the exact source.
- GitHub MCP github.com/modelcontextprotocol/servers-archived/tree/main/src/github
- GitLab MCP github.com/modelcontextprotocol/servers-archived/tree/main/src/gitlab
- Git MCP github.com/modelcontextprotocol/servers/tree/main/src/git
- Sequential Thinking MCP github.com/modelcontextprotocol/servers/tree/main/src/sequentialthinking
- Slack MCP github.com/modelcontextprotocol/servers-archived/tree/main/src/slack
- Google Maps MCP github.com/modelcontextprotocol/servers-archived/tree/main/src/google-maps
- Puppeteer MCP github.com/modelcontextprotocol/servers-archived/tree/main/src/puppeteer
- Brave Search MCP github.com/modelcontextprotocol/servers-archived/tree/main/src/brave-search
- AWS KB Retrieval MCP github.com/modelcontextprotocol/servers-archived/tree/main/src/aws-kb-retrieval-server
- Fetch MCP github.com/modelcontextprotocol/servers/tree/main/src/fetch
- Time MCP github.com/modelcontextprotocol/servers/tree/main/src/time
- Anthropic quickstarts agent github.com/anthropics/anthropic-quickstarts/tree/main/agents
- bolt.new github.com/stackblitz/bolt.new
These are estimates. Token counts are computed with a GPT BPE tokenizer (o200k_base); Anthropic's tokenizer differs by a few percent, so absolute dollar figures carry a few-percent error bar. Applied uniformly across every agent, so the relative comparisons and structural conclusions are robust — but verify against your provider's real pricing and tokenizer before budgeting on any single figure.
Frequently asked: agent & MCP token costs
How many tokens does the GitHub MCP server use per turn?
The GitHub MCP server's 26 tool schemas total 3,546 tokens, re-sent on every turn — about $12.89 per 1,000 turns on Claude Sonnet ($4.30 on Haiku), billed before the model reads the user's question. It was the most expensive of the 13 agents measured.
What does the average agent/MCP tool cost in tokens?
Across the 79 tools exposed by the 12 schema-based agents measured (bolt.new describes its tools in prose), the average tool is 123 tokens (median 103), so every tool you add costs roughly 100–125 tokens of permanent, per-turn overhead. Ten tools ≈ a fixed ~1,000-token tax per turn before anything happens.
How large is the typical agent's hidden per-turn overhead?
The median agent re-sends 547 tokens of system prompt + tool schemas every turn (min 101, max 3,546) — about 9.6× a realistic 57-token user request, i.e. ~91% of the input before the user has spoken.
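The summary figures in the two answers above can be re-derived from the dataset table. A short sketch, assuming only the table's per-agent numbers and the 57-token user request quoted in this article; the variable names are illustrative.

```python
from statistics import median

# Overhead per turn (tokens) for the 13 agents, copied from the dataset table.
OVERHEAD = [3546, 3118, 1194, 1117, 827, 679, 547, 538, 365, 317, 237, 236, 101]
# Schema tokens and tool counts for the 12 agents that expose tool schemas.
SCHEMA_TOKENS = [3546, 1194, 1117, 827, 679, 547, 538, 365, 317, 237, 236, 101]
TOOL_COUNTS = [26, 9, 12, 1, 8, 7, 7, 3, 2, 2, 1, 1]
USER_REQUEST_TOKENS = 57  # the "realistic user request" used in this article

med = median(OVERHEAD)
ratio = med / USER_REQUEST_TOKENS            # overhead relative to the request
share = med / (med + USER_REQUEST_TOKENS)    # overhead as a share of total input
spread = max(OVERHEAD) / min(OVERHEAD)       # heaviest vs lightest agent
avg_tool = sum(SCHEMA_TOKENS) / sum(TOOL_COUNTS)

print(f"min {min(OVERHEAD)} / median {med} / max {max(OVERHEAD)}")
print(f"overhead is {ratio:.1f}x the request, {share:.0%} of input")
print(f"spread {spread:.1f}x, average tool {avg_tool:.0f} tokens")
```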
What is the most expensive single MCP tool?
sequentialthinking from the official MCP servers repo: 827 tokens in one
tool, most of it a ~565-token natural-language description. It is larger than the
entire toolset of 8 of the 12 other agents measured.
Does pretty-printing a tool's JSON schema increase token cost?
Yes, by about 22%. The identical Fetch MCP tool is 236 tokens compact vs 288
pretty-printed. Pydantic auto-adds a title to every field and
zod-to-json-schema appends $schema/additionalProperties. These are
invisible tokens nobody wrote, billed every turn.
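To check the compact-versus-pretty difference on your own tools, a sketch along these lines may help. It assumes the third-party tiktoken package for the o200k_base encoding named in the methodology; the tokenizer is passed in as a parameter so any other encoder can be swapped in. The helper names and the example tool are hypothetical, and the example is not the real Fetch schema.

```python
import json
from typing import Callable, Sequence


def tool_token_counts(tool: dict, encode: Callable[[str], Sequence]) -> dict:
    """Count tokens for one tool definition, serialized compact vs pretty-printed."""
    compact = json.dumps(tool, separators=(",", ":"), ensure_ascii=False)
    pretty = json.dumps(tool, indent=2, ensure_ascii=False)
    return {"compact": len(encode(compact)), "pretty": len(encode(pretty))}


def o200k_encoder() -> Callable[[str], Sequence]:
    """Return the o200k_base encoder; requires the third-party `tiktoken` package."""
    import tiktoken  # imported lazily so tool_token_counts has no hard dependency

    return tiktoken.get_encoding("o200k_base").encode


# Hypothetical tool definition for illustration (not the real Fetch schema).
example_tool = {
    "name": "fetch",
    "description": "Fetch a URL and return its contents.",
    "input_schema": {
        "type": "object",
        "properties": {"url": {"type": "string", "description": "URL to fetch"}},
        "required": ["url"],
    },
}

# With tiktoken installed:
# counts = tool_token_counts(example_tool, o200k_encoder())
```

As the caveats note, o200k_base is a GPT tokenizer, so counts obtained this way are an estimate for Claude.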
How do I reduce my agent's tool-schema token cost?
Serialize schemas as compact JSON and drop auto-added fields, trim descriptions to what actually aids tool selection, load tools on demand (progressive disclosure), and route easy turns to a cheaper model like Claude Haiku. Measure first with the free Agent Token Profiler.
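Loading tools on demand can be as simple as filtering the tools array before each request. The sketch below is one possible shape, not a prescribed design: the group names, the grouping itself, and the select_tools helper are all hypothetical, and how a turn decides which groups are active is left to the agent.

```python
# Hypothetical grouping of tool names; a real agent would define its own.
TOOL_GROUPS = {
    "issues": {"create_issue", "list_issues"},
    "reviews": {"create_pull_request_review"},
}


def select_tools(tools: list, active_groups: set) -> list:
    """Keep only the tool definitions whose names fall in an active group."""
    allowed = set()
    for group in active_groups:
        allowed |= TOOL_GROUPS.get(group, set())
    return [tool for tool in tools if tool["name"] in allowed]


# Placeholder definitions; real schemas would carry full input_schema objects.
all_tools = [
    {"name": "create_issue", "description": "Open an issue.", "input_schema": {"type": "object"}},
    {"name": "list_issues", "description": "List issues.", "input_schema": {"type": "object"}},
    {"name": "create_pull_request_review", "description": "Review a PR.", "input_schema": {"type": "object"}},
]

# Only the "issues" tools are sent this turn; the review tool stays out of the payload.
tools_this_turn = select_tools(all_tools, {"issues"})
print([t["name"] for t in tools_this_turn])  # ['create_issue', 'list_issues']
```

The trade-off is that a tool left out of a turn cannot be called on that turn, so the grouping has to err on the side of including anything the model might plausibly need.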
Fix it: slim your tool schemas
The cruft — $schema, additionalProperties, and Pydantic's auto-added
title — carries no tool-selection signal, so you can strip it with zero accuracy cost
and send compact JSON. Drop this in before you pass tools to the API:
Node / TypeScript
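    // Schema keywords that carry no tool-selection signal.
    const CRUFT = new Set(["$schema", "additionalProperties", "title"]);
    // inProps is true when `o` is a `properties` map: its keys are parameter
    // names (a parameter may itself be called "title"), so none are dropped.
    const strip = (o, inProps = false) => Array.isArray(o) ? o.map((x) => strip(x))
      : (o && typeof o === "object")
        ? Object.fromEntries(Object.entries(o)
            .filter(([k]) => inProps || !CRUFT.has(k))
            .map(([k, v]) => [k, strip(v, !inProps && k === "properties")]))
        : o;
    // compact and cruft-free: send this to the model
    const tools_slim = JSON.stringify(strip(tools));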
Python
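    import json

    # Schema keywords that carry no tool-selection signal.
    CRUFT = {"$schema", "additionalProperties", "title"}

    def strip(o, in_props=False):
        # in_props: `o` is a "properties" map, so its keys are parameter names
        # (a parameter may itself be called "title") and none are dropped.
        if isinstance(o, list):
            return [strip(x) for x in o]
        if isinstance(o, dict):
            return {k: strip(v, not in_props and k == "properties")
                    for k, v in o.items() if in_props or k not in CRUFT}
        return o

    tools_slim = json.dumps(strip(tools), separators=(",", ":"))  # compact, cruft-free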
One caveat: additionalProperties: false is occasionally intentional (strict
validation) — drop it from the strip list if you rely on it. Measure before/after with the
free profiler, which now flags this
cruft for you automatically.
Built by AgentLoop — a readable, MIT Claude-agent starter.
The full agent loop — streaming + tool use — in ~150 lines, MIT-licensed and free. The same method behind this study powers the free Agent Token Profiler.