The Hidden Token Tax: What Real AI Agents Actually Cost Per Turn

GitHub's MCP server re-sends 3,546 tokens of tool schemas on every single turn — that's $12.89 per 1,000 turns on Claude Sonnet, billed before the model reads a word of the user's question. We measured 13 real open-source AI agents (79 tools total): the median agent's invisible per-turn overhead is 9.6× larger than the user's actual request.

3,546: tokens GitHub's MCP server re-sends every turn

9.6×: median overhead vs. a real user request

35×: spread, leanest agent to most bloated

79: tools measured across 13 projects

The thesis

An AI agent re-sends its entire system prompt plus every tool/function schema on every single turn. That fixed payload is billed as input tokens on each request — invisibly — until the bill arrives. We measured it across 13 real open-source projects (79 tools total).

Top findings

Six things the data makes hard to argue with.

1. GitHub's MCP server carries 3,546 tokens of tool schemas on every turn: $12.89 per 1,000 conversation turns on Sonnet, paid before the model reads a word of the user's question.

Of that, $10.64 is pure schema input; the 150-token reply is the cheap part. 26 tools is all it takes to make the plumbing cost more than the work.
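That split follows directly from the cost model under Methodology (Sonnet $3 per million input tokens and $15 per million output tokens, Haiku $1 and $5, a modeled 150-token reply, projected over 1,000 turns). A minimal Python sketch of the arithmetic; the function and dictionary names are illustrative, not part of any SDK:

```python
# Per-million-token prices (USD), as stated in this article's methodology.
PRICES = {
    "sonnet": {"input": 3.00, "output": 15.00},
    "haiku": {"input": 1.00, "output": 5.00},
}

def cost_per_1k_turns(overhead_tokens: int, model: str, reply_tokens: int = 150) -> dict:
    """Split the projected cost of 1,000 turns into fixed-overhead input and reply output."""
    p = PRICES[model]
    turns = 1_000
    input_cost = overhead_tokens * turns * p["input"] / 1_000_000
    output_cost = reply_tokens * turns * p["output"] / 1_000_000
    return {"input": input_cost, "output": output_cost, "total": input_cost + output_cost}

github = cost_per_1k_turns(3_546, "sonnet")
print(f"input ${github['input']:.2f} + reply ${github['output']:.2f} = ${github['total']:.2f}")
# input $10.64 + reply $2.25 = $12.89
```

Plugging in any other overhead from the dataset table, or `"haiku"` as the model, reproduces the remaining dollar columns.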

2. The median agent's invisible per-turn overhead is 547 tokens. Against a realistic 57-token user request, that's 9.6× larger than the question itself, and 91% of the entire input.

Across all 13: min 101 / median 547 / max 3,546 tokens.
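Both headline ratios come from just two token counts: the fixed overhead and the new user message. A small sketch, assuming (as the article does) a single 57-token request and ignoring any earlier conversation history:

```python
def overhead_vs_request(overhead_tokens: int, request_tokens: int) -> tuple[float, float]:
    """Return (overhead / request, overhead's share of the total input for the turn)."""
    ratio = overhead_tokens / request_tokens
    share = overhead_tokens / (overhead_tokens + request_tokens)
    return ratio, share

ratio, share = overhead_vs_request(547, 57)  # median agent vs. a 57-token request
print(f"{ratio:.1f}x larger, {share:.0%} of the input")  # 9.6x larger, 91% of the input
```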

3. The fattest single tool is 827 tokens, bigger than the entire schema set of 8 of the 12 tool providers measured.

It's sequentialthinking from the official MCP servers repo: a ~565-token natural-language description (about two-thirds of the tool) wrapped around a 9-field schema. A single verbose tool can outweigh a whole toolbox.

4. A 35× spread separates the leanest agent from the most bloated.

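GitHub's 3,546-token overhead is 35.1× the AWS KB Retrieval server's 101 tokens: $12.89 vs $2.55 per 1k turns on Sonnet. The dollar gap is only about 5× because both totals include the same $2.25 modeled reply. Same pricing, same tokenizer. Bloat is a design choice, not a tax of nature.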

5. Across all 79 tools, the average tool costs 123 tokens (median 103): call it ~100–125 tokens of permanent, per-turn rent for every tool you bolt on.

Ten tools ≈ a fixed ~1,000-token tax per turn before anything happens.
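The rent scales linearly with tool count. A rough sketch, assuming the 123-token average from this dataset and Sonnet's $3 per million input tokens; individual tools vary widely around that average:

```python
def tool_rent(n_tools: int, tokens_per_tool: int = 123,
              price_in_per_m: float = 3.00, turns: int = 1_000) -> tuple[int, float]:
    """Fixed schema tokens per turn for n tools, and their input cost (USD) over `turns` turns."""
    tokens_per_turn = n_tools * tokens_per_tool
    return tokens_per_turn, tokens_per_turn * turns * price_in_per_m / 1_000_000

tokens, dollars = tool_rent(10)
print(tokens, f"${dollars:.2f}")  # 1230 $3.69
```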

6. How you serialize the schema silently moves the bill by ~20%.

The identical fetch tool is 236 tokens compact vs 288 pretty-printed — a 22% swing from whitespace alone. Pydantic auto-adds a title to every field; zod-to-json-schema appends $schema and additionalProperties — invisible tokens nobody wrote, billed every turn.
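The whitespace effect is easy to see with Python's standard `json` module. This sketch uses a made-up tool (not the real Fetch schema) and compares character counts only, a rough proxy; a real tokenizer is needed for token figures:

```python
import json

# Hypothetical tool definition, for illustration only.
tool = {
    "name": "fetch",
    "description": "Fetch a URL and return its contents as markdown.",
    "input_schema": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "URL to fetch"},
            "max_length": {"type": "integer", "description": "Maximum characters to return"},
        },
        "required": ["url"],
    },
}

compact = json.dumps(tool, separators=(",", ":"))  # no spaces after separators, no newlines
pretty = json.dumps(tool, indent=2)                # newlines plus two-space indentation

extra = len(pretty) - len(compact)
print(f"compact {len(compact)} chars, pretty {len(pretty)} chars (+{extra / len(compact):.0%})")
```

Both strings decode to the same object; only the bytes (and therefore the tokens) differ.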

Measure your own agent

Every number here is an estimate from the same method as the free Agent Token Profiler — measure YOUR agent's per-turn cost in your browser (no signup, no key):

Open the free Agent Token Profiler →

The dataset

13 real open-source agents, 79 tools. Overhead / turn is the fixed payload re-sent on every request; $/1k turns projects that across 1,000 turns plus a modeled 150-token reply. See the per-server MCP token cost reference for each server's exact numbers.

Agent	Type	System prompt (tok)	Tool schemas (tok)	Tools	Overhead / turn (tok)	Fattest single tool (tok)	$/1k turns Sonnet	$/1k turns Haiku
GitHub MCP server	MCP server (tool provider)	0	3,546	26	3,546	`create_pull_request_review` (360)	$12.89	$4.30
bolt.new	prompt-embedded coding agent	3,118	0	0 (in-prose)	3,118	— (tools described in prompt)	$11.60	$3.87
GitLab MCP server	MCP server (tool provider)	0	1,194	9	1,194	`push_files` (175)	$5.83	$1.94
Git MCP server	MCP server (tool provider)	0	1,117	12	1,117	`git_log` (261)	$5.60	$1.87
Sequential Thinking MCP server	MCP server (tool provider)	0	827	1	827	`sequentialthinking` (827)	$4.73	$1.58
Slack MCP server	MCP server (tool provider)	0	679	8	679	`slack_reply_to_thread` (124)	$4.29	$1.43
Google Maps MCP server	MCP server (tool provider)	0	547	7	547	`maps_distance_matrix` (124)	$3.89	$1.30
Puppeteer MCP server	MCP server (tool provider)	0	538	7	538	`puppeteer_screenshot` (142)	$3.86	$1.29
Anthropic quickstarts agent	reference agent toolkit	0	365	3	365	`file_write` (152)	$3.35	$1.12
Brave Search MCP server	MCP server (tool provider)	0	317	2	317	`brave_web_search` (161)	$3.20	$1.07
Time MCP server	MCP server (tool provider)	0	237	2	237	`convert_time` (159)	$2.96	$0.99
Fetch MCP server	MCP server (tool provider)	0	236	1	236	`fetch` (236)	$2.96	$0.99
AWS KB Retrieval MCP server	MCP server (tool provider)	0	101	1	101	`retrieve_from_aws_kb` (101)	$2.55	$0.85
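The summary figures quoted earlier can be recomputed from the overhead and tool-count columns. A sketch with the table values hard-coded; the 103-token per-tool median needs per-tool counts that the table does not list, so it is not reproduced here:

```python
from statistics import median

# (overhead tokens per turn, schema tools) per agent, copied from the dataset table.
# bolt.new has 0 schema tools: its tools are described in system-prompt prose.
AGENTS = {
    "GitHub MCP": (3546, 26), "bolt.new": (3118, 0), "GitLab MCP": (1194, 9),
    "Git MCP": (1117, 12), "Sequential Thinking MCP": (827, 1), "Slack MCP": (679, 8),
    "Google Maps MCP": (547, 7), "Puppeteer MCP": (538, 7), "Anthropic quickstarts": (365, 3),
    "Brave Search MCP": (317, 2), "Time MCP": (237, 2), "Fetch MCP": (236, 1),
    "AWS KB Retrieval MCP": (101, 1),
}

overheads = [tokens for tokens, _ in AGENTS.values()]
spread = max(overheads) / min(overheads)
print("min / median / max:", min(overheads), median(overheads), max(overheads))  # 101 547 3546
print(f"spread: {spread:.1f}x")  # spread: 35.1x

# Every agent with schema tools has a 0-token system prompt, so its overhead equals
# its schema tokens; bolt.new (0 tools) drops out of the per-tool average.
schema_tokens = sum(tokens for tokens, n_tools in AGENTS.values() if n_tools > 0)
tool_count = sum(n_tools for _, n_tools in AGENTS.values())
print(f"{tool_count} tools, mean {schema_tokens / tool_count:.0f} tokens per tool")
# 79 tools, mean 123 tokens per tool
```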


Two archetypes: MCP servers carry zero system prompt — they're host-agnostic tool providers, so their per-turn tax is pure schema (and it stacks: attach three servers, pay all three, every turn). bolt.new is the opposite — no function-calling API at all; its tools are described in prose inside a ~3,100-token system prompt. Both pay the same fundamental tax: a fixed payload re-sent every turn.
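Because every attached server's schemas ride along on each request, the fixed payload is additive. A sketch, assuming no provider-side scaffolding (which the caveats note is not modeled) and using a hypothetical `system_prompt_tokens` parameter for the host's own prompt:

```python
# Per-turn schema overhead (tokens) for a few servers, from the dataset table.
SERVER_OVERHEAD = {"github": 3546, "slack": 679, "fetch": 236, "time": 237}

def stacked_overhead(attached: list[str], system_prompt_tokens: int = 0) -> int:
    """Fixed tokens re-sent every turn: host system prompt plus every attached server's schemas."""
    return system_prompt_tokens + sum(SERVER_OVERHEAD[s] for s in attached)

print(stacked_overhead(["github", "slack", "fetch"]))  # 4461
```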

Methodology

Every number reproducible.

Tokenizer: o200k_base (GPT-4o BPE) via gpt-tokenizer. An ESTIMATE for Claude (Anthropic's differs a few %), applied identically to every agent so cross-agent comparisons are fair. Cross-validated against Python tiktoken (o200k_base) — identical counts to the token.
A "tool" = the object the model sees ({name, description, input_schema}) serialized to compact JSON, as the SDKs send the tools array.
Schema fidelity: Pydantic models re-run through model_json_schema(); zod schemas through zod-to-json-schema; literal-JSON schemas extracted verbatim.
Cost model: per-turn input = fixed overhead; per-turn output = a modeled 150-token reply. Sonnet $3/M in, $15/M out; Haiku $1/M in, $5/M out. Projected over 1,000 turns.
Pinned sources: modelcontextprotocol/servers @ f5054df1fb86; servers-archived @ 9be4674d1ddf; anthropics/anthropic-quickstarts @ f37f1685e256; stackblitz/bolt.new @ eda10b121221.
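The per-tool counting step can be sketched in Python. This assumes the tiktoken package (`tiktoken.get_encoding("o200k_base")`), the Python tokenizer the methodology cross-validated against, and uses `ensure_ascii=False` so non-ASCII text serializes the way JavaScript's `JSON.stringify` would; the helper names are hypothetical:

```python
import json
from typing import Callable, Optional

def serialize_tool(name: str, description: str, input_schema: dict) -> str:
    """Render one tool as the methodology counts it: {name, description, input_schema} as compact JSON."""
    return json.dumps(
        {"name": name, "description": description, "input_schema": input_schema},
        separators=(",", ":"),
        ensure_ascii=False,
    )

def count_tokens(text: str, encode: Optional[Callable[[str], list]] = None) -> int:
    """Count tokens with o200k_base via tiktoken, unless another encoder is supplied."""
    if encode is None:
        import tiktoken  # pip install tiktoken; downloads the o200k_base file on first use
        encode = tiktoken.get_encoding("o200k_base").encode
    return len(encode(text))

schema = {"type": "object", "properties": {"url": {"type": "string"}}, "required": ["url"]}
print(serialize_tool("fetch", "Fetch a URL.", schema))
# {"name":"fetch","description":"Fetch a URL.","input_schema":{"type":"object","properties":{"url":{"type":"string"}},"required":["url"]}}
```

With tiktoken installed, `count_tokens(serialize_tool(...))` gives the estimate for one tool; summing over a server's tools approximates its schema total, with the few-percent drift against Claude's tokenizer noted in the caveats.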

Caveats

Stated honestly — the credibility lives here.

Tokenizer is a Claude estimate (o200k_base is GPT's); absolute dollar figures carry a few-percent error bar. Applied uniformly, so relative comparisons and structural conclusions are robust.
Provider-side tool-block scaffolding not modeled; real bills will be marginally higher.
zod/Pydantic schemas are reconstructed via the same converters the servers use, not byte-captured; a few tokens may differ. Literal-JSON schemas are verbatim/exact.
Schema-derived tools only — agents whose schemas are generated at runtime from code we couldn't reproduce exactly (e.g. Cline, gpt-researcher, OpenAI Swarm, Codex) were excluded rather than guessed. Cline in particular likely exceeds bolt.new but requires running its build to measure.

Cited repositories

Every agent measured, linked to the exact source.

GitHub MCP github.com/modelcontextprotocol/servers-archived/tree/main/src/github
GitLab MCP github.com/modelcontextprotocol/servers-archived/tree/main/src/gitlab
Git MCP github.com/modelcontextprotocol/servers/tree/main/src/git
Sequential Thinking MCP github.com/modelcontextprotocol/servers/tree/main/src/sequentialthinking
Slack MCP github.com/modelcontextprotocol/servers-archived/tree/main/src/slack
Google Maps MCP github.com/modelcontextprotocol/servers-archived/tree/main/src/google-maps
Puppeteer MCP github.com/modelcontextprotocol/servers-archived/tree/main/src/puppeteer
Brave Search MCP github.com/modelcontextprotocol/servers-archived/tree/main/src/brave-search
AWS KB Retrieval MCP github.com/modelcontextprotocol/servers-archived/tree/main/src/aws-kb-retrieval-server
Fetch MCP github.com/modelcontextprotocol/servers/tree/main/src/fetch
Time MCP github.com/modelcontextprotocol/servers/tree/main/src/time
Anthropic quickstarts agent github.com/anthropics/anthropic-quickstarts/tree/main/agents
bolt.new github.com/stackblitz/bolt.new

These are estimates. Token counts are computed with a GPT BPE tokenizer (o200k_base); Anthropic's tokenizer differs by a few percent, so absolute dollar figures carry a few-percent error bar. Applied uniformly across every agent, so the relative comparisons and structural conclusions are robust — but verify against your provider's real pricing and tokenizer before budgeting on any single figure.

Frequently asked: agent & MCP token costs

How many tokens does the GitHub MCP server use per turn?

The GitHub MCP server's 26 tool schemas total 3,546 tokens, re-sent on every turn — about $12.89 per 1,000 turns on Claude Sonnet ($4.30 on Haiku), billed before the model reads the user's question. It was the most expensive of the 13 agents measured.

What does the average agent/MCP tool cost in tokens?

Across the 79 tools in the 12 tool-providing projects measured (bolt.new describes its tools in prose instead), the average tool is 123 tokens (median 103): roughly 100–125 tokens of permanent, per-turn overhead for every tool you add. Ten tools ≈ a fixed ~1,000-token tax per turn before anything happens.

How large is the typical agent's hidden per-turn overhead?

The median agent re-sends 547 tokens of system prompt + tool schemas every turn (min 101, max 3,546) — about 9.6× a realistic 57-token user request, i.e. ~91% of the input before the user has spoken.

What is the most expensive single MCP tool?

sequentialthinking from the official MCP servers repo: 827 tokens in one tool, about two-thirds of it a ~565-token natural-language description. It is larger than the entire toolset of 8 of the 12 tool providers measured.

Does pretty-printing a tool's JSON schema increase token cost?

Yes, by about 22%: the identical Fetch MCP tool is 236 tokens compact vs 288 pretty-printed. Schema generators add overhead of their own on top of that: Pydantic auto-adds a title to every field and zod-to-json-schema appends $schema/additionalProperties, invisible tokens nobody wrote, billed every turn.

How do I reduce my agent's tool-schema token cost?

Serialize schemas as compact JSON and drop auto-added fields, trim descriptions to what actually aids tool selection, load tools on demand (progressive disclosure), and route easy turns to a cheaper model like Claude Haiku. Measure first with the free Agent Token Profiler.
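Progressive disclosure can take many forms. One deliberately naive version, purely illustrative: always send a small core set and add other tools only when the user's message contains a keyword tied to them. The registry, tool names, and keywords below are hypothetical:

```python
import re

# Hypothetical registry: tool name -> keywords suggesting the tool is needed this turn.
REGISTRY = {
    "create_pull_request": {"pr", "pull", "merge"},
    "search_code": {"search", "find", "grep"},
    "slack_post_message": {"slack", "post", "channel"},
}
ALWAYS_ON = {"search_code"}  # small core set sent on every turn

def tools_for_turn(user_text: str) -> list[str]:
    """Return the names of the tools whose schemas should be sent for this turn."""
    words = set(re.findall(r"[a-z0-9_]+", user_text.lower()))
    triggered = {name for name, keywords in REGISTRY.items() if words & keywords}
    return sorted(ALWAYS_ON | triggered)

print(tools_for_turn("Please open a PR for this fix."))  # ['create_pull_request', 'search_code']
print(tools_for_turn("What's the weather?"))             # ['search_code']
```

Only the returned tools' schemas would go into that request's tools array. The trade-off: a tool the selection rule misses is invisible to the model for that turn.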

Fix it: slim your tool schemas

The cruft ($schema, additionalProperties, and Pydantic's auto-added title) carries no tool-selection signal, so you can usually strip it without hurting tool selection. Drop this in before you pass tools to the API:

Node / TypeScript

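// Schema keywords that carry no tool-selection signal.
const CRUFT = new Set(["$schema", "additionalProperties", "title"]);
// Maps keyed by user-chosen names (e.g. a parameter called "title"), not keywords.
const NAME_MAPS = new Set(["properties", "patternProperties", "$defs", "definitions"]);

const strip = (o, isNameMap = false) => Array.isArray(o) ? o.map((x) => strip(x))
  : (o && typeof o === "object")
    ? Object.fromEntries(Object.entries(o)
        .filter(([k]) => isNameMap || !CRUFT.has(k))
        .map(([k, v]) => [k, strip(v, !isNameMap && NAME_MAPS.has(k))]))
    : o;

const toolsSlim = strip(tools);             // cruft-free array: pass this as `tools`
const compact = JSON.stringify(toolsSlim);  // compact JSON string, e.g. for measuring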

Python

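import json

# Schema keywords that carry no tool-selection signal.
CRUFT = {"$schema", "additionalProperties", "title"}
# Maps keyed by user-chosen names (e.g. a parameter called "title"), not keywords.
NAME_MAPS = {"properties", "patternProperties", "$defs", "definitions"}

def strip(o, name_map=False):
    if isinstance(o, list):
        return [strip(x) for x in o]
    if isinstance(o, dict):
        return {k: strip(v, not name_map and k in NAME_MAPS)
                for k, v in o.items() if name_map or k not in CRUFT}
    return o

tools_slim = strip(tools)                                # cruft-free list: pass this as tools
compact = json.dumps(tools_slim, separators=(",", ":"))  # compact JSON string, e.g. for measuring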

One caveat: additionalProperties: false is occasionally intentional (strict validation) — drop it from the strip list if you rely on it. Measure before/after with the free profiler, which now flags this cruft for you automatically.

Built by AgentLoop — a readable, MIT Claude-agent starter.

The full agent loop — streaming + tool use — in ~150 lines, MIT-licensed and free. The same method behind this study powers the free Agent Token Profiler.

AgentLoop Pro turns this measurement into a fix: token metering per run and per tool, plus model-routing that sends cheap turns to a cheaper model — and seven other production patterns (parallel tool calls, retries, persistent memory, human-in-the-loop approval), each in minimal, readable code. Pay what you want from $9 · commercial license · money-back guarantee.
