The Hidden Token Tax: What Real AI Agents Actually Cost Per Turn

GitHub's MCP server re-sends 3,546 tokens of tool schemas on every single turn. On Claude Sonnet that works out to $12.89 per 1,000 turns, of which $10.64 is schema input billed before the model reads a word of the user's question. We measured 13 real open-source AI agents (79 tools total): the median agent's invisible per-turn overhead is 9.6× larger than a realistic 57-token user request.

3,546: tokens GitHub's MCP server re-sends every turn
9.6×: median overhead vs. a real user request
35×: spread, leanest agent to most bloated
79: tools measured across 13 projects

The thesis

An AI agent re-sends its entire system prompt plus every tool/function schema on every single turn. That fixed payload is billed as input tokens on each request — invisibly — until the bill arrives. We measured it across 13 real open-source projects (79 tools total).
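To make the thesis concrete, here is a minimal sketch of the arithmetic, using the study's median figures (547 tokens of fixed overhead against a 57-token user request). The function name `overhead_share` is illustrative and not part of any tool mentioned here.

```python
def overhead_share(overhead_tokens: int, request_tokens: int) -> tuple[float, float]:
    """Return (overhead-to-request ratio, overhead as a fraction of total input)."""
    ratio = overhead_tokens / request_tokens
    share = overhead_tokens / (overhead_tokens + request_tokens)
    return ratio, share


# Median agent from this study: 547 tokens of overhead, 57-token request
ratio, share = overhead_share(overhead_tokens=547, request_tokens=57)
print(f"{ratio:.1f}x the request, {share:.0%} of the input")
# prints "9.6x the request, 91% of the input"
```

The same two numbers fall out for any agent once you know its fixed payload and a typical request size.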

Top findings

Six things the data makes hard to argue with.

1 GitHub's MCP server carries 3,546 tokens of tool schemas on every turn: $12.89 per 1,000 conversation turns on Sonnet, most of it paid before the model reads a word of the user's question.

Of that, $10.64 is pure schema input; the modeled 150-token reply accounts for the remaining $2.25 and is the cheap part. 26 tools is all it takes to make the plumbing cost more than the work.

2 The median agent's invisible per-turn overhead is 547 tokens — against a realistic 57-token user request, that's 9.6× larger than the question itself, and 91% of the entire input.

Across all 13: min 101 / median 547 / max 3,546 tokens.

3 The fattest single tool is 827 tokens, bigger than the entire schema set of 8 of the 12 tool providers measured.

It's sequentialthinking from the official MCP servers repo: a 565-token natural-language description wrapped around a tiny 9-field schema. A single verbose tool can outweigh a whole toolbox.

4 A 35× spread separates the leanest agent from the most bloated.

GitHub's 3,546-token overhead is 35.1× the AWS-KB server's 101 — $12.89 vs $2.55 per 1k turns. Same pricing, same tokenizer. Bloat is a design choice, not a tax of nature.

5 Across all 79 tools, the average tool costs 123 tokens (median 103) — call it ~100–125 tokens of permanent, per-turn rent per tool you bolt on.

Ten tools ≈ a fixed ~1,000-token tax per turn before anything happens.

6 How you serialize the schema silently moves the bill ~20%.

The identical fetch tool is 236 tokens compact vs 288 pretty-printed — a 22% swing from whitespace alone. Pydantic auto-adds a title to every field; zod-to-json-schema appends $schema and additionalProperties — invisible tokens nobody wrote, billed every turn.
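The serialization effect in finding 6 is easy to check on your own schemas. The sketch below compares compact and pretty-printed JSON for one tool. The `example_tool` schema is hypothetical (it is not the real Fetch server schema), and the default counter measures characters, which are only a proxy for tokens; pass a tokenizer-backed `count` function to get token figures.

```python
import json


def serialization_sizes(schema, count=len):
    """Return (compact_size, pretty_size, relative_growth) for one schema.

    `count` maps a string to a size; the default counts characters.
    """
    compact = json.dumps(schema, separators=(",", ":"))
    pretty = json.dumps(schema, indent=2)
    c, p = count(compact), count(pretty)
    return c, p, (p - c) / c


# Hypothetical single-tool schema, for illustration only
example_tool = {
    "name": "fetch",
    "description": "Fetch a URL and return its contents.",
    "input_schema": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "URL to fetch"},
            "max_length": {"type": "integer", "description": "Maximum characters to return"},
        },
        "required": ["url"],
    },
}

compact, pretty, growth = serialization_sizes(example_tool)
print(f"compact={compact} pretty={pretty} growth={growth:.0%}")
```

Both serializations parse back to the same object, so any difference in size is formatting only.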

Measure your own agent

Every number here is an estimate produced with the same method as the free Agent Token Profiler. You can use it to measure your own agent's per-turn cost in your browser (no signup, no key).

The dataset

13 real open-source agents, 79 tools. Overhead / turn is the fixed payload re-sent on every request; $/1k turns projects that across 1,000 turns plus a modeled 150-token reply. See the per-server MCP token cost reference for each server's exact numbers.

| Agent | Type | System prompt (tok) | Schema (tok) | # Tools | Overhead / turn (tok) | Fattest single tool (tok) | $/1k turns, Sonnet | $/1k turns, Haiku |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GitHub MCP server | MCP server (tool provider) | 0 | 3,546 | 26 | 3,546 | create_pull_request_review (360) | $12.89 | $4.30 |
| bolt.new | prompt-embedded coding agent | 3,118 | 0 | 0 (in-prose) | 3,118 | n/a (tools described in prompt) | $11.60 | $3.87 |
| GitLab MCP server | MCP server (tool provider) | 0 | 1,194 | 9 | 1,194 | push_files (175) | $5.83 | $1.94 |
| Git MCP server | MCP server (tool provider) | 0 | 1,117 | 12 | 1,117 | git_log (261) | $5.60 | $1.87 |
| Sequential Thinking MCP server | MCP server | 0 | 827 | 1 | 827 | sequentialthinking (827) | $4.73 | $1.58 |
| Slack MCP server | MCP server (tool provider) | 0 | 679 | 8 | 679 | slack_reply_to_thread (124) | $4.29 | $1.43 |
| Google Maps MCP server | MCP server (tool provider) | 0 | 547 | 7 | 547 | maps_distance_matrix (124) | $3.89 | $1.30 |
| Puppeteer MCP server | MCP server (tool provider) | 0 | 538 | 7 | 538 | puppeteer_screenshot (142) | $3.86 | $1.29 |
| Anthropic quickstarts agent | reference agent toolkit | 0 | 365 | 3 | 365 | file_write (152) | $3.35 | $1.12 |
| Brave Search MCP server | MCP server (tool provider) | 0 | 317 | 2 | 317 | brave_web_search (161) | $3.20 | $1.07 |
| Time MCP server | MCP server (tool provider) | 0 | 237 | 2 | 237 | convert_time (159) | $2.96 | $0.99 |
| Fetch MCP server | MCP server (tool provider) | 0 | 236 | 1 | 236 | fetch (236) | $2.96 | $0.99 |
| AWS KB Retrieval MCP server | MCP server (tool provider) | 0 | 101 | 1 | 101 | retrieve_from_aws_kb (101) | $2.55 | $0.85 |


Two archetypes: MCP servers carry zero system prompt — they're host-agnostic tool providers, so their per-turn tax is pure schema (and it stacks: attach three servers, pay all three, every turn). bolt.new is the opposite — no function-calling API at all; its tools are described in prose inside a ~3,100-token system prompt. Both pay the same fundamental tax: a fixed payload re-sent every turn.
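The $/1k turns columns can be approximated with a few lines of arithmetic. This is a sketch under stated assumptions: the per-million-token prices below ($3 input / $15 output for Sonnet, $1 / $5 for Haiku) are assumed values that happen to reproduce the table's figures, and, like the table, the calculation counts only the fixed overhead plus a modeled 150-token reply, not the user's message or conversation history. Check current provider pricing before relying on any of it.

```python
# Assumed prices in USD per million tokens; verify against current pricing.
SONNET = {"input": 3.00, "output": 15.00}
HAIKU = {"input": 1.00, "output": 5.00}


def cost_per_1k_turns(overhead_tokens, price, reply_tokens=150, turns=1000):
    """Projected USD cost of `turns` requests: fixed overhead in, modeled reply out."""
    input_usd = overhead_tokens * turns / 1_000_000 * price["input"]
    output_usd = reply_tokens * turns / 1_000_000 * price["output"]
    return round(input_usd + output_usd, 2)


print(cost_per_1k_turns(3546, SONNET))  # GitHub MCP server row: 12.89
print(cost_per_1k_turns(101, SONNET))   # AWS KB Retrieval row: 2.55

# Stacking: attach GitHub + Slack + Fetch and pay all three schema sets every turn
stacked = 3546 + 679 + 236
print(stacked, cost_per_1k_turns(stacked, SONNET))  # 4461 15.63
```

Because overhead is additive across attached servers, the stacked figure is simply the sum of the individual schema counts fed through the same formula.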

Methodology

Every number reproducible.

Caveats

Stated honestly — the credibility lives here.

Cited repositories

Every agent measured, linked to the exact source.

These are estimates. Token counts are computed with a GPT BPE tokenizer (o200k_base); Anthropic's tokenizer differs by a few percent, so absolute dollar figures carry a few-percent error bar. The same tokenizer is applied uniformly to every agent, so the relative comparisons and structural conclusions are robust, but verify against your provider's real pricing and tokenizer before budgeting on any single figure.
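For readers who want to reproduce a count locally, the sketch below shows one plausible way to do it with the `tiktoken` library and the o200k_base encoding named above. It assumes the tools are serialized as compact JSON before counting; the study's exact serialization is not shown here, so treat the result as an approximation. The `encode` parameter is an illustrative hook that lets you swap in a different tokenizer.

```python
import json


def count_schema_tokens(tools, encode=None):
    """Count tokens in the compact-JSON serialization of a list of tool definitions."""
    if encode is None:
        # Requires `pip install tiktoken`; the encoding file is fetched on first use.
        import tiktoken

        encode = tiktoken.get_encoding("o200k_base").encode
    text = json.dumps(tools, separators=(",", ":"))
    return len(encode(text))
```

Calling `count_schema_tokens(tools)` on the same list you pass to your model API gives a per-turn schema estimate comparable to the figures in the table.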

Frequently asked: agent & MCP token costs

How many tokens does the GitHub MCP server use per turn?

The GitHub MCP server's 26 tool schemas total 3,546 tokens, re-sent on every turn. With a modeled 150-token reply, that is about $12.89 per 1,000 turns on Claude Sonnet ($4.30 on Haiku), most of it billed before the model reads the user's question. It was the most expensive of the 13 agents measured.

What does the average agent/MCP tool cost in tokens?

Across the 79 tools in the 13 projects measured, the average tool is 123 tokens (median 103): roughly 100–125 tokens of permanent, per-turn overhead for every tool you add. Ten tools ≈ a fixed ~1,000-token tax per turn before anything happens.

How large is the typical agent's hidden per-turn overhead?

The median agent re-sends 547 tokens of system prompt + tool schemas every turn (min 101, max 3,546). That is about 9.6× a realistic 57-token user request, so overhead makes up ~91% of the input.

What is the most expensive single MCP tool?

sequentialthinking from the official MCP servers repo: 827 tokens in one tool, most of it a ~565-token natural-language description. It is larger than the entire toolset of 8 of the 12 tool providers measured.

Does pretty-printing a tool's JSON schema increase token cost?

Yes — by about 20%. The identical Fetch MCP tool is 236 tokens compact vs 288 pretty-printed. Pydantic auto-adds a title to every field and zod-to-json-schema appends $schema/additionalProperties — invisible tokens nobody wrote, billed every turn.

How do I reduce my agent's tool-schema token cost?

Serialize schemas as compact JSON and drop auto-added fields, trim descriptions to what actually aids tool selection, load tools on demand (progressive disclosure), and route easy turns to a cheaper model like Claude Haiku. Measure first with the free Agent Token Profiler.
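As an illustration of loading tools on demand, here is a deliberately naive sketch that sends only the tools whose routing keywords appear in the user's message. The `keywords` field, the `select_tools` helper, and the placeholder descriptions are all hypothetical; real systems might use embeddings or a model-driven tool search instead, and any selector trades some recall for lower cost, since a tool that is filtered out cannot be called.

```python
import re


def select_tools(tools, user_message, always=()):
    """Return the tools whose routing keywords overlap the message, plus any in `always`."""
    words = set(re.findall(r"[a-z0-9]+", user_message.lower()))
    selected = []
    for tool in tools:
        if tool["name"] in always or words & set(tool.get("keywords", ())):
            # Drop the routing-only field before the tool is sent to the model.
            selected.append({k: v for k, v in tool.items() if k != "keywords"})
    return selected


# Tool names come from the dataset above; descriptions and keywords are placeholders.
TOOLS = [
    {"name": "git_log", "description": "placeholder", "keywords": ["commit", "commits", "history", "log"]},
    {"name": "maps_distance_matrix", "description": "placeholder", "keywords": ["distance", "route", "travel"]},
    {"name": "fetch", "description": "placeholder", "keywords": ["url", "download"]},
]

chosen = select_tools(TOOLS, "Show me the last five commits", always=("fetch",))
print([t["name"] for t in chosen])  # ['git_log', 'fetch']
```

Only two of the three schemas are sent for this turn, so the per-turn overhead shrinks by whatever the skipped tool would have cost.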

Fix it: slim your tool schemas

The cruft — $schema, additionalProperties, and Pydantic's auto-added title — carries no tool-selection signal, so you can strip it with zero accuracy cost and send compact JSON. Drop this in before you pass tools to the API:

Node / TypeScript

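// JSON Schema keys that carry no tool-selection signal
const CRUFT = new Set(["$schema", "additionalProperties", "title"]);

// Recursively drop cruft keys. Keys directly inside a `properties` map are
// parameter names (a parameter may itself be called "title"), so they are kept.
const strip = (o, inProps = false) => Array.isArray(o) ? o.map((x) => strip(x))
  : (o && typeof o === "object")
    ? Object.fromEntries(Object.entries(o)
        .filter(([k]) => inProps || !CRUFT.has(k))
        .map(([k, v]) => [k, strip(v, !inProps && k === "properties")]))
    : o;

// compact, cruft-free: send this to the model
const tools_slim = JSON.stringify(strip(tools));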

Python

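import json

# JSON Schema keys that carry no tool-selection signal
CRUFT = {"$schema", "additionalProperties", "title"}

def strip(o, in_props=False):
    # in_props: `o` is a "properties" map whose keys are parameter names
    # (a parameter may itself be called "title"), so every key is kept.
    if isinstance(o, list):
        return [strip(x) for x in o]
    if isinstance(o, dict):
        return {k: strip(v, not in_props and k == "properties")
                for k, v in o.items() if in_props or k not in CRUFT}
    return o

tools_slim = json.dumps(strip(tools), separators=(",", ":"))  # compact, cruft-free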

One caveat: additionalProperties: false is occasionally intentional (strict validation) — drop it from the strip list if you rely on it. Measure before/after with the free profiler, which now flags this cruft for you automatically.
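If you do rely on strict validation, one option is a more conservative variant of the stripper. The sketch below (the name `strip_safe` is illustrative) removes `additionalProperties` only when its value is `true`, which is the JSON Schema default and therefore safe to omit, and keeps `false` or sub-schema values untouched. It also leaves keys inside a `properties` map alone, since those are parameter names and a parameter may legitimately be called `title`.

```python
import json

ALWAYS_DROP = {"$schema", "title"}


def strip_safe(node, in_props=False):
    """Drop annotation-only keys without changing what the schema validates."""
    if isinstance(node, list):
        return [strip_safe(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if not in_props:
            if key in ALWAYS_DROP:
                continue
            # `true` is the default, so omitting it changes nothing;
            # `false` (or a sub-schema) tightens validation and is kept.
            if key == "additionalProperties" and value is True:
                continue
        # Children of a `properties` keyword form a map of parameter names.
        out[key] = strip_safe(value, in_props=(key == "properties" and not in_props))
    return out


# Hypothetical Pydantic-style schema with a parameter that is itself named "title"
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "CreateIssue",
    "type": "object",
    "properties": {
        "title": {"title": "Title", "type": "string"},
        "body": {"title": "Body", "type": "string"},
    },
    "required": ["title"],
    "additionalProperties": False,
}

slim = strip_safe(schema)
print(json.dumps(slim, separators=(",", ":")))
```

The `title` parameter survives and `additionalProperties: false` is preserved, while the auto-generated titles and `$schema` are gone.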

Built by AgentLoop — a readable, MIT Claude-agent starter.

The full agent loop — streaming + tool use — in ~150 lines, MIT-licensed and free. The same method behind this study powers the free Agent Token Profiler.

AgentLoop Pro turns this measurement into a fix: token metering per run and per tool, plus model routing that sends cheap turns to a cheaper model, and seven other production patterns (including parallel tool calls, retries, persistent memory, and human-in-the-loop approval), each in minimal, readable code. Pay what you want from $9 · commercial license · money-back guarantee.