When engineers first integrate tool use into an AI agent, they naturally focus on the outputs — what the model does with tool results, how it reasons, how it calls the right function. What gets ignored is what happens before any tool is called: the entire tool schema is injected into the model's context on every single request. This is a silent, often massive token expense that most teams never measure until their costs spiral out of control.
This topic breaks down exactly how tool definitions consume tokens, shows you how to measure the overhead, and gives you concrete strategies to reduce it — without sacrificing capability.
How the Model Receives Tool Definitions
Every major AI provider — OpenAI, Anthropic, Google Gemini — requires tool definitions to be sent as structured JSON in the API request. The model cannot "remember" tools from a previous call. There is no persistent tool registry. Every request is stateless, and every request that involves tools must include the full schema for every tool the model might use.
In the OpenAI function calling format, this looks like:
{
  "model": "gpt-4o",
  "messages": [...],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "search_codebase",
        "description": "Search the codebase for files or code patterns matching a query. Returns file paths, line numbers, and matching content excerpts.",
        "parameters": {
          "type": "object",
          "properties": {
            "query": {
              "type": "string",
              "description": "The search query or code pattern to look for"
            },
            "file_pattern": {
              "type": "string",
              "description": "Optional glob pattern to restrict the search, e.g. '**/*.ts'"
            },
            "max_results": {
              "type": "integer",
              "description": "Maximum number of results to return. Defaults to 20."
            }
          },
          "required": ["query"]
        }
      }
    }
  ]
}
The Anthropic Claude API uses a slightly different structure but the principle is identical:
{
  "model": "claude-opus-4-5",
  "tools": [
    {
      "name": "search_codebase",
      "description": "Search the codebase for files or code patterns matching a query.",
      "input_schema": {
        "type": "object",
        "properties": {
          "query": { "type": "string", "description": "Search query" },
          "file_pattern": { "type": "string", "description": "Glob pattern" },
          "max_results": { "type": "integer", "description": "Max results, default 20" }
        },
        "required": ["query"]
      }
    }
  ],
  "messages": [...]
}
Both representations consume tokens. The JSON keys, the string values in descriptions, the nesting — all of it is tokenized and counted against your context window and your per-token billing.
Tip: Use your provider's tokenizer to measure your tool schema overhead before building anything at scale. For OpenAI, the tiktoken library lets you serialize the tools array to a string and count its tokens directly. For Anthropic, use client.messages.count_tokens() with the full request payload. A tool with a verbose description and many parameters can easily consume 150–300 tokens — multiply that by 20 tools and you're looking at 3,000–6,000 tokens per request before a single word of user input is processed.
Measuring Tool Schema Token Costs
Before you can optimize, you must measure. Most developers underestimate tool schema costs because they don't see them in the obvious places — the tool schemas are not part of the messages array in your application code, so they're invisible in your chat history logs.
Here is a Python script that estimates tool schema token cost for OpenAI:
import tiktoken
import json

def measure_tool_tokens(tools: list[dict]) -> int:
    """Estimate how many tokens your tool definitions consume."""
    enc = tiktoken.encoding_for_model("gpt-4o")
    # Serialize the entire tools array roughly as it appears in the request
    tools_json = json.dumps(tools)
    # OpenAI reformats tool schemas internally, so the wrapper overhead
    # below is an estimate, not an exact figure
    base_overhead = 3       # tokens for the tools array wrapper
    per_tool_overhead = 3   # tokens per tool for the function wrapper
    token_count = len(enc.encode(tools_json))
    total = token_count + base_overhead + (per_tool_overhead * len(tools))
    return total
tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a file from the filesystem.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Absolute file path"}
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write or overwrite a file on the filesystem with provided content.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Absolute file path"},
                    "content": {"type": "string", "description": "File content to write"}
                },
                "required": ["path", "content"]
            }
        }
    }
]
cost = measure_tool_tokens(tools)
print(f"Tool definitions consume approximately {cost} tokens per request")
For the Anthropic API, you can use the token counting endpoint directly:
import anthropic

client = anthropic.Anthropic()

response = client.messages.count_tokens(
    model="claude-opus-4-5",
    tools=[
        {
            "name": "read_file",
            "description": "Read the contents of a file from the filesystem.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Absolute file path"}
                },
                "required": ["path"]
            }
        }
    ],
    messages=[{"role": "user", "content": "Hello"}]
)

print(f"Input tokens (with tools): {response.input_tokens}")
When you run this analysis on a real agent with 15–30 tools, the numbers are often shocking. Teams commonly discover that tool schemas account for 20–40% of their total input token budget.
Tip: Build a monitoring step into your agent's initialization that logs tool token costs at startup. Any tool that costs more than 200 tokens to describe is a candidate for optimization. Track this metric over time — as tools are added to the codebase, schema costs tend to grow unchecked.
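A minimal sketch of such a startup check, reusing the measure_tool_tokens() helper from above (log_tool_costs and the 200-token threshold are illustrative choices, not a standard API):

def log_tool_costs(tools: list[dict], warn_threshold: int = 200) -> None:
    """Log per-tool schema cost at agent startup and flag heavy tools."""
    for tool in tools:
        cost = measure_tool_tokens([tool])
        name = tool["function"]["name"]
        flag = " <- candidate for optimization" if cost > warn_threshold else ""
        print(f"{name}: ~{cost} tokens{flag}")
    print(f"Total tool schema cost: ~{measure_tool_tokens(tools)} tokens")

log_tool_costs(tools)  # call once during agent initialization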
The Anatomy of a Bloated Tool Schema
Not all token waste in tool schemas is obvious. Let's dissect the most common sources of token bloat:
1. Verbose descriptions with redundant phrasing
// Bloated — 47 tokens
"description": "This function allows you to search through the codebase to find relevant files and code snippets that match a given search query string. It will return a list of matching results."
// Lean — 18 tokens
"description": "Search codebase for files and code snippets matching a query. Returns matching results."
2. Overly detailed parameter descriptions
// Bloated — adds ~30 tokens of description per parameter
"max_results": {
"type": "integer",
"description": "The maximum number of search results to return in the response. If not specified, the default value of 20 will be used. Must be a positive integer between 1 and 100."
}
// Lean — adds ~8 tokens
"max_results": {
"type": "integer",
"description": "Max results (default 20, max 100)"
}
3. Unnecessary enum values listed in descriptions
// Bloated — listing enum semantics in both enum and description
"status": {
"type": "string",
"enum": ["open", "closed", "pending", "resolved", "escalated"],
"description": "The ticket status. Use 'open' for new tickets, 'closed' for resolved tickets, 'pending' for tickets awaiting response, 'resolved' for completed tickets, and 'escalated' for high-priority tickets."
}
// Lean — the enum values are self-explanatory
"status": {
"type": "string",
"enum": ["open", "closed", "pending", "resolved", "escalated"],
"description": "Ticket status filter"
}
4. Deeply nested parameter schemas
Deeply nested objects in JSON schema — objects within objects within objects — generate large schema definitions. When possible, flatten your parameter structure. Instead of accepting a nested configuration object, accept individual flat parameters.
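To make the flattening concrete, a hypothetical before-and-after written as Python dicts, matching the earlier examples (the parameter names are invented for illustration):

# Nested: each wrapper object adds keys, braces, and another
# "type": "object" level, all of which tokenize
nested_params = {
    "type": "object",
    "properties": {
        "config": {
            "type": "object",
            "properties": {
                "search": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string"},
                        "max_results": {"type": "integer"}
                    }
                }
            }
        }
    }
}

# Flat: the same two inputs with far less schema scaffolding
flat_params = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "max_results": {"type": "integer"}
    },
    "required": ["query"]
}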
5. additionalProperties and complex validation keywords
JSON Schema validation keywords like minimum, maximum, pattern, minLength, maxLength, and additionalProperties: false all consume tokens. Most of these validations should happen in your tool implementation, not in the schema definition passed to the model.
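A sketch of where that validation can live instead, assuming a hypothetical search_codebase implementation:

def search_codebase(query: str, max_results: int = 20) -> list[dict]:
    """Hypothetical tool implementation that enforces bounds the schema omits."""
    # Enforce here what minimum/maximum keywords would have stated in the schema
    if not 1 <= max_results <= 100:
        raise ValueError("max_results must be between 1 and 100")
    # ... run the actual search and return results ...
    return []

The lean description from the earlier example ("default 20, max 100") still tells the model the limits; the schema just no longer spends tokens restating them as validation keywords.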
Tip: Apply the "newspaper headline" principle to tool descriptions: say the most important thing in the fewest words. The model does not need a tutorial in the description — it needs a precise, unambiguous label. If the model is calling the wrong tool, the fix is rarely to add more words to the description; it is usually to improve tool naming or add a distinguishing phrase.
Token Cost Across Different Provider Formats
The raw JSON is only part of the story. Different providers process tool definitions differently, and the actual token cost depends on how the provider formats the schema before tokenization.
OpenAI / GPT-4o series: Tools are passed as a tools array. OpenAI's tokenizer counts the entire JSON structure. The system prompt is separate, so tool schemas add to but do not replace system prompt tokens.
Anthropic Claude: Tools are passed as a tools array. Anthropic documents that tool definitions are transformed internally into a special block of the system prompt, which means tool schemas consume tokens from your effective system prompt budget.
Google Gemini: Tools are passed as FunctionDeclaration objects in a tools parameter. The token accounting is similar — definitions are tokenized and counted.
LangChain agents: LangChain's older ReAct-style agents injected tool descriptions as plain text into the system prompt. This is even more expensive than structured JSON schemas for complex tools, because plain text representations tend to be wordier. Newer LangChain versions use native function calling where available.
MCP (Model Context Protocol) servers: MCP tool definitions are pulled from the server at runtime and then passed to the model. The MCP server returns a tools/list response with full JSON schemas. Every MCP tool you expose goes into the model's context window. Teams running 3–5 MCP servers with 10–20 tools each are routinely injecting 5,000–15,000 tokens of tool schema overhead per request.
Here is a realistic MCP server tool list response showing the scope of the problem:
{
  "tools": [
    { "name": "read_file", "description": "...", "inputSchema": {...} },
    { "name": "write_file", "description": "...", "inputSchema": {...} },
    { "name": "list_directory", "description": "...", "inputSchema": {...} },
    { "name": "search_files", "description": "...", "inputSchema": {...} },
    { "name": "execute_command", "description": "...", "inputSchema": {...} },
    { "name": "get_file_info", "description": "...", "inputSchema": {...} }
  ]
}
Each of those entries, when fully expanded with its schema, can cost 80–200 tokens. Six tools from a single filesystem MCP server: 480–1,200 tokens, before you add the database server, the git server, the browser server, and the API server.
Tip: When evaluating MCP servers for production use, always request the full tools/list response from the server and measure its token cost before integrating it. Make this a standard step in your MCP server evaluation checklist. A server that exposes 30 granular tools may be less token-efficient than one that exposes 8 well-designed composite tools.
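A minimal sketch of that measurement, assuming you have captured the server's tools/list response to a local JSON file (the filename is illustrative) and using tiktoken as an approximation, since the host application, not the MCP server, decides the exact serialization the model sees:

import json
import tiktoken

# Captured tools/list response from the MCP server (path is illustrative)
with open("mcp_tools_list.json") as f:
    tools_list = json.load(f)

enc = tiktoken.encoding_for_model("gpt-4o")
for tool in tools_list["tools"]:
    cost = len(enc.encode(json.dumps(tool)))
    print(f"{tool['name']}: ~{cost} tokens")

total = len(enc.encode(json.dumps(tools_list["tools"])))
print(f"Total schema cost for this server: ~{total} tokens")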
Real-World Impact: Token Cost vs. Context Window Budget
To make the stakes concrete, consider a real scenario. You are building a coding assistant agent with the following tool set:
| Tool | Est. Token Cost |
|---|---|
| read_file | 85 tokens |
| write_file | 90 tokens |
| list_directory | 75 tokens |
| search_codebase | 120 tokens |
| run_tests | 95 tokens |
| get_git_diff | 80 tokens |
| create_branch | 70 tokens |
| search_documentation | 110 tokens |
| query_database | 130 tokens |
| call_api | 140 tokens |
| send_notification | 95 tokens |
| update_ticket | 115 tokens |
| Total | ~1,205 tokens |
That is 1,205 tokens consumed before the user's first message. On a model with a 200,000-token context window, this seems insignificant. But in practice:
- You have a system prompt of 2,000 tokens
- The conversation history grows with each turn
- Tool call results add hundreds to thousands of tokens
- Code files loaded as context add thousands more
The tool schema overhead compounds at every request. In a long agentic session with 20 turns, you pay 1,205 × 20 = 24,100 tokens just for tool definitions. At GPT-4o pricing, this is a real cost. At Claude Opus pricing, it is a significant one.
For QA engineers running automated test generation agents overnight, or product managers using agents to generate PRDs across entire product backlogs, the cumulative cost is not trivial. This is why tool schema optimization is a core engineering discipline, not an afterthought.
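To put rough dollar figures on that compounding, a back-of-the-envelope sketch (the per-million-token prices are placeholder assumptions; substitute your provider's current rates):

# Placeholder prices per million input tokens; not current list prices
ASSUMED_PRICE_PER_MTOK = {"gpt-4o": 2.50, "claude-opus": 15.00}

schema_tokens_per_turn = 1205
turns_per_session = 20
sessions_per_day = 500  # assumed fleet-wide usage

daily_tokens = schema_tokens_per_turn * turns_per_session * sessions_per_day
for model, price in ASSUMED_PRICE_PER_MTOK.items():
    daily_cost = daily_tokens / 1_000_000 * price
    print(f"{model}: ${daily_cost:.2f}/day spent on tool schemas alone")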
Tip: Establish a "tool budget" for your agent — a maximum number of tokens you will allow tool schemas to consume. Enforce this in code as a startup assertion. When a developer adds a new tool, the test suite should verify that the total tool schema token count stays within budget. This creates a forcing function for keeping schemas lean as the codebase evolves.
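One way to enforce such a budget, sketched as a pytest-style test reusing measure_tool_tokens() (TOOL_TOKEN_BUDGET and the agent_tools module are assumptions about your project layout):

TOOL_TOKEN_BUDGET = 1500  # assumed budget for this agent

def test_tool_schemas_stay_within_budget():
    """Fails CI when a newly added tool pushes schema cost over budget."""
    from agent_tools import tools  # hypothetical module exporting the tool list
    cost = measure_tool_tokens(tools)
    assert cost <= TOOL_TOKEN_BUDGET, (
        f"Tool schemas cost {cost} tokens, budget is {TOOL_TOKEN_BUDGET}; "
        "trim descriptions or remove tools"
    )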
Summary
Tool definitions are not free. They are injected into every request, tokenized in full, and charged at the same rate as any other input token. The typical mid-complexity agent with 10–15 tools spends 800–2,000 tokens on tool schemas alone, per request. With verbose descriptions, nested schemas, and multiple MCP servers, this easily reaches 5,000–10,000 tokens.
The foundation of tool use token optimization is measurement: know exactly how many tokens your tool schemas cost before you start optimizing. Then apply the principles covered here — lean descriptions, flat parameters, no redundant validation keywords — to reduce that cost. The following topics in this module show you how to go further with dynamic tool loading, result trimming, and efficient tool design.