The single most effective strategy for reducing tool-related token overhead is also the most conceptually simple: do not send tools the model does not need. Instead of loading every tool your agent is capable of using into every request, load only the tools relevant to the current task or the current step in a workflow.
This is called dynamic tool loading, and it is a fundamental pattern for building token-efficient, production-grade AI agents. For software engineers, it mirrors the principle of dependency injection. For QA engineers building automated test agents, it is the difference between running an agent that costs $0.08 per test cycle versus $0.02. For product managers overseeing AI infrastructure, it is a lever that can cut agent operating costs by 30–60% without changing the agent's behavior.
The Problem with Static Tool Loading
Most agent frameworks, tutorials, and quickstart examples default to static tool loading: you define a list of tools once at startup, and that list is included in every single request for the lifetime of the agent.
This is fine when you have 3–5 tools. It becomes a serious problem at scale.
Consider a multi-purpose development assistant agent. Over time, the team adds tools for:
- File system operations (5 tools)
- Git operations (6 tools)
- Code search and analysis (4 tools)
- Test execution (3 tools)
- Database querying (4 tools)
- API calls (5 tools)
- Documentation retrieval (3 tools)
- Ticket management (4 tools)
- Deployment operations (4 tools)
- Monitoring and alerting (3 tools)
That is 41 tools. If each tool schema averages 100 tokens, every request carries 4,100 tokens of tool schema overhead — regardless of whether the user asked "what files are in the src/ directory?" (needs 1 tool) or "run the full test suite and file a ticket for any failures" (needs 4–5 tools).
With dynamic tool loading, the first request loads 1 tool. The second loads 4–5. The savings are immediate and scale with request volume.
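The arithmetic is worth making concrete. A back-of-envelope calculator (the 100-tokens-per-schema figure is this article's rough average, not a measured constant):

```python
def tool_overhead(num_tools: int, avg_tokens_per_schema: int = 100) -> int:
    """Rough per-request token overhead from tool schemas alone."""
    return num_tools * avg_tokens_per_schema

static = tool_overhead(41)   # every tool, every request
dynamic = tool_overhead(5)   # a typical dynamically loaded set
print(static, dynamic, f"{1 - dynamic / static:.0%} saved")
# -> 4100 500 88% saved
```

Multiply the per-request difference by your daily request volume to estimate the monthly savings.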
Tip: Audit your agent's request logs over a 1–2 week period and record which tools are actually called in each session. You will almost certainly find that 20–30% of your defined tools are never called in typical usage, and that most requests use only 3–7 tools from a set of 20+. This audit gives you the data to make the case for dynamic tool loading and defines your tool grouping strategy.
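The audit itself can be a few lines of log parsing. A minimal sketch, assuming each log line is a JSON object with a `tool_calls` list of tool names (adapt the parsing to your own log schema):

```python
import json
from collections import Counter

def audit_tool_usage(log_lines):
    """Count how often each tool is actually called across logged sessions."""
    counts = Counter()
    sessions = 0
    for line in log_lines:
        record = json.loads(line)
        sessions += 1
        counts.update(record.get("tool_calls", []))
    return counts, sessions

# Three illustrative session logs
logs = [
    '{"tool_calls": ["read_file", "git_diff"]}',
    '{"tool_calls": ["read_file"]}',
    '{"tool_calls": []}',
]
counts, sessions = audit_tool_usage(logs)
print(counts.most_common())  # -> [('read_file', 2), ('git_diff', 1)]
```

Tools that never appear in `counts` after a week or two of traffic are candidates for removal or for a rarely loaded group.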
Strategy 1: Intent-Based Tool Selection
The most powerful approach to dynamic tool loading uses a lightweight intent classification step to determine which tools to load before sending the main agent request.
The classifier can be a small, cheap model call — or even a deterministic regex/keyword match for well-defined use cases. The classifier's job is to map the user's request to a tool group.
Here is a complete implementation pattern in Python:
from anthropic import Anthropic

client = Anthropic()

TOOL_GROUPS = {
    "filesystem": [
        {
            "name": "read_file",
            "description": "Read file contents",
            "input_schema": {
                "type": "object",
                "properties": {"path": {"type": "string", "description": "File path"}},
                "required": ["path"]
            }
        },
        {
            "name": "write_file",
            "description": "Write content to file",
            "input_schema": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path"},
                    "content": {"type": "string", "description": "Content to write"}
                },
                "required": ["path", "content"]
            }
        },
        {
            "name": "list_directory",
            "description": "List directory contents",
            "input_schema": {
                "type": "object",
                "properties": {"path": {"type": "string", "description": "Directory path"}},
                "required": ["path"]
            }
        }
    ],
    "git": [
        {
            "name": "git_status",
            "description": "Get git repository status",
            "input_schema": {"type": "object", "properties": {}, "required": []}
        },
        {
            "name": "git_diff",
            "description": "Get diff for changed files",
            "input_schema": {
                "type": "object",
                "properties": {
                    "staged": {"type": "boolean", "description": "Show staged changes only"}
                },
                "required": []
            }
        },
        {
            "name": "git_commit",
            "description": "Create a git commit",
            "input_schema": {
                "type": "object",
                "properties": {"message": {"type": "string", "description": "Commit message"}},
                "required": ["message"]
            }
        }
    ],
    "testing": [
        {
            "name": "run_tests",
            "description": "Run the test suite",
            "input_schema": {
                "type": "object",
                "properties": {
                    "pattern": {"type": "string", "description": "Test file pattern"},
                    "verbose": {"type": "boolean", "description": "Verbose output"}
                },
                "required": []
            }
        },
        {
            "name": "get_coverage",
            "description": "Get code coverage report",
            "input_schema": {"type": "object", "properties": {}, "required": []}
        }
    ]
}

def classify_intent(user_message: str) -> list[str]:
    """Use a cheap model call to classify which tool groups are needed."""
    response = client.messages.create(
        model="claude-haiku-4-5",  # Use the cheapest model for classification
        max_tokens=100,
        system="""Classify which tool groups are needed for this request.
Available groups: filesystem, git, testing
Return only a comma-separated list of needed groups.
Example output: filesystem,git""",
        messages=[{"role": "user", "content": user_message}]
    )
    groups_text = response.content[0].text.strip().lower()
    return [g.strip() for g in groups_text.split(",") if g.strip() in TOOL_GROUPS]

def get_tools_for_request(user_message: str) -> list[dict]:
    """Dynamically select tools based on user intent."""
    needed_groups = classify_intent(user_message)
    tools = []
    for group in needed_groups:
        tools.extend(TOOL_GROUPS.get(group, []))
    return tools

def run_agent(user_message: str):
    """Run the agent with dynamically loaded tools; returns the API response."""
    tools = get_tools_for_request(user_message)
    print(f"Loaded {len(tools)} tools for this request")
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=4096,
        tools=tools,
        messages=[{"role": "user", "content": user_message}]
    )
    return response

result = run_agent("Can you show me what files changed in git and run the tests?")
The key insight here is the two-tier approach: a cheap, fast model call (Claude Haiku, GPT-4o-mini, or a rule-based classifier) decides the tool set, and the expensive model call does the actual work with a minimal, focused tool set.
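For well-defined domains, the cheap tier can even be a deterministic keyword matcher with no model call at all. A sketch (the keyword lists are illustrative; tune them against your own request logs):

```python
import re

# Illustrative keyword patterns per tool group -- not exhaustive
GROUP_KEYWORDS = {
    "filesystem": [r"\bfiles?\b", r"\bdirectory\b", r"\bread\b", r"\bwrite\b"],
    "git": [r"\bgit\b", r"\bcommit\b", r"\bdiff\b", r"\bbranch\b"],
    "testing": [r"\btests?\b", r"\bcoverage\b"],
}

def classify_intent_keywords(user_message: str) -> list[str]:
    """Return every group whose keywords match; errs on the side of inclusion."""
    text = user_message.lower()
    return [
        group for group, patterns in GROUP_KEYWORDS.items()
        if any(re.search(p, text) for p in patterns)
    ]

print(classify_intent_keywords("show me what files changed in git and run the tests"))
# -> ['filesystem', 'git', 'testing']
```

A keyword matcher adds zero latency and zero cost, at the price of brittleness; many teams start here and graduate to a small-model classifier once the keyword lists stop scaling.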
Tip: The classifier does not need to be perfect. If it occasionally loads an extra tool group, that is acceptable — it is still better than loading all tools every time. Design the classifier to err on the side of inclusion rather than exclusion. A missed tool forces a costly retry or a failed task; an extra tool group costs a modest number of tokens.
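One way to encode that inclusion bias is a fallback wrapper around whatever classifier you use. A sketch (`classify` stands in for any classifier, such as the `classify_intent` function above):

```python
def classify_with_fallback(classify, user_message: str, all_groups: list[str]) -> list[str]:
    """Fall back to every group when the classifier returns nothing usable.

    An empty result is treated as a classifier failure, not as "no tools
    needed" -- loading everything for one request beats a failed task.
    """
    try:
        groups = classify(user_message)
    except Exception:
        groups = []
    return groups if groups else list(all_groups)

# A deliberately broken classifier demonstrates the fallback path
print(classify_with_fallback(lambda m: [], "deploy the service",
                             ["filesystem", "git", "testing"]))
# -> ['filesystem', 'git', 'testing']
```

If fallback activations are frequent in your logs, that is a signal the classifier prompt or keyword lists need work.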
Strategy 2: Workflow-Stage Tool Loading
For agents that execute multi-step workflows, the required tools are often deterministic based on the current workflow stage. This allows you to predefine which tools are available at each stage — no classification call needed.
This is particularly valuable for:
- QA automation agents: Stage 1 tools = code reading, Stage 2 tools = test generation, Stage 3 tools = test execution, Stage 4 tools = result reporting
- PR review agents: Stage 1 = file reading, Stage 2 = code search, Stage 3 = comment posting
- Feature planning agents: Stage 1 = documentation reading, Stage 2 = ticket searching, Stage 3 = ticket creation
from enum import Enum
from dataclasses import dataclass

class WorkflowStage(Enum):
    ANALYZE = "analyze"
    PLAN = "plan"
    EXECUTE = "execute"
    VERIFY = "verify"
    REPORT = "report"

STAGE_TOOLS = {
    WorkflowStage.ANALYZE: ["read_file", "list_directory", "search_codebase"],
    WorkflowStage.PLAN: ["search_codebase", "read_file", "get_ticket_details"],
    WorkflowStage.EXECUTE: ["write_file", "run_command", "create_branch"],
    WorkflowStage.VERIFY: ["run_tests", "get_coverage", "lint_code"],
    WorkflowStage.REPORT: ["create_ticket", "post_comment", "send_notification"]
}

@dataclass
class AgentState:
    stage: WorkflowStage
    context: dict

class WorkflowAgent:
    def __init__(self, all_tools: dict):
        self.all_tools = all_tools  # name -> tool definition

    def get_stage_tools(self, stage: WorkflowStage) -> list[dict]:
        """Return only the tools needed for this workflow stage."""
        tool_names = STAGE_TOOLS[stage]
        return [self.all_tools[name] for name in tool_names if name in self.all_tools]

    def step(self, state: AgentState, message: str):
        tools = self.get_stage_tools(state.stage)
        # Make the agent call with minimal, stage-appropriate tools
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,
            tools=tools,
            messages=[{"role": "user", "content": message}]
        )
        # Determine the next stage based on the response
        next_stage = self._advance_stage(state.stage, response)
        return response, AgentState(stage=next_stage, context=state.context)

    def _advance_stage(self, current: WorkflowStage, response) -> WorkflowStage:
        stage_order = list(WorkflowStage)
        current_idx = stage_order.index(current)
        if current_idx < len(stage_order) - 1:
            return stage_order[current_idx + 1]
        return current
Tip: Document your tool-to-stage mappings in a configuration file, not hardcoded in your application logic. This makes it easy for team members to adjust the mapping as the workflow evolves, and it allows you to test different tool grouping strategies by swapping configuration without changing code.
Strategy 3: Tool Discovery via MCP Dynamic Filtering
When using MCP (Model Context Protocol) servers, the server returns a full list of tools at initialization time. Most MCP client implementations load all discovered tools into every request. You can intercept this and filter the tool list before each request.
Here is how to implement a filtering layer for MCP tool use:
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import Anthropic from "@anthropic-ai/sdk";

interface ToolFilter {
  includeGroups?: string[];
  excludeTools?: string[];
  maxTools?: number;
}

class FilteredMCPClient {
  private mcpClient: Client;
  private allTools: Anthropic.Tool[] = [];

  constructor(mcpClient: Client) {
    this.mcpClient = mcpClient;
  }

  async initialize() {
    // Fetch all tools from the MCP server once at startup
    const toolsResult = await this.mcpClient.listTools();
    this.allTools = toolsResult.tools.map(tool => ({
      name: tool.name,
      description: tool.description || "",
      input_schema: tool.inputSchema as Anthropic.Tool["input_schema"]
    }));
    console.log(`Loaded ${this.allTools.length} tools from MCP server`);
  }

  getFilteredTools(filter: ToolFilter): Anthropic.Tool[] {
    let tools = [...this.allTools];

    // Filter by group prefix naming convention
    if (filter.includeGroups && filter.includeGroups.length > 0) {
      tools = tools.filter(tool =>
        filter.includeGroups!.some(group => tool.name.startsWith(group + "_"))
      );
    }

    // Exclude specific tools
    if (filter.excludeTools && filter.excludeTools.length > 0) {
      tools = tools.filter(tool => !filter.excludeTools!.includes(tool.name));
    }

    // Enforce max tools limit
    if (filter.maxTools) {
      tools = tools.slice(0, filter.maxTools);
    }

    return tools;
  }

  async runRequest(
    messages: Anthropic.MessageParam[],
    filter: ToolFilter
  ): Promise<Anthropic.Message> {
    const tools = this.getFilteredTools(filter);
    const anthropic = new Anthropic();
    return anthropic.messages.create({
      model: "claude-opus-4-5",
      max_tokens: 4096,
      tools,
      messages
    });
  }
}

// Usage: only load filesystem tools for a file reading task
const response = await filteredClient.runRequest(
  [{ role: "user", content: "List all TypeScript files in the src directory" }],
  { includeGroups: ["fs", "file"], maxTools: 5 }
);
Tip: Use a consistent naming convention for your MCP tools that encodes the tool group in the name prefix (e.g., fs_read, fs_write, git_status, git_commit). This makes programmatic filtering straightforward and avoids the need to maintain a separate group-membership registry.
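With that convention in place, the group registry can be derived from the tool names themselves rather than maintained by hand. A sketch in Python (sticking with the article's primary language; the tool names are illustrative):

```python
from collections import defaultdict

def group_tools_by_prefix(tool_names: list[str]) -> dict[str, list[str]]:
    """Derive tool groups from a `group_action` naming convention."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in tool_names:
        # Everything before the first underscore is the group name
        prefix, _, _ = name.partition("_")
        groups[prefix].append(name)
    return dict(groups)

tools = ["fs_read", "fs_write", "git_status", "git_commit", "test_run"]
print(group_tools_by_prefix(tools))
# -> {'fs': ['fs_read', 'fs_write'], 'git': ['git_status', 'git_commit'], 'test': ['test_run']}
```

The same derivation works on whatever tool list your MCP server returns, so new tools that follow the convention join the right group automatically.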
Strategy 4: Capability Negotiation at Agent Startup
For long-running agents or chat-based interfaces where you cannot predict every task ahead of time, implement capability negotiation: the agent starts with a minimal set of core tools, and can request additional tool groups at runtime when it determines they are needed.
This requires a special "request_capability" meta-tool:
META_TOOL = {
    "name": "request_capability",
    "description": "Request access to additional tools for a specific capability area",
    "input_schema": {
        "type": "object",
        "properties": {
            "capability": {
                "type": "string",
                "enum": ["filesystem", "git", "testing", "database", "deployment"],
                "description": "The capability area to enable"
            },
            "reason": {
                "type": "string",
                "description": "Brief reason why this capability is needed"
            }
        },
        "required": ["capability"]
    }
}

class AdaptiveToolAgent:
    def __init__(self):
        self.active_tool_groups = {"core"}  # Start with core tools only
        self.conversation_history = []

    def get_current_tools(self) -> list[dict]:
        tools = [META_TOOL]  # Always include the meta-tool
        for group in self.active_tool_groups:
            tools.extend(TOOL_GROUPS.get(group, []))
        return tools

    def handle_tool_call(self, tool_name: str, tool_input: dict) -> str:
        if tool_name == "request_capability":
            capability = tool_input["capability"]
            self.active_tool_groups.add(capability)
            return (
                f"Capability '{capability}' enabled. You now have access to "
                f"{len(TOOL_GROUPS.get(capability, []))} additional tools."
            )
        # Handle other tool calls...
        return self._execute_tool(tool_name, tool_input)
This pattern is particularly elegant for open-ended assistants used by product managers: the agent starts lean and progressively acquires tools as the conversation reveals what is needed. A PM asking general questions never pays for the database or deployment tool schemas; a PM asking to check a deployment pays for those schemas only once that specific need arises.
Tip: Log which capabilities are requested and when. Over time, this data reveals the most common capability sequences, which you can use to pre-load the right tool groups more aggressively via intent classification. The adaptive approach and the intent-based approach are complementary — use adaptive loading in development and during early deployment to collect data, then use that data to build better intent classifiers for production.
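A sketch of that logging, recording the ordered capability requests per session so the common sequences surface (the class and method names are hypothetical):

```python
from collections import Counter

class CapabilityLog:
    """Tally which capability sequences sessions actually request."""

    def __init__(self):
        self.sequences: Counter = Counter()
        self._current: list[str] = []

    def record(self, capability: str) -> None:
        # Called from the request_capability handler
        self._current.append(capability)

    def end_session(self) -> None:
        if self._current:
            self.sequences[tuple(self._current)] += 1
        self._current = []

log = CapabilityLog()
for cap in ["filesystem", "git"]:
    log.record(cap)
log.end_session()
log.record("filesystem")
log.end_session()
print(dict(log.sequences))
```

If `("filesystem", "git")` dominates the tallies, pre-loading both groups whenever the intent classifier predicts filesystem work is likely a net win.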
Measuring the Impact of Dynamic Tool Loading
After implementing dynamic tool loading, measure the impact rigorously:
import time
from dataclasses import dataclass

@dataclass
class RequestMetrics:
    tools_loaded: int
    tool_tokens: int
    total_input_tokens: int
    latency_ms: float
    task_completed: bool

def benchmark_loading_strategy(
    requests: list[str],
    strategy: str  # "static" or "dynamic"
) -> list[RequestMetrics]:
    metrics = []
    for request in requests:
        start = time.time()
        if strategy == "static":
            tools = get_all_tools()  # Load everything
        else:
            tools = get_tools_for_request(request)  # Load dynamically
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,
            tools=tools,
            messages=[{"role": "user", "content": request}]
        )
        latency = (time.time() - start) * 1000
        metrics.append(RequestMetrics(
            tools_loaded=len(tools),
            tool_tokens=count_tool_tokens(tools),
            total_input_tokens=response.usage.input_tokens,
            latency_ms=latency,
            task_completed=response.stop_reason != "max_tokens"
        ))
    return metrics

static_metrics = benchmark_loading_strategy(test_requests, "static")
dynamic_metrics = benchmark_loading_strategy(test_requests, "dynamic")

avg_static_tokens = sum(m.tool_tokens for m in static_metrics) / len(static_metrics)
avg_dynamic_tokens = sum(m.tool_tokens for m in dynamic_metrics) / len(dynamic_metrics)

print(f"Static loading avg tool tokens: {avg_static_tokens:.0f}")
print(f"Dynamic loading avg tool tokens: {avg_dynamic_tokens:.0f}")
print(f"Token reduction: {(1 - avg_dynamic_tokens/avg_static_tokens)*100:.1f}%")
Tip: When benchmarking dynamic tool loading, always verify task completion rates between the static and dynamic strategies. If your dynamic classifier is occasionally missing required tools, the task completion rate will drop. Target at least 98% task completion parity with static loading before shipping. A small accuracy loss in the classifier is acceptable; a measurable drop in task completion is not.
Summary
Dynamic tool loading is the highest-leverage optimization for tool-related token costs. By loading only the tools relevant to the current task, most agents can reduce tool schema overhead by 40–70% without any change to agent behavior. The four strategies — intent classification, workflow-stage loading, MCP dynamic filtering, and runtime capability negotiation — each have their place depending on the agent architecture. For new agents, start with workflow-stage loading (predictable and deterministic). For existing agents with diverse request patterns, add intent-based classification as a preprocessing step. For MCP-heavy deployments, add a filtering layer between the MCP client and the model API. For long-running, open-ended agents, let the agent negotiate capabilities at runtime and use the resulting logs to sharpen the other strategies.