
When you define a custom tool for your AI agent, you are making a long-term investment. The design choices you make — what parameters the tool accepts, how it returns data, how it handles errors, what its schema says — determine the token cost of every request that uses that tool for the lifetime of the agent. A poorly designed tool is like a slow database query: it works, but it creates compounding inefficiency that grows more painful as the system scales.

This topic teaches you to design custom tools from a token-efficiency perspective: how to write schemas that convey maximum information in minimum tokens, how to structure outputs so the model gets exactly what it needs, and how to compose tools so the agent can do more with fewer calls.

The principles here apply across all tool formats: OpenAI function definitions, Anthropic tool schemas, LangChain tools, MCP server tool definitions, and any custom tool format built on JSON Schema.


Principle 1: Schema Design — Lean by Default

The tool schema is the contract between your tool and the model. It tells the model when to use the tool, what inputs to provide, and what to expect back. Every word in your schema is a token. Design your schema to convey the maximum decision-relevant information per token.

The five-word description test

Write your tool's description as if you have only five words. Then expand only where ambiguity would cause the model to call the wrong tool or provide wrong inputs. Most descriptions do not need more than 15 words.

// Fails the test — 28 words, mostly redundant
{
  "name": "search_documentation",
  "description": "This tool allows you to search through the technical documentation to find relevant information. You can use it to look up API references, usage guides, and conceptual explanations."
}

// Passes the test — 9 words, precise
{
  "name": "search_documentation",
  "description": "Search technical docs for API references, guides, and concepts."
}

Parameter descriptions: label, not tutorial

Parameter descriptions should tell the model what the value represents and its constraints — not how to use the tool or what will happen as a result.

// Bloated — 35 tokens for one parameter description
"query": {
  "type": "string",
  "description": "The search query string that will be used to search through the documentation. This should be a concise phrase or term that describes what you are looking for."
}

// Lean — 7 tokens
"query": {
  "type": "string",
  "description": "Search term or phrase"
}

Avoid validation in descriptions

Validation constraints belong in your tool implementation, not in the schema description. Do not write "Must be a positive integer between 1 and 100" when the model sees "type": "integer" — it understands the integer type. Use description only for semantic meaning, not data type constraints.

// Add actual JSON Schema constraints, not prose descriptions of them
"limit": {
  "type": "integer",
  "minimum": 1,
  "maximum": 50,
  "description": "Max results (default 10)"
}
// Better than: "description": "The maximum number of results to return. Must be between 1 and 50. Defaults to 10 if not specified."

Note: Even minimum/maximum constraints add tokens. Only include them if the model genuinely needs to know the range to make correct calls. If your implementation silently clips out-of-range values, skip the constraint in the schema.

Tip: Apply a token budget to each tool schema: tool name (3–5 tokens) + description (10–20 tokens) + parameter names and types (5–10 tokens per parameter) + parameter descriptions (5–10 tokens per parameter). A 3-parameter tool should cost 60–90 tokens total. If you are spending more than 150 tokens on a tool schema, audit every description for compression opportunities.
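The budget above can be enforced mechanically. Here is a minimal sketch that audits a schema against the 150-token ceiling using the rough heuristic of ~4 characters per token — exact counts require a model-specific tokenizer, and the function names are illustrative:

```python
import json

def estimate_schema_tokens(tool_schema: dict) -> int:
    """Rough token estimate for a tool schema: ~4 characters per token.

    This is a heuristic; use a model-specific tokenizer for exact counts.
    """
    compact = json.dumps(tool_schema, separators=(',', ':'))
    return len(compact) // 4

def audit_schema(tool_schema: dict, budget: int = 150) -> bool:
    """Return True when the schema fits the per-tool token budget."""
    estimate = estimate_schema_tokens(tool_schema)
    status = "within" if estimate <= budget else "OVER"
    print(f"{tool_schema['name']}: ~{estimate} tokens — {status} budget")
    return estimate <= budget

audit_schema({
    "name": "search_documentation",
    "description": "Search technical docs for API references, guides, and concepts.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string", "description": "Search term or phrase"}},
        "required": ["query"],
    },
})
```

Running this check in CI on every tool definition makes schema bloat visible before it ships.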


Principle 2: Parameter Design — Flat, Typed, Minimal

The structure of your parameters affects token cost twice: once in the schema and again in the model's tool call output. Complex nested inputs produce complex serialized tool calls, which appear in the conversation history and consume tokens on every subsequent turn.

Prefer flat parameters over nested objects

// Nested — schema is larger, model output is larger
{
  "name": "search_issues",
  "input_schema": {
    "type": "object",
    "properties": {
      "filters": {
        "type": "object",
        "properties": {
          "status": {"type": "string"},
          "assignee": {"type": "string"},
          "priority": {"type": "string"}
        }
      },
      "pagination": {
        "type": "object",
        "properties": {
          "page": {"type": "integer"},
          "per_page": {"type": "integer"}
        }
      }
    }
  }
}

// Flat — smaller schema, smaller model output
{
  "name": "search_issues",
  "input_schema": {
    "type": "object",
    "properties": {
      "status": {"type": "string", "enum": ["open", "closed"], "description": "Filter by status"},
      "assignee": {"type": "string", "description": "Assignee username"},
      "priority": {"type": "string", "enum": ["low", "medium", "high"]},
      "page": {"type": "integer", "description": "Page number (default 1)"},
      "per_page": {"type": "integer", "description": "Results per page (default 20)"}
    }
  }
}

The flat version is smaller in the schema and produces smaller model calls because {"status": "open", "page": 1} is more compact than {"filters": {"status": "open"}, "pagination": {"page": 1}}.
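The size difference is easy to verify by serializing the two call shapes the model would emit (byte counts as a proxy for tokens):

```python
import json

# The arguments the model would emit for each schema shape
nested_call = {"filters": {"status": "open"}, "pagination": {"page": 1}}
flat_call = {"status": "open", "page": 1}

nested_len = len(json.dumps(nested_call, separators=(',', ':')))
flat_len = len(json.dumps(flat_call, separators=(',', ':')))

print(nested_len, flat_len)  # the flat call is roughly half the size
```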

Use enums aggressively for categorical values

Enums serve two purposes: they tell the model exactly what values are valid (reducing hallucination risk), and they allow the model to write shorter, more precise tool calls (it picks from a known list rather than generating freeform text that might be wrong).

// Without enum — model must generate valid values from context
"sort_by": {"type": "string", "description": "Field to sort by: created_at, updated_at, priority, title"}

// With enum — precise, self-validating, guides model behavior
"sort_by": {
  "type": "string",
  "enum": ["created_at", "updated_at", "priority", "title"],
  "description": "Sort field"
}

The enum in the schema adds tokens (the list of values), but it reduces the risk of the model inventing invalid sort fields and shortens the description at the same time. For 3–5 values, enums are almost always a net token win.

Make parameters optional with sensible defaults

Every required parameter forces the model to include it in every tool call. Optional parameters with good defaults allow the model to omit them when the default suffices, producing smaller tool calls.

// Every parameter required — every call is verbose
"required": ["query", "limit", "page", "include_archived", "sort_by", "sort_order"]

// Most have defaults — most calls will be compact
"required": ["query"]
// limit defaults to 20, page defaults to 1, include_archived defaults to false, etc.

Tip: Review every parameter in your tool schemas and ask: "Would the model call this correctly 80% of the time without explicitly specifying this parameter?" If yes, make it optional with a sensible default. The goal is a tool where the model can call it as tool_name(query="search term") for the common case and only specifies additional parameters when deviating from defaults.
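On the implementation side, the defaults live in the tool function itself, so an omitted parameter never round-trips through the model. A minimal sketch (the handler name, signature, and defaults are illustrative):

```python
def search_issues(
    query: str,
    limit: int = 20,
    page: int = 1,
    include_archived: bool = False,
    sort_by: str = "created_at",
    sort_order: str = "desc",
) -> dict:
    """Only `query` is required; every other parameter has a server-side default."""
    # Silently clip out-of-range values instead of documenting ranges in the schema
    limit = max(1, min(limit, 50))
    page = max(1, page)
    return {
        "query": query, "limit": limit, "page": page,
        "include_archived": include_archived,
        "sort_by": sort_by, "sort_order": sort_order,
    }

# Common case: the model's call serializes to just {"query": "login bug"}
result = search_issues(query="login bug")
```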


Principle 3: Output Design — Return What the Model Needs, Nothing More

Tool output design is the other side of the equation. The model receives your tool's output and adds it to the conversation context. Every field your tool returns that the model does not use is a waste.

Design output schemas alongside input schemas

Before implementing a tool, write down:
1. What task will the model be doing when it calls this tool?
2. What specific pieces of information does the model need from this tool to complete that task?
3. What is the most compact representation of that information?

This exercise prevents the common mistake of returning a "full object" when only a few fields are needed.



# Naive — returns the raw API payload (assumes a configured `github` HTTP client)
def get_issue_naive(issue_number: int) -> dict:
    response = github.get(f"/issues/{issue_number}")
    return response.json()  # 150+ fields, many nested objects

# Optimized — returns only the fields the model will actually use
def get_issue_optimized(issue_number: int) -> dict:
    response = github.get(f"/issues/{issue_number}")
    data = response.json()

    return {
        "number": data["number"],
        "title": data["title"],
        "state": data["state"],
        "body": data["body"][:1000] if data["body"] else "",  # Cap body length
        "labels": [label["name"] for label in data["labels"]],
        "assignees": [a["login"] for a in data["assignees"]],
        "created_at": data["created_at"][:10]  # Date only, not full timestamp
    }

Return compact JSON, not pretty-printed JSON

This is a simple but meaningful optimization: use separators=(',', ':') in Python's json.dumps() to produce compact JSON without spaces. For a 50-field result object, this typically saves 30–50 tokens.

import json

# Pretty-printed — whitespace costs tokens on every call
json.dumps({"key": "value", "count": 42}, indent=2)
# '{\n  "key": "value",\n  "count": 42\n}'

# Compact — same payload, no wasted whitespace
json.dumps({"key": "value", "count": 42}, separators=(',', ':'))
# '{"key":"value","count":42}'

For a tool that returns a list of 10 items, this difference compounds significantly.
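The compounding is easy to measure on a list payload (byte lengths again as a token proxy; the item shape is illustrative):

```python
import json

# A 10-item result list, as a tool might return
items = [{"id": i, "name": f"item-{i}", "status": "open"} for i in range(10)]

pretty = json.dumps(items, indent=2)
compact = json.dumps(items, separators=(',', ':'))

print(len(pretty), len(compact))  # compact is substantially smaller
```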

Use abbreviated field names for high-frequency tools

For tools that are called many times per session and whose output appears in the conversation history, consider using abbreviated field names. The model handles abbreviated JSON keys correctly.

// Verbose keys — repeated in every result of every call
{
    "test_name": "test_checkout_flow",
    "status": "failed",
    "duration_ms": 234,
    "error_message": "AssertionError: expected 200, got 404"
}

// Abbreviated keys — same information, fewer tokens per result
{
    "name": "test_checkout_flow",
    "status": "failed",
    "dur": 234,
    "err": "AssertionError: expected 200, got 404"
}

This is most appropriate for internal tools where you control both the schema and the system prompt. Document your abbreviations in the system prompt so the model knows how to interpret them.
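One way to keep abbreviations consistent is a single mapping applied at serialization time; the same table can be pasted into the system prompt as documentation. A sketch (the field names follow the test-result example above):

```python
import json

# One source of truth for abbreviations — document this table in the system prompt
FIELD_ABBREVIATIONS = {
    "test_name": "name",
    "duration_ms": "dur",
    "error_message": "err",
}

def abbreviate(record: dict) -> dict:
    """Rename verbose keys to their abbreviated forms; pass other keys through."""
    return {FIELD_ABBREVIATIONS.get(key, key): value for key, value in record.items()}

result = abbreviate({
    "test_name": "test_checkout_flow",
    "status": "failed",
    "duration_ms": 234,
    "error_message": "AssertionError: expected 200, got 404",
})
print(json.dumps(result, separators=(',', ':')))
```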

Tip: Build an "output schema" for each of your custom tools as a Pydantic model or TypeScript type. This enforces discipline: the output type makes explicit what the tool returns, makes it easy to review for field bloat, and can be used to automatically serialize results to compact JSON. A tool without a defined output schema tends to accrete fields over time as developers add "just one more useful field" without considering the token impact.


Principle 4: Tool Granularity — Composite vs. Atomic Tools

One of the most impactful design decisions is the granularity of your tools: do you build many atomic tools (one function per operation) or fewer composite tools (one function for a common multi-step operation)?

The case for composite tools

Atomic tools require more round-trips. A composite tool can perform multiple operations in a single call, returning a combined result that the model would otherwise need 2–4 separate tool calls to assemble.


# Composite — read_file, get_git_blame, get_recent_git_changes, and
# summarize_blame are assumed to be this codebase's own helpers
def analyze_file(path: str) -> dict:
    """Get file content with git blame and recent change history in one call."""
    content = read_file(path)
    blame = get_git_blame(path)
    recent_changes = get_recent_git_changes(path, n=5)

    return {
        "path": path,
        "content": content[:2000],  # First 2000 chars
        "blame": summarize_blame(blame),
        "recent_changes": [
            {"hash": c["hash"][:8], "msg": c["message"][:80], "date": c["date"][:10]}
            for c in recent_changes
        ]
    }

The composite analyze_file tool replaces 3 round-trips with 1, while also allowing you to apply summarization and filtering internally before returning data to the model.

The case for atomic tools

Composite tools are only efficient when the model frequently needs all the combined data. If the model needs file content 90% of the time but git blame only 20% of the time, a composite tool wastes effort in 80% of calls.

Use composite tools when:
- The operations are almost always needed together
- The combined result is still compact enough to not overwhelm the context
- The composite operation is meaningful as a named action

Use atomic tools when:
- Operations are needed independently most of the time
- The combined result would be very large
- The task is exploratory and the model needs to decide what to do next

The "workflow shortcut" pattern

A practical middle ground: keep your atomic tools, but add high-level composite tools as shortcuts for common multi-step patterns.

// Atomic tools (always available)
["read_file", "git_blame", "git_log", "run_tests", "get_coverage"]

// Composite shortcuts (available for common workflows)
{
  "name": "get_change_context",
  "description": "Get a file's content, recent git history, and test coverage in one call. Use instead of separate read_file + git_log + get_coverage calls.",
  "input_schema": {
    "type": "object",
    "properties": {
      "path": {"type": "string", "description": "File path"},
      "history_depth": {"type": "integer", "description": "Git commits to include (default 3)"}
    },
    "required": ["path"]
  }
}

The system prompt explicitly tells the model when to use the composite tool: "Use get_change_context instead of separate read_file, git_log, and get_coverage calls when you need to understand a file's context."

Tip: Instrument your agent to track which combinations of tools are most frequently called within the same session. Any combination that appears in more than 30% of sessions is a candidate for a composite tool. Build the composite, A/B test it against the atomic equivalent by measuring total tokens per session, and ship if the composite version wins.
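A minimal sketch of that instrumentation, counting which pairs of tools co-occur within a session (the session logs here are hypothetical sample data):

```python
from collections import Counter
from itertools import combinations

# Hypothetical session logs: each entry is the set of tools called in one session
sessions = [
    {"read_file", "git_log", "get_coverage"},
    {"read_file", "git_log", "get_coverage", "run_tests"},
    {"read_file", "run_tests"},
    {"read_file", "git_log", "get_coverage"},
]

pair_counts = Counter()
for tools in sessions:
    for pair in combinations(sorted(tools), 2):
        pair_counts[pair] += 1

# Flag combinations appearing in more than 30% of sessions as composite candidates
threshold = 0.3 * len(sessions)
candidates = [pair for pair, n in pair_counts.items() if n > threshold]
print(candidates)
```

In production this would read from your agent's telemetry rather than an in-memory list, but the counting logic is the same.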


Principle 5: Error Handling Design — Compact, Actionable Error Responses

Tool errors are a hidden source of token waste. When a tool fails, naive implementations return verbose error messages with stack traces, configuration details, and diagnostic information that the model cannot use. The model needs to know: what went wrong, and what can be done about it.

Design error responses with the model in mind:

import json
from dataclasses import dataclass
from enum import Enum

class ToolErrorCode(str, Enum):
    NOT_FOUND = "NOT_FOUND"
    PERMISSION_DENIED = "PERMISSION_DENIED"
    INVALID_INPUT = "INVALID_INPUT"
    RATE_LIMITED = "RATE_LIMITED"
    TIMEOUT = "TIMEOUT"
    DEPENDENCY_ERROR = "DEPENDENCY_ERROR"

@dataclass
class ToolError:
    code: ToolErrorCode
    message: str
    suggestion: str = ""  # What the model should try instead
    retry_after_seconds: int = 0  # For rate limits

    def to_compact_json(self) -> str:
        result = {"error": self.code, "msg": self.message}
        if self.suggestion:
            result["try"] = self.suggestion
        if self.retry_after_seconds:
            result["retry_after"] = self.retry_after_seconds
        return json.dumps(result, separators=(',', ':'))



"""Error: FileNotFoundError: [Errno 2] No such file or directory: '/src/main.py'
Traceback (most recent call last):
  File "/agent/tools/file_reader.py", line 24, in execute
    with open(path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/src/main.py'"""

'{"error":"NOT_FOUND","msg":"File not found: /src/main.py","try":"Use list_directory to find the correct path"}'

For QA engineers, test execution error output is particularly important to design carefully. A failed test run might generate hundreds of lines of stack traces, but the model needs only the failure summary.

def format_test_error(exception: Exception, test_name: str) -> str:
    """Format test execution errors for minimal token consumption."""
    error_type = type(exception).__name__
    error_msg = str(exception)[:200]  # Cap error message length

    return json.dumps({
        "error": "TEST_EXECUTION_FAILED",
        "test": test_name,
        "type": error_type,
        "msg": error_msg,
        "try": "Check the test file for import errors or configuration issues"
    }, separators=(',', ':'))

Tip: Review your tool error messages with the question: "Does the model need this information to decide what to do next?" If the answer is no, remove it. The most useful error responses include: what type of error occurred (as an enum code), a one-sentence human-readable message, and a concrete suggestion for what the model should try instead. Everything else — stack traces, debug logs, internal system state — is noise that costs tokens.


Practical Tool Design Checklist

Use this checklist when designing or reviewing custom tools:

Schema
- [ ] Tool description is 10–20 words and starts with a verb
- [ ] Each parameter description is 5–10 words maximum
- [ ] Validation constraints are in the implementation, not the description
- [ ] No nested parameter objects (prefer flat structure)
- [ ] Categorical parameters use enums
- [ ] Only truly required parameters are in required

Output
- [ ] An output schema (Pydantic/TypeScript) is defined before implementation
- [ ] Output contains only fields the model will use for its next reasoning step
- [ ] JSON is serialized compactly (no extra whitespace)
- [ ] Collections are limited by default (max 10–20 items)
- [ ] Large string fields (file content, descriptions) are capped at a defined character/token limit

Error handling
- [ ] Errors use a defined enum of error codes
- [ ] Error messages are one sentence
- [ ] Errors include a "try this instead" suggestion
- [ ] No stack traces in error output

Granularity
- [ ] The tool does one thing well (or explicitly combines a documented common pattern)
- [ ] The tool name is a verb + noun (search_files, not files_tool)
- [ ] The tool can be called in parallel with other tools for independent operations

Tip: Add tool design review to your pull request checklist. Any PR that adds or modifies a tool schema should include a token count for the schema and a sample output with a token count. This makes tool token costs visible in the same place that code quality is reviewed, embedding efficiency discipline into the development workflow.


Summary

Custom tool design is where software engineers have the most direct control over agent token efficiency. Schema verbosity, parameter structure, output field selection, tool granularity, and error message design all contribute to the per-request token cost of every tool in your agent's arsenal. A well-designed tool costs 60–90 tokens to define and returns 100–300 tokens per call. A poorly designed tool costs 200–400 tokens to define and returns 1,000–5,000 tokens per call — both of which compound across every round-trip in every session. The five principles in this topic — lean schemas, flat parameters, output-first design, composite shortcuts, and compact errors — give you a systematic framework for building tools that are both powerful and token-efficient.