
This is a complete, hands-on lab where you will take a realistic but inefficient MCP (Model Context Protocol) server configuration and apply every optimization technique from this module to achieve a measurable, significant reduction in tool-related token overhead.

By the end of this lab, you will have:
- Measured baseline token costs for an MCP server's tool definitions
- Applied schema compression to reduce definition overhead
- Implemented dynamic tool filtering at the MCP client layer
- Added result trimming and structured output to tool handlers
- Configured batching-friendly tooling
- Verified the optimization results with before/after measurements

This lab uses a filesystem and development tools MCP server — the most common MCP server type used in coding assistant and software development agent applications. All personas will find this relevant: software engineers will recognize the tools, QA engineers will see patterns applicable to their test automation agents, and product managers will gain hands-on understanding of what engineering teams can control in AI infrastructure costs.


Setup and Baseline Measurement

Start by setting up the lab environment. You need Node.js 18+ and Python 3.10+ installed.

mkdir mcp-token-optimization-lab
cd mcp-token-optimization-lab

npm init -y
npm install @modelcontextprotocol/sdk zod

pip install anthropic tiktoken

Create the baseline MCP server. This is intentionally verbose and inefficient — your job is to fix it:

// server-baseline.ts
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";
import * as fs from "fs";
import * as path from "path";
import { execSync } from "child_process";

const server = new Server(
  { name: "dev-tools-baseline", version: "1.0.0" },
  { capabilities: { tools: {} } }
);

// BASELINE: Verbose, unoptimized tool definitions
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "read_file_contents",
      description:
        "This tool reads the complete contents of a file from the filesystem and returns the full text content. You can use this tool whenever you need to view what is inside a file, whether it's source code, configuration, documentation, or any other text file. The tool will return the entire file contents as a string.",
      inputSchema: {
        type: "object",
        properties: {
          file_path: {
            type: "string",
            description:
              "The complete absolute or relative path to the file that you want to read. This should include the filename and extension.",
          },
          encoding: {
            type: "string",
            description:
              "The character encoding to use when reading the file. Common values include utf-8 for standard text files, latin-1 for legacy files, and ascii for ASCII-only files. If not specified, utf-8 will be used as the default encoding.",
          },
          include_line_numbers: {
            type: "boolean",
            description:
              "Whether or not to include line numbers in the output. When set to true, each line of the file will be prefixed with its line number. When set to false or not specified, line numbers will not be included.",
          },
        },
        required: ["file_path"],
        additionalProperties: false,
      },
    },
    {
      name: "write_file_contents",
      description:
        "This tool allows you to write content to a file on the filesystem. If the file already exists, it will be completely overwritten with the new content you provide. If the file does not exist, a new file will be created. You can use this tool to save code, configuration, or any other text content to the filesystem.",
      inputSchema: {
        type: "object",
        properties: {
          file_path: {
            type: "string",
            description:
              "The complete absolute or relative path where you want to write the file. This should include the filename and extension.",
          },
          content: {
            type: "string",
            description:
              "The complete text content that you want to write to the file. This will replace any existing content in the file.",
          },
          create_directories: {
            type: "boolean",
            description:
              "If set to true, any parent directories in the file path that do not exist will be automatically created before writing the file. If set to false or not specified, the write will fail if parent directories do not exist.",
          },
          backup_existing: {
            type: "boolean",
            description:
              "If set to true, any existing file at the specified path will be backed up before being overwritten. The backup will be saved with a .bak extension.",
          },
        },
        required: ["file_path", "content"],
        additionalProperties: false,
      },
    },
    {
      name: "list_directory_contents",
      description:
        "This tool retrieves and returns a comprehensive listing of all files and directories contained within a specified directory path. It can optionally search recursively through subdirectories to find all nested files and folders.",
      inputSchema: {
        type: "object",
        properties: {
          directory_path: {
            type: "string",
            description:
              "The complete path to the directory that you want to list the contents of.",
          },
          recursive: {
            type: "boolean",
            description:
              "Whether to recursively list the contents of subdirectories. When true, all nested files and directories will be included in the output.",
          },
          include_hidden: {
            type: "boolean",
            description:
              "Whether to include hidden files and directories (those starting with a dot) in the listing.",
          },
          filter_pattern: {
            type: "string",
            description:
              "An optional glob pattern to filter which files are included in the listing. For example, '*.ts' would only show TypeScript files.",
          },
        },
        required: ["directory_path"],
        additionalProperties: false,
      },
    },
    {
      name: "execute_shell_command",
      description:
        "This tool executes a shell command on the system and returns the complete output including both standard output and standard error. You can use this tool to run scripts, compile code, run test suites, check system status, or perform any other shell operations.",
      inputSchema: {
        type: "object",
        properties: {
          command: {
            type: "string",
            description:
              "The complete shell command string that you want to execute. This can include pipes, redirections, and any valid shell syntax.",
          },
          working_directory: {
            type: "string",
            description:
              "The directory in which to execute the command. If not specified, the command will run in the current working directory of the server process.",
          },
          timeout_seconds: {
            type: "number",
            description:
              "The maximum number of seconds to wait for the command to complete before timing out. If not specified, a default timeout of 30 seconds will be used.",
          },
          environment_variables: {
            type: "object",
            description:
              "An optional object containing additional environment variables to set for the command execution. Each key-value pair represents a variable name and its value.",
            additionalProperties: { type: "string" },
          },
        },
        required: ["command"],
        additionalProperties: false,
      },
    },
    {
      name: "search_in_files",
      description:
        "This tool performs a text search across files in a directory to find files containing a specific search pattern or text string. It returns the file paths, line numbers, and surrounding context for each match found.",
      inputSchema: {
        type: "object",
        properties: {
          search_query: {
            type: "string",
            description:
              "The text string or regular expression pattern to search for within files.",
          },
          search_directory: {
            type: "string",
            description:
              "The directory path to search within. The search will include all files in this directory.",
          },
          file_extension_filter: {
            type: "string",
            description:
              "An optional file extension to limit the search to files of a specific type. For example, '.ts' or '.py'.",
          },
          context_lines: {
            type: "number",
            description:
              "The number of lines of context to include before and after each match. Defaults to 2.",
          },
          case_sensitive: {
            type: "boolean",
            description:
              "Whether the search should be case sensitive. Defaults to false for case-insensitive search.",
          },
          max_results: {
            type: "number",
            description:
              "The maximum number of search results to return. Defaults to 20 results.",
          },
        },
        required: ["search_query", "search_directory"],
        additionalProperties: false,
      },
    },
  ],
}));

Now measure the baseline token cost:

import anthropic
import json

client = anthropic.Anthropic()

baseline_tools = [
    {
        "name": "read_file_contents",
        "description": "This tool reads the complete contents of a file from the filesystem and returns the full text content. You can use this tool whenever you need to view what is inside a file, whether it's source code, configuration, documentation, or any other text file. The tool will return the entire file contents as a string.",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {
                    "type": "string",
                    "description": "The complete absolute or relative path to the file that you want to read. This should include the filename and extension."
                },
                "encoding": {
                    "type": "string",
                    "description": "The character encoding to use when reading the file. Common values include utf-8 for standard text files, latin-1 for legacy files, and ascii for ASCII-only files. If not specified, utf-8 will be used as the default encoding."
                },
                "include_line_numbers": {
                    "type": "boolean",
                    "description": "Whether or not to include line numbers in the output. When set to true, each line of the file will be prefixed with its line number. When set to false or not specified, line numbers will not be included."
                }
            },
            "required": ["file_path"]
        }
    },
    # ... (remaining 4 tools)
]

response = client.messages.count_tokens(
    model="claude-opus-4-5",
    tools=baseline_tools,
    messages=[{"role": "user", "content": "Hello"}]
)

print(f"Baseline tool token cost: {response.input_tokens} tokens (including minimal message)")
print(f"Estimated tool-only overhead: ~{response.input_tokens - 10} tokens")

Expected baseline result: Approximately 850–1,100 tokens for 5 tools. This is the number you will reduce.

Tip: Run the baseline measurement before making any changes and save the number. Every optimization decision you make should be verified against this baseline. This is not just good practice — it is essential for communicating the impact of your optimization work to stakeholders and team members who did not participate in the lab.
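
To make the "save the number" step concrete, here is a minimal sketch that records each measurement in a JSON file so the before/after comparison in Step 5 has something to read. The filename `token_baseline.json` and the helper `record_measurement` are arbitrary choices, not part of any SDK:

```python
import json
from pathlib import Path

def record_measurement(label: str, tokens: int, path: str = "token_baseline.json") -> dict:
    """Append a labeled token measurement to a JSON file; return all measurements so far."""
    file = Path(path)
    data = json.loads(file.read_text()) if file.exists() else {}
    data[label] = tokens
    file.write_text(json.dumps(data, indent=2))
    return data

# Usage after each measurement step:
# record_measurement("baseline", response.input_tokens)
# record_measurement("step1_compressed", response.input_tokens)
```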


Step 1: Compress Tool Schemas

Apply the lean schema principles from Topic 5. Rewrite every tool definition:

// server-optimized.ts — Step 1: Compressed schemas

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "read_file",
      description: "Read file contents. Returns text with optional line numbers.",
      inputSchema: {
        type: "object",
        properties: {
          path: { type: "string", description: "File path" },
          start_line: { type: "integer", description: "Start line (default 1)" },
          end_line: { type: "integer", description: "End line (default: all)" },
          line_numbers: { type: "boolean", description: "Include line numbers" }
        },
        required: ["path"]
      }
    },
    {
      name: "write_file",
      description: "Write content to file. Creates file if not exists, overwrites if exists.",
      inputSchema: {
        type: "object",
        properties: {
          path: { type: "string", description: "File path" },
          content: { type: "string", description: "Content to write" },
          mkdir: { type: "boolean", description: "Create parent dirs if missing" }
        },
        required: ["path", "content"]
      }
    },
    {
      name: "list_dir",
      description: "List directory contents.",
      inputSchema: {
        type: "object",
        properties: {
          path: { type: "string", description: "Directory path" },
          recursive: { type: "boolean", description: "Include subdirectories" },
          pattern: { type: "string", description: "Glob filter (e.g. *.ts)" },
          hidden: { type: "boolean", description: "Include hidden files" }
        },
        required: ["path"]
      }
    },
    {
      name: "run_command",
      description: "Execute shell command. Returns stdout and stderr.",
      inputSchema: {
        type: "object",
        properties: {
          cmd: { type: "string", description: "Shell command" },
          cwd: { type: "string", description: "Working directory" },
          timeout: { type: "integer", description: "Timeout seconds (default 30)" }
        },
        required: ["cmd"]
      }
    },
    {
      name: "search_files",
      description: "Search files for text/regex pattern. Returns matches with context.",
      inputSchema: {
        type: "object",
        properties: {
          query: { type: "string", description: "Search pattern (regex supported)" },
          dir: { type: "string", description: "Search directory" },
          ext: { type: "string", description: "File extension filter (e.g. .ts)" },
          ctx: { type: "integer", description: "Context lines (default 2)" },
          limit: { type: "integer", description: "Max results (default 20)" }
        },
        required: ["query", "dir"]
      }
    }
  ]
}));

Measure after Step 1:

optimized_step1_tools = [
    {
        "name": "read_file",
        "description": "Read file contents. Returns text with optional line numbers.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "File path"},
                "start_line": {"type": "integer", "description": "Start line (default 1)"},
                "end_line": {"type": "integer", "description": "End line (default: all)"},
                "line_numbers": {"type": "boolean", "description": "Include line numbers"}
            },
            "required": ["path"]
        }
    },
    # ... (remaining 4 optimized tools)
]

response = client.messages.count_tokens(
    model="claude-opus-4-5",
    tools=optimized_step1_tools,
    messages=[{"role": "user", "content": "Hello"}]
)

print(f"Step 1 (compressed schemas): {response.input_tokens} tokens")

Expected result after Step 1: Approximately 350–450 tokens — a 55–65% reduction from baseline, achieved purely through schema compression.

Tip: When renaming tool parameters from verbose names (file_path, directory_path, search_query) to compact names (path, dir, query), update your tool handler implementations and your system prompt simultaneously. The system prompt may reference the old parameter names in examples. Failing to update the system prompt is the most common source of bugs after schema compression.
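
One way to catch stale references during the rename is a translation shim. In the sketch below, the old/new pairs mirror the renames made above; `PARAM_RENAMES` and `migrate_args` are illustrative helpers (not part of the MCP SDK) that you could apply to incoming tool arguments during a transition period, or use to grep prompts and handlers for old names:

```python
# Maps each tool's old verbose parameter names to the new compact ones.
PARAM_RENAMES = {
    "read_file": {"file_path": "path", "include_line_numbers": "line_numbers"},
    "write_file": {"file_path": "path", "create_directories": "mkdir"},
    "list_dir": {"directory_path": "path", "filter_pattern": "pattern", "include_hidden": "hidden"},
    "run_command": {"command": "cmd", "working_directory": "cwd", "timeout_seconds": "timeout"},
    "search_files": {"search_query": "query", "search_directory": "dir",
                     "file_extension_filter": "ext", "context_lines": "ctx", "max_results": "limit"},
}

def migrate_args(tool: str, args: dict) -> dict:
    """Translate any old parameter names to their compact equivalents; pass others through."""
    renames = PARAM_RENAMES.get(tool, {})
    return {renames.get(k, k): v for k, v in args.items()}
```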


Step 2: Add Dynamic Tool Filtering at the Client Layer

Now add an MCP client wrapper that loads only the tools needed for the current request:

import asyncio
import json
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
import anthropic

TOOL_GROUPS = {
    "read": ["read_file", "list_dir", "search_files"],
    "write": ["write_file"],
    "execute": ["run_command"],
    "explore": ["read_file", "list_dir", "search_files"],
    "modify": ["read_file", "write_file", "run_command"]
}

async def get_mcp_tools_filtered(session: ClientSession, group: str) -> list:
    """Fetch MCP tools and filter to the requested group."""
    tools_result = await session.list_tools()

    allowed_names = set(TOOL_GROUPS.get(group, []))

    filtered = [
        {
            "name": tool.name,
            "description": tool.description,
            "input_schema": tool.inputSchema
        }
        for tool in tools_result.tools
        if tool.name in allowed_names
    ]

    print(f"Tool group '{group}': {len(filtered)}/{len(tools_result.tools)} tools loaded")
    return filtered


def classify_task(user_message: str) -> str:
    """Simple keyword-based task classifier (replace with model-based for production)."""
    msg_lower = user_message.lower()

    if any(w in msg_lower for w in ["run", "execute", "test", "build", "compile", "install"]):
        return "execute"
    if any(w in msg_lower for w in ["write", "create", "update", "modify", "fix", "change"]):
        return "modify"
    if any(w in msg_lower for w in ["read", "show", "view", "display", "open"]):
        return "read"
    if any(w in msg_lower for w in ["find", "search", "look for", "where is", "which files"]):
        return "explore"

    return "explore"  # Default to a conservative set


async def run_filtered_agent(user_message: str, server_params: StdioServerParameters):
    """Run agent with dynamically filtered MCP tools."""

    client = anthropic.Anthropic()
    task_group = classify_task(user_message)

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Get only the tools we need
            tools = await get_mcp_tools_filtered(session, task_group)

            messages = [{"role": "user", "content": user_message}]

            while True:
                response = client.messages.create(
                    model="claude-opus-4-5",
                    max_tokens=4096,
                    tools=tools,
                    messages=messages
                )

                print(f"Input tokens: {response.usage.input_tokens}")

                if response.stop_reason != "tool_use":
                    return response.content[0].text

                # Handle tool calls
                tool_uses = [b for b in response.content if b.type == "tool_use"]
                tool_results = []

                for tool_use in tool_uses:
                    result = await session.call_tool(tool_use.name, tool_use.input)
                    # Apply result trimming (Step 3)
                    trimmed_result = trim_tool_result(
                        result.content[0].text if result.content else "",
                        max_tokens=500
                    )
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": tool_use.id,
                        "content": trimmed_result
                    })

                messages.extend([
                    {"role": "assistant", "content": response.content},
                    {"role": "user", "content": tool_results}
                ])
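
The loop above calls trim_tool_result, which has not been defined yet. A minimal sketch, using the same rough 1 token ≈ 4 characters heuristic as the server-side trimmer in Step 3:

```python
def trim_tool_result(text: str, max_tokens: int = 500) -> str:
    """Trim a tool result to a rough token budget (1 token ~= 4 chars for English text)."""
    max_chars = max_tokens * 4
    if len(text) <= max_chars:
        return text
    estimated = len(text) // 4
    # Truncation marker tells the model the result was cut and roughly by how much
    return text[:max_chars] + f"\n[TRUNCATED: ~{estimated} tokens -> {max_tokens} token limit]"
```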

Tip: After implementing the classifier, monitor its accuracy by logging the tool group it selects alongside the tools the model actually calls. If the model frequently calls tools not in the selected group, your classifier is under-loading. If selected tools are frequently unused, adjust the group definitions to be more granular.
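
That monitoring can be sketched as a small report comparing the loaded group against what the model actually called. `classifier_report` is an illustrative helper (pass it the `TOOL_GROUPS` mapping defined above); a consistently non-empty `missing` list means the classifier is under-loading:

```python
def classifier_report(group: str, called_tools: list[str], groups: dict) -> dict:
    """Compare the tool group a classifier selected against the tools actually called."""
    loaded = set(groups.get(group, []))
    called = set(called_tools)
    return {
        "group": group,
        "missing": sorted(called - loaded),  # model needed these but they weren't loaded
        "unused": sorted(loaded - called),   # loaded into context but never called
        "hit_rate": len(called & loaded) / len(called) if called else 1.0,
    }
```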


Step 3: Implement Result Trimming in Tool Handlers

Modify the MCP server's tool call handler to trim large results before returning them:

// In server-optimized.ts — add result trimming to call handler

import { CallToolRequestSchema } from "@modelcontextprotocol/sdk/types.js";

// Token estimation (rough approximation: 1 token ≈ 4 characters for English text)
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function trimToTokenBudget(content: string, maxTokens: number): string {
  const maxChars = maxTokens * 4;
  if (content.length <= maxChars) return content;

  const trimmed = content.substring(0, maxChars);
  const tokenCount = estimateTokens(content);
  return `${trimmed}\n[TRUNCATED: ${tokenCount} estimated tokens → ${maxTokens} token limit. Use start_line/end_line params to paginate.]`;
}

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;

  try {
    switch (name) {
      case "read_file": {
        const content = fs.readFileSync(args.path as string, "utf-8");
        let lines = content.split("\n");

        // Apply line range filtering
        const startLine = (args.start_line as number || 1) - 1;
        const endLine = args.end_line as number || Math.min(lines.length, startLine + 150);
        lines = lines.slice(startLine, endLine);

        let result = lines
          .map((line, i) => args.line_numbers ? `${startLine + i + 1}: ${line}` : line)
          .join("\n");

        // Add context header
        const header = `[${args.path} lines ${startLine + 1}-${endLine} of ${content.split("\n").length}]\n`;
        result = header + result;

        // Apply token budget
        result = trimToTokenBudget(result, 600);

        return { content: [{ type: "text", text: result }] };
      }

      case "list_dir": {
        const dirPath = args.path as string;
        const pattern = args.pattern as string | undefined;

        // Get directory listing
        let entries = fs.readdirSync(dirPath, { withFileTypes: true });

        // Filter hidden files
        if (!args.hidden) {
          entries = entries.filter(e => !e.name.startsWith("."));
        }

        // Apply pattern filter (convert glob to an anchored regex so "*.ts" won't match "*.tsx")
        if (pattern) {
          const regex = new RegExp(
            "^" + pattern.replace(/\./g, "\\.").replace(/\*/g, ".*") + "$"
          );
          entries = entries.filter(e => regex.test(e.name));
        }

        // Return compact JSON, not verbose text
        const result = JSON.stringify({
          path: dirPath,
          count: entries.length,
          entries: entries.map(e => ({
            name: e.name,
            type: e.isDirectory() ? "dir" : "file"
          }))
        });

        return { content: [{ type: "text", text: trimToTokenBudget(result, 400) }] };
      }

      case "run_command": {
        const timeout = (args.timeout as number || 30) * 1000;

        let stdout = "";
        let stderr = "";
        let exitCode = 0;

        try {
          stdout = execSync(args.cmd as string, {
            cwd: args.cwd as string || process.cwd(),
            timeout,
            encoding: "utf-8"
          });
        } catch (e: any) {
          stderr = e.stderr || e.message;
          exitCode = e.status || 1;
        }

        // Compact structured result
        const result = JSON.stringify({
          exit_code: exitCode,
          stdout: stdout.slice(0, 1500),  // Cap stdout
          stderr: stderr.slice(0, 500),   // Cap stderr — usually less needed
          truncated: stdout.length > 1500
        });

        return { content: [{ type: "text", text: result }] };
      }

      case "search_files": {
        const { query, dir, ext, ctx = 2, limit = 10 } = args as any;

        // Use grep-style search
        const grepCmd = `grep -r ${ext ? `--include="*${ext}"` : ""} -n -i "${query}" "${dir}" | head -${limit * (ctx * 2 + 1)}`;

        let searchOutput = "";
        try {
          searchOutput = execSync(grepCmd, { encoding: "utf-8", timeout: 10000 });
        } catch (e: any) {
          searchOutput = e.stdout || "";
        }

        // Parse and structure results
        const matches = searchOutput
          .split("\n")
          .filter(line => line.includes(":"))
          .slice(0, limit)
          .map(line => {
            const [file, lineNum, ...rest] = line.split(":");
            return { file, line: parseInt(lineNum), match: rest.join(":").trim() };
          });

        const result = JSON.stringify({
          query,
          count: matches.length,
          results: matches
        });

        return { content: [{ type: "text", text: trimToTokenBudget(result, 500) }] };
      }

      default:
        throw new Error(`Unknown tool: ${name}`);
    }
  } catch (error: any) {
    // Compact error format
    const errorResult = JSON.stringify({
      error: error.code || "TOOL_ERROR",
      msg: error.message?.slice(0, 150) || "Unknown error"
    });
    return { content: [{ type: "text", text: errorResult }], isError: true };
  }
});

Tip: The read_file pagination with start_line/end_line parameters is one of the highest-impact optimizations in this lab. Without it, agents reading large files load the entire file into context. With it, agents can load only the section they need. Encourage the model to use these parameters by adding a system prompt instruction: "When reading files, use start_line and end_line to read only the relevant section. Read the first 50 lines to understand structure, then target specific line ranges."
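
As a sketch of how a client could enforce this pattern mechanically rather than relying on the prompt alone, the illustrative helper below splits a large file into bounded read_file calls. The 150-line default mirrors the server-side cap in the read_file handler above; `paginated_read_plan` is a hypothetical name:

```python
def paginated_read_plan(total_lines: int, chunk: int = 150) -> list[dict]:
    """Split a file of total_lines into read_file inputs of at most `chunk` lines each."""
    return [
        {"start_line": s, "end_line": min(s + chunk - 1, total_lines)}
        for s in range(1, total_lines + 1, chunk)
    ]
```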


Step 4: Configure the MCP Client for Batching

Add batching support to the client configuration:


async def execute_tool_calls_parallel(
    session: ClientSession,
    tool_uses: list
) -> list:
    """Execute multiple tool calls concurrently."""

    async def execute_single(tool_use):
        result = await session.call_tool(tool_use["name"], tool_use["input"])
        raw_text = result.content[0].text if result.content else ""
        return {
            "type": "tool_result",
            "tool_use_id": tool_use["id"],
            "content": trim_tool_result(raw_text, max_tokens=500)
        }

    # Execute all tool calls concurrently
    tasks = [execute_single(tu) for tu in tool_uses]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    # Handle any exceptions
    processed = []
    for i, result in enumerate(results):
        if isinstance(result, Exception):
            processed.append({
                "type": "tool_result",
                "tool_use_id": tool_uses[i]["id"],
                "content": json.dumps({"error": "EXECUTION_FAILED", "msg": str(result)[:100]})
            })
        else:
            processed.append(result)

    return processed


BATCHING_SYSTEM_PROMPT = """You are a development assistant with access to filesystem and shell tools.

IMPORTANT: When you need multiple pieces of information:
- Request ALL independent information in a single response
- Do not wait for one result before requesting another if they are independent
- Example: To understand a codebase, request directory listing AND search for specific patterns AND read key config files ALL at once

When reading files, use start_line and end_line parameters to read only relevant sections.
When listing directories, use the pattern parameter to filter to relevant file types.
Always prefer structured data retrieval over reading large amounts of unneeded content."""
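
Independent of the model API, the concurrency-plus-error-handling pattern in execute_tool_calls_parallel can be exercised in isolation. In this sketch, `mock_call` is a hypothetical stand-in for `session.call_tool`; the point is that `asyncio.gather` with `return_exceptions=True` preserves call order and converts a failed call into a compact error payload instead of aborting the batch:

```python
import asyncio
import json

async def demo():
    # Hypothetical stand-in for session.call_tool: one call fails, two succeed.
    async def mock_call(name: str) -> str:
        await asyncio.sleep(0.01)
        if name == "bad_tool":
            raise RuntimeError("tool crashed")
        return f"result of {name}"

    tasks = [mock_call(n) for n in ["read_file", "bad_tool", "list_dir"]]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    # Failed calls become compact error payloads, mirroring the handler above.
    return [
        json.dumps({"error": "EXECUTION_FAILED", "msg": str(r)[:100]})
        if isinstance(r, Exception) else r
        for r in results
    ]

print(asyncio.run(demo()))
```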

Step 5: Measure the Final Optimized Configuration

Now run a comprehensive before/after comparison:

import anthropic
import json
import time

client = anthropic.Anthropic()

def count_tool_tokens(tools: list) -> int:
    """Use Anthropic's token counting API to measure tool overhead precisely."""
    response = client.messages.count_tokens(
        model="claude-opus-4-5",
        tools=tools,
        messages=[{"role": "user", "content": "Hello"}]
    )
    return response.input_tokens - 10  # Subtract minimal message tokens

TEST_TASKS = [
    "Find all TypeScript files in the src directory and show me which ones import React",
    "Read the package.json file and show me the dependencies",
    "Run the test suite and show me any failures",
    "Search for all uses of the deprecated 'componentWillMount' lifecycle method",
    "List the contents of the src/components directory"
]

def simulate_session_tokens(
    tools: list,
    tasks: list,
    avg_result_tokens: int,
    avg_turns_per_task: int
) -> dict:
    """Estimate total token cost for a session."""
    tool_overhead_per_request = count_tool_tokens(tools)
    system_prompt_tokens = 200  # Base system prompt

    total_tool_overhead = 0
    total_result_tokens = 0
    total_input_tokens = 0.0
    total_requests = 0

    for task in tasks:
        task_tokens = len(task.split()) * 1.3  # Rough word-count estimate

        for turn in range(avg_turns_per_task):
            # Context grows with each turn as tool results accumulate
            context_size = task_tokens + (turn * avg_result_tokens)
            total_input_tokens += system_prompt_tokens + tool_overhead_per_request + context_size
            total_tool_overhead += tool_overhead_per_request
            total_result_tokens += avg_result_tokens
            total_requests += 1

    return {
        "tool_overhead_per_request": tool_overhead_per_request,
        "total_requests": total_requests,
        "total_tool_overhead": total_tool_overhead,
        "total_result_tokens": total_result_tokens,
        "total_input_tokens": round(total_input_tokens),
        "tool_overhead_percentage": total_tool_overhead / (total_tool_overhead + total_result_tokens) * 100
    }


baseline_stats = simulate_session_tokens(
    tools=baseline_tools,  # From earlier measurement
    tasks=TEST_TASKS,
    avg_result_tokens=800,  # Unfiltered results are large
    avg_turns_per_task=4    # Sequential tool calls = more turns
)

optimized_stats = simulate_session_tokens(
    tools=optimized_step1_tools,  # Compressed schemas from Step 1
    tasks=TEST_TASKS,
    avg_result_tokens=250,  # Trimmed and filtered results
    avg_turns_per_task=2    # Batching reduces turns
)

print("=" * 60)
print("OPTIMIZATION RESULTS")
print("=" * 60)
print(f"\nTool schema tokens per request:")
print(f"  Baseline:  {baseline_stats['tool_overhead_per_request']} tokens")
print(f"  Optimized: {optimized_stats['tool_overhead_per_request']} tokens")
schema_reduction = (1 - optimized_stats['tool_overhead_per_request'] / baseline_stats['tool_overhead_per_request']) * 100
print(f"  Reduction: {schema_reduction:.1f}%")

print(f"\nTotal tool overhead for {len(TEST_TASKS)} tasks:")
print(f"  Baseline:  {baseline_stats['total_tool_overhead']:,} tokens")
print(f"  Optimized: {optimized_stats['total_tool_overhead']:,} tokens")
total_reduction = (1 - optimized_stats['total_tool_overhead'] / baseline_stats['total_tool_overhead']) * 100
print(f"  Reduction: {total_reduction:.1f}%")

print(f"\nAPI requests (turns) for {len(TEST_TASKS)} tasks:")
print(f"  Baseline:  {baseline_stats['total_requests']} requests")
print(f"  Optimized: {optimized_stats['total_requests']} requests")

print(f"\nEstimated cost reduction at $15/M input tokens:")
baseline_cost = (baseline_stats['total_tool_overhead'] + baseline_stats['total_result_tokens']) / 1_000_000 * 15
optimized_cost = (optimized_stats['total_tool_overhead'] + optimized_stats['total_result_tokens']) / 1_000_000 * 15
print(f"  Baseline:  ${baseline_cost:.4f}")
print(f"  Optimized: ${optimized_cost:.4f}")
print(f"  Savings:   {(1 - optimized_cost/baseline_cost)*100:.1f}%")

Expected results from this lab:

| Metric                    | Baseline | Optimized | Reduction |
|---------------------------|----------|-----------|-----------|
| Schema tokens/request     | ~950     | ~380      | 60%       |
| Avg result tokens         | ~800     | ~250      | 69%       |
| API turns per 5 tasks     | 20       | 10        | 50%       |
| Total tool-related tokens | ~35,000  | ~12,500   | 64%       |

Tip: After completing this lab, the most important next step is to establish these measurements as automated tests in your CI/CD pipeline. Create a test that runs count_tokens on your current tool definitions after any PR that touches tool schemas. Gate merges on the token count staying below a defined threshold. This prevents the gradual "schema inflation" that commonly undoes optimization work over the months following an optimization sprint.


Extending the Lab: Additional Optimization Challenges

Once you have completed the core lab, try these additional challenges:

Challenge 1: Add a metadata endpoint

Add a list_tools_by_group capability to your MCP server that returns tool definitions organized by group. This allows clients to request only the tool group they need without loading everything.

Challenge 2: Implement response streaming with early termination

For the run_command tool, implement streaming output that allows the agent to stop reading when it has found what it needs, rather than waiting for the full command output.

Challenge 3: Build a tool usage analytics dashboard

Instrument your MCP server to log which tools are called, with what parameters, and how large the results are. Build a simple Python script that reads these logs and produces a "tool efficiency report" showing the cost and usage frequency of each tool.

Challenge 4: Compare MCP server implementations

Test the same agent tasks using three different MCP filesystem server implementations: your optimized server, the reference @modelcontextprotocol/server-filesystem package, and a minimal hand-rolled implementation. Measure and compare token costs across all three.

Tip: These challenges are designed to be attempted in order of increasing difficulty. Challenge 1 is appropriate for all personas in this course. Challenge 4 requires deeper familiarity with the MCP protocol and TypeScript. If you complete Challenge 4, you will have a strong empirical basis for tool selection decisions that most teams make purely on intuition.


Summary

This hands-on lab applied every optimization technique from Module 7 to a realistic MCP server scenario:

  1. Schema compression (Topics 1 and 5): Reduced tool definition tokens by ~60% through concise descriptions, flat parameters, and removal of unnecessary validation prose.
  2. Dynamic filtering (Topic 2): Cut per-request tool overhead by 50–70% through task-based tool group loading.
  3. Result trimming (Topic 3): Reduced average result size by ~70% through hard token budgets, line range pagination, and compact JSON serialization.
  4. Parallel execution (Topic 4): Halved the number of API round-trips through concurrent tool execution and a batching-encouraging system prompt.

The cumulative effect of all four optimizations together — roughly 64–70% total token reduction — is significantly greater than any individual technique. Production MCP deployments that implement all four layers routinely achieve 50–75% reductions in tool-related token costs, translating directly to lower API bills, longer sustainable context windows, and faster agent response times.