Code context is the densest, most token-hungry category of input in engineering-focused agentic sessions. A single moderately complex file can consume 5K-15K tokens. A multi-file refactor can push 50K-100K tokens of diff context into the conversation before any discussion occurs. Unlike conversational text, code has a specific structure — functions, classes, modules, dependencies — that demands specialized summarization strategies distinct from what works for chat logs.
This topic covers how to produce compact, high-fidelity representations of code files, diffs, and change sets that give AI agents the structural understanding they need without paying the full token cost of raw source.
Why Code Summarization Requires a Different Approach
Conversational summarization can afford to lose some nuance because human language is redundant — meaning survives light compression. Code is not redundant. A single changed variable name, a missing await, or an altered type signature can represent a critical semantic change. At the same time, much of what makes code verbose — boilerplate, imports, comments, whitespace, documentation strings — carries little information for a reasoning agent that needs to understand what changed and why.
The goal of code summarization is not lossy compression of semantics. It is structured extraction of semantics from syntactic noise. The techniques below preserve behavior, interface, and decision information while stripping the tokens that an agent does not need.
Tip: Think of code summarization as producing an architectural briefing, not a compressed source file. The agent does not need to regenerate every line from the summary — it needs to understand structure, interfaces, and the rationale for changes well enough to make correct decisions about what to do next.
File-Level Summarization: The Code Skeleton
For large source files that need to be in context, the most effective technique is to replace the full file with a "code skeleton" — a compact representation that preserves interface signatures, class hierarchies, and key logic structure while removing implementation bodies.
Example: Full TypeScript service file (~180 lines, ~2,200 tokens)
// Full implementation — too many tokens for large codebases
import { Injectable, ConflictException } from '@nestjs/common';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { User } from './user.entity';
import { CreateUserDto } from './dto/create-user.dto';
import { UpdateUserDto } from './dto/update-user.dto';
import { HashService } from '../auth/hash.service';
import { EmailService } from '../email/email.service';

@Injectable()
export class UsersService {
  constructor(
    @InjectRepository(User)
    private usersRepository: Repository<User>,
    private hashService: HashService,
    private emailService: EmailService,
  ) {}

  async create(createUserDto: CreateUserDto): Promise<User> {
    const existingUser = await this.usersRepository.findOne({
      where: { email: createUserDto.email },
    });
    if (existingUser) {
      throw new ConflictException('Email already registered');
    }
    const hashedPassword = await this.hashService.hash(createUserDto.password);
    const user = this.usersRepository.create({
      ...createUserDto,
      password: hashedPassword,
    });
    const saved = await this.usersRepository.save(user);
    await this.emailService.sendWelcomeEmail(saved.email, saved.firstName);
    return saved;
  }

  // ... 150 more lines
}
Code skeleton (~25 lines, ~380 tokens):
// src/users/users.service.ts — SKELETON SUMMARY
// Dependencies: User entity, CreateUserDto, UpdateUserDto, HashService, EmailService

@Injectable()
export class UsersService {
  // Constructor: usersRepository (Repository<User>), hashService, emailService

  async create(dto: CreateUserDto): Promise<User>
  //   Checks for duplicate email (throws ConflictException)
  //   Hashes password via hashService
  //   Saves user, sends welcome email via emailService

  async findAll(): Promise<User[]>
  //   Returns all users, excludes password field

  async findOne(id: string): Promise<User>
  //   Throws NotFoundException if not found

  async update(id: string, dto: UpdateUserDto): Promise<User>
  //   Partial update, re-hashes password if changed

  async remove(id: string): Promise<void>
  //   Soft delete (sets deletedAt timestamp)

  private async validateUniqueEmail(email: string, excludeId?: string): Promise<void>
  //   Throws ConflictException if email taken by another user
}
The skeleton is 83% smaller and retains every interface, dependency, and behavioral contract the agent needs to make correct decisions about how this service interacts with the rest of the codebase.
Prompt to generate a skeleton:
Produce a code skeleton for the following file. Include:
- All class/function/method signatures with parameter types and return types
- One-line comments per method describing behavior (not implementation)
- Constructor dependencies
- Key thrown exceptions
- Any non-obvious side effects (external calls, state mutations)
Remove: all implementation bodies, import statements (summarize as a comment), docstrings, inline comments, blank lines.
Target: maximum 25% of original token count.
Tip: Generate code skeletons as a pre-processing step before injecting files into context, not during the session. Build a CLI tool or pre-commit hook that produces a .skeleton.ts alongside each source file. Then your agent orchestrator can inject the skeleton by default and only fetch the full source when a specific function needs to be read or modified.
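As a concrete illustration of such a pre-processing step, here is a minimal regex-based sketch in Python that extracts class and method signatures from TypeScript source by tracking brace depth. It is a heuristic, not a parser (a production tool would use the TypeScript compiler API or a real parser such as tree-sitter), and every name in it is illustrative:

```python
import re

# Matches class/interface/enum declarations, or method-like signatures
# (an optional modifier keyword followed by `name(`). Heuristic only.
SIGNATURE = re.compile(
    r"^\s*(export\s+)?(abstract\s+)?(class|interface|enum)\s+\w+"
    r"|^\s*(public|private|protected|static|async|readonly)?\s*\w+\s*\("
)

def skeleton(source: str) -> str:
    """Very rough TypeScript skeleton: keep shallow declarations, drop bodies.

    Tracks brace depth line by line; keeps only signature-looking lines at
    depth 0 (classes) or depth 1 (methods). Braces inside strings or
    comments will confuse it, which is acceptable for a sketch.
    """
    out, depth = [], 0
    for line in source.splitlines():
        stripped = line.strip()
        if depth <= 1 and SIGNATURE.match(line) and not stripped.startswith("//"):
            # Drop the trailing opening brace so only the signature remains
            out.append(stripped.rstrip("{").rstrip())
        depth += line.count("{") - line.count("}")
    return "\n".join(out)
```

A real skeleton tool would also carry over the one-line behavior comments described in the prompt above; this sketch only handles the structural half of the job.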
Diff Summarization: From Raw Git Diff to Semantic Change Log
Raw git diffs are extraordinarily token-inefficient. A diff for a straightforward refactor might run to 800 lines and 10K tokens, while the semantic content — "extracted X logic from A into B, updated all callers" — can be expressed in 50 tokens. For agentic code review, debugging, and refactoring sessions, diff summarization is one of the highest-value compression techniques available.
Raw diff (excerpt, ~400 tokens):
diff --git a/src/orders/orders.service.ts b/src/orders/orders.service.ts
index 3a2f91c..8b4d2e1 100644
--- a/src/orders/orders.service.ts
+++ b/src/orders/orders.service.ts
@@ -45,28 +45,8 @@ export class OrdersService {
     const items = await this.validateOrderItems(createOrderDto.items);
     const subtotal = items.reduce((sum, item) => sum + item.price * item.quantity, 0);
-    const tax = subtotal * 0.08;
-    const shipping = createOrderDto.express ? 24.99 : 5.99;
-    const discount = createOrderDto.couponCode
-      ? await this.couponsService.getDiscount(createOrderDto.couponCode, subtotal)
-      : 0;
-    const total = subtotal + tax + shipping - discount;
+    const { tax, shipping, discount, total } = await this.pricingService.calculateTotal({
+      subtotal,
+      express: createOrderDto.express,
+      couponCode: createOrderDto.couponCode,
+    });
Diff summary (~60 tokens):
orders.service.ts: Extracted pricing calculation logic (tax, shipping, discount, total)
into new PricingService.calculateTotal() method. OrdersService now delegates to
PricingService instead of computing inline. Breaking change: PricingService is now
a required dependency.
Prompt to summarize a diff:
Summarize this git diff as a semantic change log. For each changed file:
1. One sentence: what changed (not how — what and why)
2. Any new dependencies introduced
3. Any interface/API changes (function signatures, exported types)
4. Any breaking changes (callers that need to be updated)
5. Any side effects on behavior (changed error conditions, altered defaults)
Ignore: whitespace changes, comment updates, reformatting that doesn't change behavior.
Format each file as a single bullet point with sub-bullets for items 2-5 if present.
Target: 1-3 sentences per file changed.
Tip: For multi-file PRs, ask the model to produce two diff summaries: a "per-file" summary (detailed, for reviewers) and an "executive summary" (2-3 sentences describing the overall change). The executive summary goes in the rolling session context; the per-file summary is stored as a reference artifact. This gives both levels of detail without paying the full token cost of the raw diff.
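One way to wire this up is a small helper that builds both prompt variants from the same diff before sending each to the model. This is a sketch; the function name and exact prompt wording are illustrative and should be aligned with the diff-summarization prompt given earlier:

```python
def build_diff_prompts(diff: str) -> dict[str, str]:
    """Build the two prompt variants for a multi-file PR.

    "per_file" is the detailed reviewer-facing summary; "executive" is the
    2-3 sentence version destined for the rolling session context.
    """
    per_file = (
        "Summarize this git diff as a semantic change log, one bullet per file, "
        "with sub-bullets for new dependencies, interface changes, breaking "
        "changes, and behavioral side effects:\n\n" + diff
    )
    executive = (
        "Summarize this git diff in 2-3 sentences describing the overall change, "
        "for a rolling session context:\n\n" + diff
    )
    return {"per_file": per_file, "executive": executive}
```

Each prompt is then sent as its own model call; the two summaries are stored in their respective destinations rather than concatenated.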
Change Set Context: Summarizing an Entire PR or Feature Branch
When an agent needs to reason about a full feature branch rather than a single commit, you need change set summarization — a higher-level representation of the cumulative changes across multiple files and commits.
Change set summary structure:
## Change Set Summary: [Feature Name / PR Title]
### Change Intent
[One sentence: what this change set accomplishes]
### Scope of Changes
- Files modified: [count] ([list of primary files affected])
- New files: [count] ([list])
- Deleted files: [count] ([list])
- Net lines: +[added] / -[removed]
### Architectural Changes
- [Any new modules, services, or abstractions introduced]
- [Any existing abstractions modified or removed]
- [New external dependencies]
### Interface Changes (Breaking / Non-Breaking)
- [List public API changes, clearly marked BREAKING or NON-BREAKING]
### Key Implementation Decisions
- [Decision 1 and brief rationale]
- [Decision 2 and brief rationale]
### Behavioral Changes
- [What the system does differently after this change]
- [Edge cases or error conditions that changed]
### Test Coverage
- New tests: [count, brief description]
- Modified tests: [count]
- Coverage delta: [+X% or noted as not measured]
This structure takes the agent from "here is 10,000 tokens of diff" to "here is a 300-token briefing that captures everything architecturally relevant."
Programmatic generation using git and Claude:
import subprocess
from anthropic import Anthropic

client = Anthropic()

def get_branch_diff(base_branch: str = "main") -> str:
    """Get the full diff for the current branch vs base."""
    result = subprocess.run(
        ["git", "diff", f"{base_branch}...HEAD"],
        capture_output=True, text=True
    )
    return result.stdout

def get_commit_log(base_branch: str = "main") -> str:
    """Get commit messages for the current branch."""
    result = subprocess.run(
        ["git", "log", f"{base_branch}...HEAD", "--oneline"],
        capture_output=True, text=True
    )
    return result.stdout

def summarize_change_set(base_branch: str = "main") -> str:
    diff = get_branch_diff(base_branch)
    commits = get_commit_log(base_branch)

    # Truncate diff if very large — take first 50K chars
    if len(diff) > 50000:
        diff = diff[:50000] + "\n\n[DIFF TRUNCATED — showing first 50K characters]"

    prompt = f"""Commit log:
{commits}

Git diff:
{diff}

Produce a structured change set summary using this format:
[paste your change set summary format here]

Be precise about interface changes — these are the most important items for code review and integration work."""

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
Tip: Run summarize_change_set() as part of your CI pipeline and post the summary as a PR description. This gives both human reviewers and any AI agents doing automated review a high-quality context briefing before they look at a single line of diff. It doubles as documentation.
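A minimal sketch of that CI step, assuming the GitHub CLI (gh) is installed and authenticated in the pipeline; post_pr_summary is a hypothetical helper, not part of any library:

```python
import subprocess

def post_pr_summary(pr_number: int, summary: str, dry_run: bool = False) -> list[str]:
    """Set the PR description to the generated change set summary.

    Uses `gh pr edit --body` (GitHub CLI). With dry_run=True, only builds
    and returns the command without executing it, which is handy for tests.
    """
    cmd = ["gh", "pr", "edit", str(pr_number), "--body", summary]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd

# In CI, after generating the summary:
#   post_pr_summary(pr_number, summarize_change_set("main"))
```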
Test Output Summarization: From 10,000 Lines of CI Log to a Test Status Briefing
Test output is perhaps the most token-wasteful artifact in engineering workflows. A full Jest, pytest, or Go test run can produce thousands of lines for a handful of actual failures. For QA engineers and engineering teams running test agents, summarizing test output is as important as summarizing code diffs.
Raw test output (problematic):
FAIL src/payments/payments.service.spec.ts
  ● PaymentsService › processPayment › should handle Stripe timeout

    Timeout - Async callback was not invoked within the 5000ms timeout specified by jest.setTimeout.

      at src/payments/payments.service.spec.ts:87:18

  ● PaymentsService › processPayment › should retry on transient error

    expect(received).toHaveBeenCalledTimes(expected)

    Expected: 3
    Received: 1

... [hundreds more lines]

Test Suites: 3 failed, 47 passed, 50 total
Tests:       12 failed, 438 passed, 450 total
Summarization prompt:
Summarize the following test output. Produce:
1. Overall: pass/fail counts, total duration
2. Failed test suites: file path + count of failures
3. For each unique failure pattern:
- Test name
- Failure type (assertion, timeout, exception, etc.)
- One-line root cause if apparent from the error message
4. Any flaky test indicators (timeouts, intermittent patterns)
5. Coverage summary if present
Omit: passing test names, full stack traces (keep only the key failure line),
repeated errors of the same type (group them with a count).
Summarized output (~80 tokens):
Test run: 438 passed / 12 failed / 50 suites (3 failed)
Failed: src/payments/payments.service.spec.ts (2 failures)
- "should handle Stripe timeout" — jest timeout (5000ms), likely missing mock for network call
- "should retry on transient error" — retry count assertion, expected 3 calls, got 1 (retry logic not triggering)
Failed: src/users/users.service.spec.ts (6 failures) — all assertion failures on email validation edge cases
Failed: src/auth/auth.guard.spec.ts (4 failures) — all timeout-related, probable async issue in test setup
Tip: For QA engineers running multi-cycle test agents, build a "test delta" summarization step: summarize not just the current test run but the difference between this run and the previous run. "3 new failures introduced, 5 previously failing tests now passing" is far more actionable than a full status report every cycle.
Token Budgeting for Code Context: A Practical Allocation Model
When building agent sessions that involve both code context and conversation, use a deliberate token allocation model:
| Context Type | Recommended Budget | Compression Target |
|---|---|---|
| System prompt + instructions | 2K-4K tokens | Fixed (do not compress) |
| Rolling session summary | 400-800 tokens | Fixed max (enforce hard limit) |
| Code file skeletons | 1K per file, max 10 files | 10K-15K total |
| Full file content (when needed) | Max 3 files at once | Rotate in/out as needed |
| Diff summary / change set | 300-600 tokens | Per feature branch |
| Test output summary | 100-300 tokens | Per test run |
| Recent conversation turns | 5K-10K | Prune after compression |
Total working budget for a 200K-context model: ~30K-40K tokens, leaving a large safety buffer for model reasoning and output.
Tip: Build a context budget tracker into your agent orchestrator. Before each turn, compute the token estimate for each context component and log it. When any component exceeds its budget, trigger the appropriate compression function automatically. This transforms context management from an art into an engineered system.
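A sketch of such a tracker, using the rough chars/4 heuristic for token estimation. The budget values mirror the allocation table above; the component names and the idea of returning a list of over-budget components (to drive compression) are illustrative:

```python
BUDGETS = {  # per-component token budgets (illustrative values)
    "system_prompt": 4_000,
    "session_summary": 800,
    "skeletons": 15_000,
    "diff_summary": 600,
    "test_summary": 300,
    "recent_turns": 10_000,
}

def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token for English and code."""
    return len(text) // 4

def over_budget(components: dict[str, str]) -> list[str]:
    """Return the names of context components exceeding their budget,
    i.e. the ones whose compression function should be triggered."""
    return [
        name for name, text in components.items()
        if name in BUDGETS and estimate_tokens(text) > BUDGETS[name]
    ]
```

Before each turn, the orchestrator passes its current context components through over_budget() and runs the matching compression step (skeletonize, re-summarize, prune) for each name returned.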
Summary
Code and diff summarization produces compact representations that preserve behavioral contracts, interface definitions, and change semantics while eliminating syntactic noise. The code skeleton technique, semantic diff summarization, change set briefings, and test output compression each address a different category of token-heavy engineering artifact. Used together, they allow an agent to reason about large codebases and complex change sets at a fraction of the raw token cost — without losing the precision that software engineering requires. In the next topic, we extend these techniques into hierarchical context management for the most complex deep sessions.