Code context is the densest, most token-hungry category of input in engineering-focused agentic sessions. A single moderately complex file can consume 5K-15K tokens. A multi-file refactor can push 50K-100K tokens of diff context into the conversation before any discussion occurs. Unlike conversational text, code has a specific structure — functions, classes, modules, dependencies — that demands specialized summarization strategies distinct from what works for chat logs.
This topic covers how to produce compact, high-fidelity representations of code files, diffs, and change sets that give AI agents the structural understanding they need without paying the full token cost of raw source.
Why Code Summarization Requires a Different Approach
Conversational summarization can afford to lose some nuance because human language is redundant — meaning survives light compression. Code is not redundant. A single changed variable name, a missing await, or an altered type signature can represent a critical semantic change. At the same time, much of what makes code verbose — boilerplate, imports, comments, whitespace, documentation strings — carries little information for a reasoning agent that needs to understand what changed and why.
The goal of code summarization is not lossy compression of semantics. It is structured extraction of semantics from syntactic noise. The techniques below preserve behavior, interface, and decision information while stripping the tokens that an agent does not need.
Tip: Think of code summarization as producing an architectural briefing, not a compressed source file. The agent does not need to regenerate every line from the summary — it needs to understand structure, interfaces, and the rationale for changes well enough to make correct decisions about what to do next.
File-Level Summarization: The Code Skeleton
For large source files that need to be in context, the most effective technique is to replace the full file with a "code skeleton" — a compact representation that preserves interface signatures, class hierarchies, and key logic structure while removing implementation bodies.
Example: Full TypeScript service file (~180 lines, ~2,200 tokens)
// Full implementation — too many tokens for large codebases
import { Injectable, ConflictException } from '@nestjs/common';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { User } from './user.entity';
import { CreateUserDto } from './dto/create-user.dto';
import { UpdateUserDto } from './dto/update-user.dto';
import { HashService } from '../auth/hash.service';
import { EmailService } from '../email/email.service';

@Injectable()
export class UsersService {
  constructor(
    @InjectRepository(User)
    private usersRepository: Repository<User>,
    private hashService: HashService,
    private emailService: EmailService,
  ) {}

  async create(createUserDto: CreateUserDto): Promise<User> {
    const existingUser = await this.usersRepository.findOne({
      where: { email: createUserDto.email },
    });
    if (existingUser) {
      throw new ConflictException('Email already registered');
    }
    const hashedPassword = await this.hashService.hash(createUserDto.password);
    const user = this.usersRepository.create({
      ...createUserDto,
      password: hashedPassword,
    });
    const saved = await this.usersRepository.save(user);
    await this.emailService.sendWelcomeEmail(saved.email, saved.firstName);
    return saved;
  }

  // ... 150 more lines
}
Code skeleton (~25 lines, ~380 tokens):
// src/users/users.service.ts — SKELETON SUMMARY
// Dependencies: User entity, CreateUserDto, UpdateUserDto, HashService, EmailService

@Injectable()
export class UsersService {
  // Constructor: usersRepository (Repository<User>), hashService, emailService

  async create(dto: CreateUserDto): Promise<User>
  //   Checks for duplicate email (throws ConflictException)
  //   Hashes password via hashService
  //   Saves user, sends welcome email via emailService

  async findAll(): Promise<User[]>
  //   Returns all users, excludes password field

  async findOne(id: string): Promise<User>
  //   Throws NotFoundException if not found

  async update(id: string, dto: UpdateUserDto): Promise<User>
  //   Partial update, re-hashes password if changed

  async remove(id: string): Promise<void>
  //   Soft delete (sets deletedAt timestamp)

  private async validateUniqueEmail(email: string, excludeId?: string): Promise<void>
  //   Throws ConflictException if email taken by another user
}
The skeleton is 83% smaller and retains every interface, dependency, and behavioral contract the agent needs to make correct decisions about how this service interacts with the rest of the codebase.
Prompt to generate a skeleton:
Produce a code skeleton for the following file. Include:
- All class/function/method signatures with parameter types and return types
- One-line comments per method describing behavior (not implementation)
- Constructor dependencies
- Key thrown exceptions
- Any non-obvious side effects (external calls, state mutations)
Remove: all implementation bodies, import statements (summarize as a comment), docstrings, inline comments, blank lines.
Target: maximum 25% of original token count.
Tip: Generate code skeletons as a pre-processing step before injecting files into context, not during the session. Build a CLI tool or pre-commit hook that produces a .skeleton.ts alongside each source file. Then your agent orchestrator can inject the skeleton by default and only fetch the full source when a specific function needs to be read or modified.
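As a concrete illustration of such a pre-processing step, here is a minimal regex-based sketch in Python that extracts class and method signatures from TypeScript source by tracking brace depth. It is a heuristic, not a parser (a production tool would use the TypeScript compiler API or a real parser such as tree-sitter), and every name in it is illustrative:

```python
import re

# Matches class/interface/enum declarations, or method-like signatures
# (an optional modifier keyword followed by `name(`). Heuristic only.
SIGNATURE = re.compile(
    r"^\s*(export\s+)?(abstract\s+)?(class|interface|enum)\s+\w+"
    r"|^\s*(public|private|protected|static|async|readonly)?\s*\w+\s*\("
)

def skeleton(source: str) -> str:
    """Very rough TypeScript skeleton: keep shallow declarations, drop bodies.

    Tracks brace depth line by line; keeps only signature-looking lines at
    depth 0 (classes) or depth 1 (methods). Braces inside strings or
    comments will confuse it, which is acceptable for a sketch.
    """
    out, depth = [], 0
    for line in source.splitlines():
        stripped = line.strip()
        if depth <= 1 and SIGNATURE.match(line) and not stripped.startswith("//"):
            # Drop the trailing opening brace so only the signature remains
            out.append(stripped.rstrip("{").rstrip())
        depth += line.count("{") - line.count("}")
    return "\n".join(out)
```

A real skeleton tool would also carry over the one-line behavior comments described in the prompt above; this sketch only handles the structural half of the job.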
Diff Summarization: From Raw Git Diff to Semantic Change Log
Raw git diffs are extraordinarily token-inefficient. A diff for a straightforward refactor might run to 800 lines and 10K tokens, while the semantic content — "extracted X logic from A into B, updated all callers" — can be expressed in 50 tokens. For agentic code review, debugging, and refactoring sessions, diff summarization is one of the highest-value compression techniques available.
Raw diff (excerpt, ~400 tokens):
diff --git a/src/orders/orders.service.ts b/src/orders/orders.service.ts
index 3a2f91c..8b4d2e1 100644
--- a/src/orders/orders.service.ts
+++ b/src/orders/orders.service.ts
@@ -45,28 +45,8 @@ export class OrdersService {
     const items = await this.validateOrderItems(createOrderDto.items);
     const subtotal = items.reduce((sum, item) => sum + item.price * item.quantity, 0);
-    const tax = subtotal * 0.08;
-    const shipping = createOrderDto.express ? 24.99 : 5.99;
-    const discount = createOrderDto.couponCode
-      ? await this.couponsService.getDiscount(createOrderDto.couponCode, subtotal)
-      : 0;
-    const total = subtotal + tax + shipping - discount;
+    const { tax, shipping, discount, total } = await this.pricingService.calculateTotal({
+      subtotal,
+      express: createOrderDto.express,
+      couponCode: createOrderDto.couponCode,
+    });
Diff summary (~60 tokens):
orders.service.ts: Extracted pricing calculation logic (tax, shipping, discount, total)
into new PricingService.calculateTotal() method. OrdersService now delegates to
PricingService instead of computing inline. Breaking change: PricingService is now
a required dependency.
Prompt to summarize a diff:
Summarize this git diff as a semantic change log. For each changed file:
1. One sentence: what changed (not how — what and why)
2. Any new dependencies introduced
3. Any interface/API changes (function signatures, exported types)
4. Any breaking changes (callers that need to be updated)
5. Any side effects on behavior (changed error conditions, altered defaults)
Ignore: whitespace changes, comment updates, reformatting that doesn't change behavior.
Format each file as a single bullet point with sub-bullets for items 2-5 if present.
Target: 1-3 sentences per file changed.
Tip: For multi-file PRs, ask the model to produce two diff summaries: a "per-file" summary (detailed, for reviewers) and an "executive summary" (2-3 sentences describing the overall change). The executive summary goes in the rolling session context; the per-file summary is stored as a reference artifact. This gives both levels of detail without paying the full token cost of the raw diff.
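One way to wire this up is a small helper that builds both prompt variants from the same diff before sending each to the model. This is a sketch; the function name and exact prompt wording are illustrative and should be aligned with the diff-summarization prompt given earlier:

```python
def build_diff_prompts(diff: str) -> dict[str, str]:
    """Build the two prompt variants for a multi-file PR.

    "per_file" is the detailed reviewer-facing summary; "executive" is the
    2-3 sentence version destined for the rolling session context.
    """
    per_file = (
        "Summarize this git diff as a semantic change log, one bullet per file, "
        "with sub-bullets for new dependencies, interface changes, breaking "
        "changes, and behavioral side effects:\n\n" + diff
    )
    executive = (
        "Summarize this git diff in 2-3 sentences describing the overall change, "
        "for a rolling session context:\n\n" + diff
    )
    return {"per_file": per_file, "executive": executive}
```

Each prompt is then sent as its own model call; the two summaries are stored in their respective destinations rather than concatenated.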
Change Set Context: Summarizing an Entire PR or Feature Branch
When an agent needs to reason about a full feature branch rather than a single commit, you need change set summarization — a higher-level representation of the cumulative changes across multiple files and commits.
Change set summary structure:
## Change Set Summary: [Feature Name / PR Title]
### Change Intent
[One sentence: what this change set accomplishes]
### Scope of Changes
- Files modified: [count] ([list of primary files affected])
- New files: [count] ([list])
- Deleted files: [count] ([list])
- Net lines: +[added] / -[removed]
### Architectural Changes
- [Any new modules, services, or abstractions introduced]
- [Any existing abstractions modified or removed]
- [New external dependencies]
### Interface Changes (Breaking / Non-Breaking)
- [List public API changes, clearly marked BREAKING or NON-BREAKING]
### Key Implementation Decisions
- [Decision 1 and brief rationale]
- [Decision 2 and brief rationale]
### Behavioral Changes
- [What the system does differently after this change]
- [Edge cases or error conditions that changed]
### Test Coverage
- New tests: [count, brief description]
- Modified tests: [count]
- Coverage delta: [+X% or noted as not measured]
This structure takes the agent from "here is 10,000 tokens of diff" to "here is a 300-token briefing that captures everything architecturally relevant."
Programmatic generation using git and Claude:
import subprocess
from anthropic import Anthropic

client = Anthropic()

def get_branch_diff(base_branch: str = "main") -> str:
    """Get the full diff for the current branch vs base."""
    result = subprocess.run(
        ["git", "diff", f"{base_branch}...HEAD"],
        capture_output=True, text=True
    )
    return result.stdout

def get_commit_log(base_branch: str = "main") -> str:
    """Get commit messages for the current branch."""
    result = subprocess.run(
        ["git", "log", f"{base_branch}...HEAD", "--oneline"],
        capture_output=True, text=True
    )
    return result.stdout

def summarize_change_set(base_branch: str = "main") -> str:
    diff = get_branch_diff(base_branch)
    commits = get_commit_log(base_branch)

    # Truncate diff if very large — take first 50K chars
    if len(diff) > 50000:
        diff = diff[:50000] + "\n\n[DIFF TRUNCATED — showing first 50K characters]"

    prompt = f"""Commit log:
{commits}

Git diff:
{diff}

Produce a structured change set summary using this format:
[paste your change set summary format here]

Be precise about interface changes — these are the most important items for code review and integration work."""

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
Tip: Run summarize_change_set() as part of your CI pipeline and post the summary as a PR description. This gives both human reviewers and any AI agents doing automated review a high-quality context briefing before they look at a single line of diff. It doubles as documentation.
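A minimal sketch of that CI step, assuming the GitHub CLI (gh) is installed and authenticated in the pipeline; post_pr_summary is a hypothetical helper, not part of any library:

```python
import subprocess

def post_pr_summary(pr_number: int, summary: str, dry_run: bool = False) -> list[str]:
    """Set the PR description to the generated change set summary.

    Uses `gh pr edit --body` (GitHub CLI). With dry_run=True, only builds
    and returns the command without executing it, which is handy for tests.
    """
    cmd = ["gh", "pr", "edit", str(pr_number), "--body", summary]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd

# In CI, after generating the summary:
#   post_pr_summary(pr_number, summarize_change_set("main"))
```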
Test Output Summarization: From 10,000 Lines of CI Log to a Test Status Briefing
Test output is perhaps the most token-wasteful artifact in engineering workflows. A full Jest, pytest, or Go test run can produce thousands of lines for a handful of actual failures. For QA engineers and engineering teams running test agents, summarizing test output is as important as summarizing code diffs.
Raw test output (problematic):
FAIL src/payments/payments.service.spec.ts
  ● PaymentsService › processPayment › should handle Stripe timeout

    Timeout - Async callback was not invoked within the 5000ms timeout specified by jest.setTimeout.

      at src/payments/payments.service.spec.ts:87:18

  ● PaymentsService › processPayment › should retry on transient error

    expect(received).toHaveBeenCalledTimes(expected)

    Expected: 3
    Received: 1

... [hundreds more lines]

Test Suites: 3 failed, 47 passed, 50 total
Tests:       12 failed, 438 passed, 450 total
Summarization prompt:
Summarize the following test output. Produce:
1. Overall: pass/fail counts, total duration
2. Failed test suites: file path + count of failures
3. For each unique failure pattern:
- Test name
- Failure type (assertion, timeout, exception, etc.)
- One-line root cause if apparent from the error message
4. Any flaky test indicators (timeouts, intermittent patterns)
5. Coverage summary if present
Omit: passing test names, full stack traces (keep only the key failure line),
repeated errors of the same type (group them with a count).
Summarized output (~80 tokens):
Test run: 438 passed / 12 failed / 50 suites (3 failed)
Failed: src/payments/payments.service.spec.ts (2 failures)
- "should handle Stripe timeout" — jest timeout (5000ms), likely missing mock for network call
- "should retry on transient error" — retry count assertion, expected 3 calls, got 1 (retry logic not triggering)
Failed: src/users/users.service.spec.ts (6 failures) — all assertion failures on email validation edge cases
Failed: src/auth/auth.guard.spec.ts (4 failures) — all timeout-related, probable async issue in test setup
Tip: For QA engineers running multi-cycle test agents, build a "test delta" summarization step: summarize not just the current test run but the difference between this run and the previous run. "3 new failures introduced, 5 previously failing tests now passing" is far more actionable than a full status report every cycle.
Token Budgeting for Code Context: A Practical Allocation Model
When building agent sessions that involve both code context and conversation, use a deliberate token allocation model:
| Context Type | Recommended Budget | Compression Target |
|---|---|---|
| System prompt + instructions | 2K-4K tokens | Fixed (do not compress) |
| Rolling session summary | 400-800 tokens | Fixed max (enforce hard limit) |
| Code file skeletons | 1K per file, max 10 files | 10K-15K total |
| Full file content (when needed) | Max 3 files at once | Rotate in/out as needed |
| Diff summary / change set | 300-600 tokens | Per feature branch |
| Test output summary | 100-300 tokens | Per test run |
| Recent conversation turns | 5K-10K | Prune after compression |
Total working budget for a 200K-context model: ~30K-40K tokens, leaving a large safety buffer for model reasoning and output.
Tip: Build a context budget tracker into your agent orchestrator. Before each turn, compute the token estimate for each context component and log it. When any component exceeds its budget, trigger the appropriate compression function automatically. This transforms context management from an art into an engineered system.
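A sketch of such a tracker, using the rough chars/4 heuristic for token estimation. The budget values mirror the allocation table above; the component names and the idea of returning a list of over-budget components (to drive compression) are illustrative:

```python
BUDGETS = {  # per-component token budgets (illustrative values)
    "system_prompt": 4_000,
    "session_summary": 800,
    "skeletons": 15_000,
    "diff_summary": 600,
    "test_summary": 300,
    "recent_turns": 10_000,
}

def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token for English and code."""
    return len(text) // 4

def over_budget(components: dict[str, str]) -> list[str]:
    """Return the names of context components exceeding their budget,
    i.e. the ones whose compression function should be triggered."""
    return [
        name for name, text in components.items()
        if name in BUDGETS and estimate_tokens(text) > BUDGETS[name]
    ]
```

Before each turn, the orchestrator passes its current context components through over_budget() and runs the matching compression step (skeletonize, re-summarize, prune) for each name returned.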
Summary
Code and diff summarization produces compact representations that preserve behavioral contracts, interface definitions, and change semantics while eliminating syntactic noise. The code skeleton technique, semantic diff summarization, change set briefings, and test output compression each address a different category of token-heavy engineering artifact. Used together, they allow an agent to reason about large codebases and complex change sets at a fraction of the raw token cost — without losing the precision that software engineering requires. In the next topic, we extend these techniques into hierarchical context management for the most complex deep sessions.