
Conversation summarization is the craft of condensing a multi-turn agentic session into a compact, high-fidelity representation that preserves everything the agent needs to continue working effectively — and nothing it doesn't. Done well, a summary is not a shorter version of the chat log; it is a distilled operational briefing. Done poorly, it silently discards the very decisions that made the session valuable, turning the agent into an amnesiac that restarts from scratch.

This topic covers the specific techniques, prompt patterns, and structural approaches that produce reliable, high-quality conversation summaries across engineering, QA, and product management contexts.


Understanding What Belongs in a Summary

Before writing a single summarization prompt, you must have a clear mental model of what information carries load in a conversation and what is genuinely noise. These categories are not obvious — experienced practitioners are often surprised by what they can safely discard.

Must preserve:
- Decisions made and the rationale behind them (architecture choices, scope cuts, prioritization calls)
- Constraints established (performance targets, API contracts, security requirements, out-of-scope items)
- Current state of work in progress (what files have been changed, what tests pass, what is blocked)
- Open questions and their owners
- Errors encountered and how they were resolved (critical for avoiding re-doing failed approaches)
- Key facts discovered during the session (API rate limits found during testing, third-party library limitations, etc.)

Can discard:
- Pleasantries, acknowledgments, and conversational filler ("Got it," "Sounds good," "Sure, let me do that")
- Superseded plans and approaches (the second architecture proposal made after the first was rejected)
- Verbose tool outputs that were already summarized by the agent (raw JSON responses, full stack traces after the bug was fixed)
- Intermediate reasoning that led to a decision (keep the decision and its one-line rationale; the full reasoning chain can usually go)
- Repeated context injections (if you injected a style guide 5 times, keep one reference to it, not five copies of the text)

Tip: Build a "decision register" habit. Whenever the agent or a human makes an explicit decision, mark it explicitly in your session notes with a [DECISION] tag. When it is time to summarize, these tagged items form the non-negotiable core of the summary. Everything else is a candidate for compression or removal.
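The decision-register habit is easy to mechanize. A minimal sketch, assuming free-form session notes where tagged lines follow the `[DECISION]` convention suggested above (the note text itself is illustrative):

```python
import re

def extract_decisions(session_notes: str) -> list[str]:
    """Pull every [DECISION]-tagged line out of free-form session notes."""
    pattern = re.compile(r"^\s*\[DECISION\]\s*(.+)$", re.MULTILINE)
    return [m.group(1).strip() for m in pattern.finditer(session_notes)]

notes = """
Discussed caching options at length.
[DECISION] Use Redis for the session cache; in-process cache rejected for multi-node deploys.
Tried the ORM migration path first.
[DECISION] Write raw SQL migrations; ORM autogeneration missed the partial index.
"""

for decision in extract_decisions(notes):
    print("-", decision)
```

Feed the extracted list into your summarization prompt as the non-negotiable core, and let the model compress everything else.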


The Structured Summary Format

An unstructured paragraph summary is almost always inferior to a structured summary for agentic sessions. Structure serves two purposes: it makes the summary easier for the model to parse when it is re-injected, and it makes it easier for humans to review.

A proven structure for engineering-focused sessions:

## Session Summary — [Date/Session ID]

### Objective
[One sentence: what this session set out to accomplish]

### Decisions Made
- [Decision 1]: [Brief rationale]
- [Decision 2]: [Brief rationale]

### Constraints & Requirements Established
- [Constraint 1]
- [Constraint 2]

### Work Completed
- [Milestone or artifact 1]
- [Milestone or artifact 2]

### Current State
[One paragraph: exactly where work stands right now, what is done, what is in progress]

### Open Questions
- [Question 1] — Owner: [Person/Agent]
- [Question 2] — Owner: [Person/Agent]

### Known Issues / Blockers
- [Issue 1]: [Status]

### Do Not Revisit (Closed Paths)
- [Approach X] was tried and rejected because [reason]. Do not suggest this again.

The "Do Not Revisit" section is particularly valuable and often overlooked. It explicitly tells the agent which approaches have already been explored and ruled out, preventing the frustrating pattern where an agent cycles back to a suggestion you already rejected three turns ago.

Tip: For QA engineers, adapt the "Work Completed" section to list test cases executed, bugs found, and coverage metrics. For product managers, replace it with "Decisions Made on User Stories" and "Scope Changes." The structure adapts across roles — the discipline of using a structure does not.
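One way to keep the per-role adaptations consistent is a small lookup that renders the structured format programmatically. This is a sketch: the section names mirror the format above, while the helper names and the exact QA/PM replacement labels are illustrative assumptions:

```python
BASE_SECTIONS = [
    "Objective", "Decisions Made", "Constraints & Requirements Established",
    "Work Completed", "Current State", "Open Questions",
    "Known Issues / Blockers", "Do Not Revisit (Closed Paths)",
]

# Per-role overrides, following the adaptations suggested above.
ROLE_OVERRIDES = {
    "qa": {"Work Completed": "Test Cases Executed, Bugs Found, Coverage"},
    "product": {"Work Completed": "Decisions Made on User Stories & Scope Changes"},
}

def sections_for(role: str) -> list[str]:
    overrides = ROLE_OVERRIDES.get(role, {})
    return [overrides.get(s, s) for s in BASE_SECTIONS]

def format_prompt_for(role: str) -> str:
    """Render the structured format as a markdown skeleton to paste into a summarization prompt."""
    return "\n\n".join(f"### {s}\n[...]" for s in sections_for(role))
```

The payoff is that every role gets the same discipline (the same section skeleton) with only the labels swapped.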


Summarization Prompt Patterns

The prompt you use to generate the summary is as important as the format you request. Here are tested prompt patterns for different session types:

Pattern 1: Engineering session summary

You are summarizing a technical coding session. Review the conversation above and produce a structured summary following this format:

[paste your structured format here]

Rules:
- Be precise about file names, function names, and technical decisions
- Keep the "Current State" section to 3-5 sentences maximum
- For each decision, capture the rationale in one sentence — do not expand it
- List rejected approaches in "Do Not Revisit" with a one-sentence reason
- Do not include any raw code in the summary — reference file/function names instead
- Total summary length: 300-500 words maximum

Pattern 2: QA session summary

Summarize this QA session into a compact test status report.

Include:
1. Test scope (what was being tested)
2. Test results by area (pass/fail counts, coverage %)
3. Bugs found (ID, severity, one-line description)
4. Environment and configuration details that are non-standard
5. Blocked test cases and the blocking reason
6. Next test cycle scope

Exclude all verbose test output, stack traces that are already logged, and exploratory paths that produced no findings.
Target length: 200-400 words.

Pattern 3: Product planning session summary

Summarize this product planning conversation into a decision log.

Capture:
1. Stories/features that were accepted into the sprint (with brief acceptance criteria)
2. Stories that were explicitly deferred and why
3. Open design/feasibility questions assigned to specific people
4. Constraints or dependencies identified
5. Any assumptions made that need validation

Do not include: discussion back-and-forth, alternatives that were rejected, or stakeholder pleasantries.
Target length: 250-450 words.

Tip: Always include an explicit word or token count target in your summarization prompt. Without a target, models tend to produce summaries that are either too dense (critical context lost) or too verbose (defeating the purpose). A good target is 15-25% of the original conversation length for a rich session.
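The 15-25% heuristic can be computed from the transcript before you build the prompt. A sketch using word counts as a rough proxy for tokens (the helper name and the minimum-budget floors are assumptions):

```python
def summary_word_target(conversation_history: list[dict]) -> tuple[int, int]:
    """Return a (low, high) word-count target of 15-25% of the transcript length."""
    total_words = sum(len(str(m.get("content", "")).split()) for m in conversation_history)
    low, high = int(total_words * 0.15), int(total_words * 0.25)
    # Floor the range so short sessions still get a usable summary budget.
    return max(low, 100), max(high, 200)

history = [{"role": "user", "content": "word " * 2000}]
print(summary_word_target(history))  # → (300, 500)
```

Interpolate the returned range into the "Target length" line of your summarization prompt instead of hard-coding one number for every session.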


Before/After: A Concrete Example

Before compression (raw conversation excerpt, ~800 tokens):

User: Let's start building the payment processing module. We need to integrate with Stripe.
Agent: Great! I'll start with the Stripe SDK setup. Do you want to use the latest v12 SDK?
User: Yes, v12. And we need to support both one-time payments and subscriptions.
Agent: Understood. I'll set up both. Should we use Stripe Checkout or the Payment Element?
User: Payment Element — we need more control over the UI.
Agent: Got it. I'll use Payment Element. What's the currency?
User: USD only for now. No multi-currency needed.
Agent: Understood, USD only. What about webhooks? Do you need to handle payment.succeeded and payment.failed?
User: Yes, both of those. Also subscription.updated and subscription.deleted.
Agent: Perfect. I'll set up webhook handlers for all four events. Should I use Stripe CLI for local testing?
User: Yes, use Stripe CLI. The webhook secret should come from environment variables.
Agent: Of course, I'll read STRIPE_WEBHOOK_SECRET from env. Let me start with the SDK initialization...
[agent produces 300 lines of code setup]
User: Looks good but we actually decided not to support subscriptions in v1. Scope is one-time payments only.
Agent: Understood, I'll remove the subscription code. [removes subscription code]
User: Also, do we need idempotency keys?
Agent: Yes, Stripe recommends idempotency keys for all charge operations to prevent duplicate charges.
User: Okay, add those.

After compression (~120 tokens):

### Payment Module — Session Summary

**Decisions Made:**
- Stripe SDK v12, Payment Element (not Checkout) for UI control
- USD only, no multi-currency in v1
- One-time payments only — subscriptions explicitly out of scope for v1

**Configuration:**
- Webhook handlers: payment.succeeded, payment.failed (subscription events removed)
- Stripe CLI for local webhook testing
- STRIPE_WEBHOOK_SECRET from environment variable
- Idempotency keys on all charge operations

**Current State:** SDK initialized, Payment Element component scaffolded, webhook handlers stubbed for two events.

**Do Not Revisit:** Subscription support was scoped out of v1. Do not add subscription-related code.

The compressed version retains every decision and constraint. The raw version is 6.7x longer and contains scope that was explicitly reversed mid-session.

Tip: Run a "compression audit" on your summaries by asking: if a new agent read only this summary, could it continue the session without making a decision that contradicts something already resolved? If yes, the summary is sufficient. If no, identify what is missing and add it. This audit takes 30 seconds and dramatically improves summary quality.
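Part of that audit can run as a cheap structural pre-check before the manual read. This sketch only verifies that the load-bearing sections exist and the summary is not degenerate; the required-section list mirrors the format above, the 50-word floor is an arbitrary assumption, and genuine contradiction checking still needs a human or a model pass:

```python
REQUIRED_SECTIONS = ["Decisions Made", "Current State", "Do Not Revisit"]

def audit_summary(summary: str) -> list[str]:
    """Return a list of audit failures; an empty list means the structural check passed."""
    failures = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in summary]
    if len(summary.split()) < 50:
        failures.append("summary suspiciously short (<50 words)")
    return failures
```

Gate automated compression on this check: if it fails, regenerate the summary rather than silently continuing with a hollow one.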


Handling Contradictions and Course Corrections

Real sessions contain course corrections — decisions that were made and then reversed. Naive summarization either drops the reversal (leaving the stale, superseded decision in place) or includes both sides (confusing the agent). The correct approach is to capture only the final state plus a note that the earlier approach was tried:

Wrong (drops the reversal):

- Scope includes subscription payments

Wrong (includes both, creates ambiguity):

- Scope includes subscription payments
- Scope does not include subscription payments

Correct:

- Scope: one-time payments only. Subscriptions were initially included but explicitly cut from v1 scope during session.

The correct form captures the current state, signals that there was a deliberate decision (not an oversight), and prevents the agent from "helpfully" re-adding subscriptions because it noticed a gap.

Tip: When you have a session with multiple course corrections, add a "Version History" block at the top of your summary noting the major pivots chronologically. This is especially valuable for product sessions where the scope evolution itself is an artifact worth preserving.


Automating Summarization in Your Workflow

Manual summarization is a useful skill, but for production agentic systems you want it automated. Here is a pattern for wiring summarization into an agentic loop:

from anthropic import Anthropic

client = Anthropic()

SUMMARIZATION_SYSTEM_PROMPT = """You are a technical session summarizer.
When asked to summarize, produce a structured decision log following the provided format.
Be precise and concise. Target 300-500 words."""

def compress_session(conversation_history: list[dict], session_type: str) -> str:
    """Compress conversation history into a structured summary."""

    format_prompt = get_format_prompt(session_type)  # returns the structured format string

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        system=SUMMARIZATION_SYSTEM_PROMPT,
        messages=conversation_history + [
            {
                "role": "user",
                "content": f"Please summarize the conversation above using this format:\n\n{format_prompt}"
            }
        ]
    )

    return response.content[0].text

def should_compress(history: list[dict], token_count: int, model_limit: int) -> bool:
    return token_count > model_limit * 0.60 or len(history) > 40

def run_agent_loop(initial_task: str, session_type: str = "engineering"):
    history = []
    system_prompt = build_system_prompt(session_type)  # returns a role-specific system prompt
    user_input = initial_task  # first turn is the initial task itself

    while True:
        history.append({"role": "user", "content": user_input})

        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            system=system_prompt,
            messages=history
        )

        agent_reply = response.content[0].text
        history.append({"role": "assistant", "content": agent_reply})

        token_count = response.usage.input_tokens

        if should_compress(history, token_count, 200000):
            summary = compress_session(history, session_type)
            # Replace history with compressed version
            history = [{"role": "user", "content": f"[SESSION CONTEXT]\n{summary}\n\n[CONTINUING SESSION]"}]
            print("[Context compressed. Session summary preserved.]")

        user_input = get_next_user_input()  # blocks until the next user message
        if user_input == "exit":
            break

Tip: After each automated compression, log both the original token count and the compressed token count. Track your compression ratio over time. A well-tuned summarization system should achieve 5:1 to 10:1 compression ratios on typical engineering sessions without meaningful decision loss. If your ratio is below 3:1, your summarization prompt is too verbose. If it is above 15:1, you are likely losing critical context.


Summary

Conversation summarization is a precision skill: it requires knowing what information bears load in a given session type, constructing prompts that extract that information reliably, and validating that nothing critical was lost. The structured summary format — with explicit sections for decisions, constraints, current state, open questions, and closed paths — is more reliable than freeform prose summaries. Automate the trigger and the generation; review the output manually until you have calibrated confidence in your prompts. In the next topic, we scale this skill to continuous sessions through incremental summarization.