Hierarchical Context | Token Optimization Masterclass

When an agentic session extends across multiple days, produces dozens of milestones, involves multiple agents handing off to each other, or spans an entire project lifecycle, a single rolling summary is no longer sufficient. The rolling summary itself grows large. You need a hierarchy: summaries that summarize earlier summaries, organized into tiers of increasing abstraction and decreasing recency. This is hierarchical context management.

Hierarchical context management is the architecture behind many long-running agentic systems: automated coding assistants that run for days, project management agents that span sprints, and QA orchestrators that persist across release cycles. It is the final layer of the compression stack, and it lets agentic sessions run for arbitrarily long periods while context costs stay bounded.

The Three-Tier Context Architecture

The hierarchy has three tiers, each with a distinct purpose, token budget, and update frequency:

Tier 1 — Hot Context (Recent Turns): The live conversation buffer. This is the raw, uncompressed context of the current working session, typically the last 15-25 turns. The model never edits it; the application re-sends it in full on every turn. It grows until it reaches the chunk window threshold, at which point it is compressed into Tier 2.

Tier 2 — Warm Context (Rolling Session Summary): The running summary of the current session, updated incrementally as Tier 1 chunks are compressed into it. It captures the current session's decisions, progress, and state in enough detail to resume work without the raw turns. Budget: 600-1,200 tokens. Updated every 15-25 turns, each time a Tier 1 chunk is compressed.

Tier 3 — Cold Context (Project Archive): The long-term memory of the project. This is a compressed summary of all previous sessions and completed milestones. It captures the project's history, major decisions, architectural evolution, and current stable state. Budget: 1,000-3,000 tokens. Updated at session boundaries (end of each work session or sprint).

Each tier is injected differently into the agent's context:
- Tier 3 is injected in the system prompt or as a persistent prefix (it is static within a session)
- Tier 2 is injected at the start of the messages array on every turn and refreshed after each compression cycle
- Tier 1 is the live conversation messages array
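To make the three tiers concrete, here is a minimal Python sketch of a container that holds them and triggers the Tier 1 → Tier 2 compression. The class name `TieredContext`, the 20-turn threshold, and the caller-supplied `summarize` function are illustrative assumptions rather than a prescribed API; counting each message as one turn is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical threshold, picked from the 15-25 turn range described above.
CHUNK_WINDOW_TURNS = 20

# summarize(previous_tier2, tier1_messages) -> new_tier2 (e.g., a model call)
Summarizer = Callable[[str, list[dict]], str]


@dataclass
class TieredContext:
    tier3_archive: str                  # cold: loaded once, static within a session
    tier2_summary: str = ""             # warm: rolling session summary
    tier1_turns: list[dict] = field(default_factory=list)  # hot: raw messages

    def add_turn(self, role: str, content: str) -> None:
        """Append one raw message to Tier 1 (each message counts as a turn here)."""
        self.tier1_turns.append({"role": role, "content": content})

    def needs_compression(self) -> bool:
        """True once Tier 1 has reached the chunk window threshold."""
        return len(self.tier1_turns) >= CHUNK_WINDOW_TURNS

    def compress(self, summarize: Summarizer) -> None:
        """Fold Tier 1 into Tier 2, then clear Tier 1 (Tier 3 is untouched)."""
        self.tier2_summary = summarize(self.tier2_summary, self.tier1_turns)
        self.tier1_turns = []
```

Tier 3 is never modified by this class; in this design it changes only at session boundaries.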

Tip: Think of the three tiers as the difference between a project wiki, a meeting notes document, and a live chat window. The wiki captures permanent knowledge, the meeting notes capture recent decisions, and the chat window captures the current conversation. Each has a different update cadence and a different level of abstraction. Your context management system should mirror this.

Designing the Project Archive (Tier 3)

The Tier 3 archive is the highest-stakes document in the hierarchy because it persists across sessions and is never cleared. Every piece of information in it was compressed from lower tiers through multiple cycles. Getting its structure right is critical.

A robust Tier 3 archive structure:

## Project Archive: [Project Name]
*Last updated: [date/session ID]*

### Project Identity
- **Goal:** [One sentence describing the overall project objective]
- **Stack:** [Primary technologies, frameworks, versions]
- **Repository:** [Repo URL / path]
- **Key team members & roles:** [If relevant to agent behavior]

### Architectural Decisions (Stable)
These decisions are final and should not be revisited without explicit instruction:
- [ADR-001]: [Decision and brief rationale]
- [ADR-002]: [Decision and brief rationale]
- [ADR-003]: [Decision and brief rationale]

### Completed Milestones
- [Milestone 1]: Completed [date]. [One-sentence summary of what was built/resolved]
- [Milestone 2]: Completed [date]. [One-sentence summary]

### Current Project State
[2-3 sentences: where the project stands as of the most recent session. What is working, what is in progress, what is next.]

### Known Issues / Technical Debt
- [Issue 1]: [Status, severity, brief description]
- [Issue 2]: [Status]

### Constraints & Non-Negotiables
- [Constraint 1]: [Why it exists]
- [Constraint 2]: [Why it exists]

### Patterns & Standards Established
- [Pattern 1]: [Brief description, where it is used]
- [Pattern 2]: [Brief description]

### Do Not Revisit (Closed Decisions)
These were explored and explicitly rejected. Do not suggest them:
- [Rejected approach 1]: [One-sentence reason]
- [Rejected approach 2]: [One-sentence reason]

The critical discipline: the Tier 3 archive should never grow without bound. Apply a hard token budget (1,500-2,500 tokens) and enforce it. When it approaches the limit, compress it further: merge completed milestones into a single summary line, consolidate related constraints, prune obsolete "do not revisit" items.
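Enforcing the budget starts with measuring it. The sketch below estimates tokens per archive section so you can see which sections to compress first. It assumes the `### ` heading layout from the template above and uses a rough four-characters-per-token heuristic; for exact numbers, use your model provider's tokenizer or token-counting endpoint instead.

```python
import re


def approx_tokens(text: str) -> int:
    """Rough estimate at ~4 characters per token (a heuristic, not a tokenizer)."""
    return (len(text) + 3) // 4  # ceiling division


def section_token_report(archive: str) -> dict[str, int]:
    """Estimate tokens for the preamble and for each '### ' section of the archive."""
    # The capture group makes re.split keep each heading title in the result:
    # [preamble, title1, body1, title2, body2, ...]
    parts = re.split(r"^### (.+)$", archive, flags=re.MULTILINE)
    report = {"(preamble)": approx_tokens(parts[0])}
    for title, body in zip(parts[1::2], parts[2::2]):
        report[title.strip()] = approx_tokens(body)
    return report


def over_budget(archive: str, budget: int = 2_500) -> bool:
    """True if the whole archive is estimated to exceed the hard budget."""
    return approx_tokens(archive) > budget
```

Sorting the report, for example `sorted(report.items(), key=lambda kv: kv[1], reverse=True)`, lists the largest sections first, which are usually the best candidates for merging or pruning.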

Tip: Version-control your Tier 3 archive as a file in the project repository (e.g., .agent/project-context.md). This makes it visible to all team members, auditable via git history, and automatically backed up. An agent that reads and writes its own Tier 3 archive from the repo has a persistent, team-shared long-term memory.

The Compression Cascade: How Information Flows Up the Hierarchy

The compression cascade is the process by which raw conversational content flows upward through the tiers:

[Tier 1: Raw turns] → compress → [Tier 2: Rolling Summary]
[Tier 2: Rolling Summary] → distill → [Tier 3: Project Archive]

Each transition has different semantics:

Tier 1 → Tier 2 (compress): Detailed, faithful, preserving all decisions and context from the recent chunk. Runs every 15-25 turns. This is the incremental summarization covered in the previous topic.

Tier 2 → Tier 3 (distill): Higher-level abstraction. The session's decisions are reviewed and sorted:
- Decisions that represent permanent project commitments → moved to Tier 3 "Architectural Decisions"
- Completed work → moved to Tier 3 "Completed Milestones"
- Known issues → merged into Tier 3 "Known Issues"
- Session-specific tactical detail → discarded (does not survive the Tier 2 → Tier 3 transition)
- Open questions → carried into the next session's Tier 2

The Tier 2 → Tier 3 distillation runs at session boundaries — when a work session ends, not continuously.

Distillation prompt:

You are updating a project's long-term memory archive.

CURRENT PROJECT ARCHIVE (Tier 3):
{tier_3_content}

END-OF-SESSION SUMMARY (Tier 2):
{tier_2_content}

Update the project archive by:
1. Moving any new permanent architectural decisions to the "Architectural Decisions" section
2. Adding newly completed milestones to "Completed Milestones" (one-line each)
3. Updating "Current Project State" to reflect end-of-session status
4. Adding new known issues or updating status of existing ones
5. Adding new established patterns/standards
6. Adding explicitly rejected approaches to "Do Not Revisit"
7. Updating "Constraints" if any new non-negotiables were established

Do NOT include in the archive:
- Tactical session details (specific prompts used, debugging paths, intermediate results)
- Open questions that are still being explored
- Decisions that may be revisited
- Any content that would not remain valid across future sessions

HARD BUDGET: Keep the archive under {budget} tokens. Compress existing items if needed to stay within budget.
Output only the updated archive.
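Filling the prompt and calling the model can be kept separate so the step is easy to test and review. In this sketch, `complete` is any function that sends a prompt to your model and returns its text; it is an assumed interface, not a library API. The template is passed in (for example, the prompt above loaded from a file), and the budget check uses a rough four-characters-per-token heuristic.

```python
from typing import Callable


def distill(
    template: str,
    tier3: str,
    tier2: str,
    budget: int,
    complete: Callable[[str], str],
) -> tuple[str, bool]:
    """Run one Tier 2 -> Tier 3 distillation.

    Returns the updated archive and whether it appears to fit the budget.
    """
    # str.format parses only the template, so braces inside the archive or
    # summary text are inserted verbatim and cannot break formatting.
    prompt = template.format(
        tier_3_content=tier3, tier_2_content=tier2, budget=budget
    )
    updated = complete(prompt).strip()
    if not updated:
        # Never overwrite the archive with an empty model response.
        raise ValueError("empty distillation result; keeping the previous archive")
    within_budget = len(updated) / 4 <= budget  # rough ~4 chars/token estimate
    return updated, within_budget
```

With the Anthropic Python SDK used elsewhere in this section, `complete` could be `lambda p: client.messages.create(model="claude-sonnet-4-5", max_tokens=4000, messages=[{"role": "user", "content": p}]).content[0].text`. If `within_budget` comes back false, review and trim the output manually before committing it.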

Tip: Run the distillation step as a CLI command you invoke at the end of each work session, never automatically within a session. Manual invocation gives you the opportunity to review the Tier 2 summary before it is distilled and to catch any misclassifications. After a few weeks, you will have a reliable sense of which items belong in the archive and which do not.

Multi-Agent Hierarchical Context: Orchestrator and Subagent Patterns

In multi-agent architectures — where an orchestrator delegates tasks to specialist subagents — the hierarchy maps naturally to agent roles:

Orchestrator carries Tier 3 + Tier 2. The orchestrator maintains the full project archive and the current session summary. It has the highest-level view of the project.

Subagents receive task-specific briefings derived from Tier 2 + Tier 3. When delegating a task, the orchestrator constructs a targeted context briefing for the subagent: project-level constraints from Tier 3, current session state from Tier 2, plus the specific task description. The subagent does not get the full archive — only what it needs to do its job.

Subagent results are merged back into Tier 2. When the subagent completes its task, its results (a structured report, diff summary, test results) are merged into the orchestrator's Tier 2 rolling summary.

import anthropic

# The client reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()


def format_file_list(files: list[str]) -> str:
    """Render file paths as a Markdown bullet list (minimal helper for this example)."""
    if not files:
        return "- (none specified)"
    return "\n".join(f"- {path}" for path in files)


def create_subagent_briefing(
    tier3: str,
    tier2: str,
    task_description: str,
    relevant_files: list[str],
) -> str:
    """Construct a focused context briefing for a subagent."""

    return f"""# Subagent Task Briefing

## Project Context (Key Facts)
{extract_relevant_tier3(tier3, task_description)}

## Current Session State
{tier2}

## Your Task
{task_description}

## Relevant Files
{format_file_list(relevant_files)}

## Constraints
- Follow all architectural decisions listed above
- Do not implement features outside the task scope
- Report any blockers or conflicts with existing decisions before proceeding

When complete, report your results using the structured result format."""


def extract_relevant_tier3(tier3: str, task: str) -> str:
    """Extract only the Tier 3 sections relevant to a specific task."""
    # A small, separate model call that filters the archive down to what this task needs.
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=600,
        messages=[{
            "role": "user",
            "content": f"From this project archive, extract only the sections relevant to this task: '{task}'\n\nArchive:\n{tier3}\n\nReturn only the relevant sections, no commentary."
        }]
    )
    return response.content[0].text

Tip: When a subagent's work produces a decision that should be elevated to the project archive, have the subagent explicitly flag it in its result report: [ARCHIVE-WORTHY]: [Decision/finding description]. The orchestrator can then process these flags during the next Tier 2 → Tier 3 distillation. This creates a reliable escalation path for decisions that start at the subagent level and need to persist at the project level.
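On the orchestrator side, processing a subagent report can combine both steps described above: the report is appended to Tier 2, and any `[ARCHIVE-WORTHY]` lines are pulled out and returned for the next distillation. This is a sketch; the function name and the `### Subagent result` heading are assumptions, and it expects each flag on its own line in the format shown in the tip.

```python
import re

# One flag per line, e.g. "[ARCHIVE-WORTHY]: Use exponential backoff for retries"
ARCHIVE_FLAG = re.compile(r"^[ \t]*\[ARCHIVE-WORTHY\]:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def process_subagent_report(
    tier2: str, agent_name: str, report: str
) -> tuple[str, list[str]]:
    """Merge a subagent report into Tier 2 and return any archive-worthy flags."""
    flags = ARCHIVE_FLAG.findall(report)
    # Drop the flag lines from the merged body; they are tracked separately.
    body = ARCHIVE_FLAG.sub("", report).strip()
    merged = f"{tier2.rstrip()}\n\n### Subagent result: {agent_name}\n{body}"
    return merged, flags
```

The orchestrator can accumulate the returned flags across the session and hand them to the distillation step alongside the final Tier 2 summary.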

Context Inheritance: What Each Tier Passes Down

A common mistake is to think of the hierarchy as three separate systems. In practice, each tier's content flows downward at injection time. Here is what each agent turn actually receives:

System Prompt:
  - Base agent instructions
  - [TIER 3] Project archive (static for the session)

Messages array:
  - User: "[TIER 2] Rolling session summary as of turn N"
  - Assistant: "Understood, here is the current state: [brief acknowledgment]"
  - [TIER 1] Recent turns (up to the chunk window, typically the last 15-25)
  - User: [current turn]

This structure ensures the agent always has the full context hierarchy without any tier overwhelming the others. Tier 3 is in the system prompt because it is persistent background — it should not compete with the live conversation for recency bias. Tier 2 is injected as the first user message because it is the current frame of reference. Tier 1 is the live conversation.

The injection order matters because transformer models tend to attend more strongly to recent tokens than to earlier ones. You want the current session summary (Tier 2) and recent turns (Tier 1) to be "recent" in the context, while the project archive (Tier 3) provides stable background without dominating recency.

Tip: Experiment with injecting the Tier 2 summary as a user message vs. a system prompt section. With many models, placing Tier 2 in the messages array (as the first user message with an acknowledgment response) tends to produce better recall than placing it in the system prompt. The conversational framing helps the model treat it as active working context rather than background instructions.
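The injection layout above can be assembled into a request as in the sketch below. The `system`/`messages` split matches the Anthropic Messages API parameters used earlier in this section; the `tier2_in_system` flag is a hypothetical switch for running the placement experiment. It assumes `recent_turns` already alternates user/assistant and ends with an assistant message, so appending the current user turn keeps the alternation intact.

```python
def build_request(
    base_instructions: str,
    tier3: str,
    tier2: str,
    recent_turns: list[dict],
    current_turn: str,
    tier2_in_system: bool = False,
) -> dict:
    """Assemble the system prompt and messages array for one agent turn."""
    # Tier 3: persistent background in the system prompt.
    system = f"{base_instructions}\n\n[TIER 3] Project archive:\n{tier3}"
    messages: list[dict] = []
    if tier2_in_system:
        # Experimental variant: Tier 2 as a system prompt section instead.
        system += f"\n\n[TIER 2] Rolling session summary:\n{tier2}"
    elif tier2:
        # Default: Tier 2 as the first user message plus a short acknowledgment.
        messages.append({"role": "user", "content": f"[TIER 2] Rolling session summary:\n{tier2}"})
        messages.append({"role": "assistant", "content": "Understood. I will work from this current state."})
    messages.extend(recent_turns)  # Tier 1: raw recent turns
    messages.append({"role": "user", "content": current_turn})
    return {"system": system, "messages": messages}
```

The result can be unpacked straight into a call such as `client.messages.create(model="claude-sonnet-4-5", max_tokens=1024, **build_request(...))`, which makes it easy to compare both placements on the same task.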

Implementing a Session Boundary Protocol

The session boundary — the end of one work session and the start of the next — is the most critical moment in hierarchical context management. A clear protocol makes resumption reliable:

End-of-session protocol:
1. Trigger a final Tier 1 → Tier 2 compression (regardless of chunk window position)
2. Review Tier 2 manually for accuracy
3. Run Tier 2 → Tier 3 distillation
4. Review Tier 3 update manually
5. Commit updated Tier 3 archive to version control
6. Log session metrics (token usage, turn count, cost, duration)
7. Close the session
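The end-of-session steps can be wired together as one function whose individual steps are supplied by the caller, which keeps the two manual review points explicit. `SessionHooks` and `end_session` are hypothetical names; each hook stands in for your own implementation (a model call, an interactive review in an editor, a git commit).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SessionHooks:
    compress: Callable[[str, list[dict]], str]  # (tier2, tier1_turns) -> new tier2
    review: Callable[[str, str], str]           # (label, text) -> human-approved text
    distill: Callable[[str, str], str]          # (tier3, tier2) -> new tier3
    commit: Callable[[str], None]               # persist Tier 3 to version control
    log_metrics: Callable[[dict], None]         # token usage, turns, cost, duration


def end_session(
    tier3: str, tier2: str, tier1_turns: list[dict], hooks: SessionHooks, metrics: dict
) -> str:
    """Run the end-of-session protocol and return the new Tier 3 archive."""
    tier2 = hooks.compress(tier2, tier1_turns)     # 1. final compression, regardless of window
    tier2 = hooks.review("Tier 2", tier2)          # 2. manual review of Tier 2
    new_tier3 = hooks.distill(tier3, tier2)        # 3. Tier 2 -> Tier 3 distillation
    new_tier3 = hooks.review("Tier 3", new_tier3)  # 4. manual review of Tier 3
    hooks.commit(new_tier3)                        # 5. commit to version control
    hooks.log_metrics(metrics)                     # 6. log session metrics
    return new_tier3                               # 7. caller closes the session
```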

Start-of-session protocol:
1. Load Tier 3 archive from version control
2. Create a new, empty Tier 2 rolling summary
3. Inject a "session start" prompt:

[NEW SESSION — {date}]
Project archive loaded. Rolling summary initialized.

Any updates since last session:
- [External change 1, if any]
- [External change 2, if any]

Today's session objective: {today's goal}

4. Begin Tier 1 (the live conversation).

Tip: Record the "session objective" in a structured task list (a simple text file or a Jira ticket) before starting the session. Opening the session with a clear, one-sentence objective helps keep the work focused and makes the end-of-session distillation easier: the session summary is naturally organized around whether and how the objective was achieved.

Scaling the Hierarchy: When to Add a Fourth Tier

For very long-running projects (multiple months, dozens of sessions), even the Tier 3 archive can grow stale or bloated with historical context that no longer actively affects current work. At this scale, consider a fourth tier:

Tier 4 — Project History (Deep Archive): A long-form, prose-style history of the project. This is not injected into live agent sessions by default. It is a reference document for humans and for occasional deep-research agent tasks ("why did we make this architectural decision in Q1?"). No token budget limits — it is a full audit trail.

Tier 3 periodically distills into Tier 4 by moving completed milestones and resolved issues from the archive into the history document. Tier 3 stays current; Tier 4 accumulates everything.
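Moving completed milestones from Tier 3 into Tier 4 can be partly mechanical. This sketch moves the bullets under `### Completed Milestones` into the history document under a dated heading and leaves a fixed pointer line in the archive. The pointer text and heading format are assumptions, and because Tier 4 should read as narrative prose, the moved bullets are raw material to rewrite rather than finished history.

```python
import re

# Pointer left in Tier 3 after milestones move to Tier 4 (hypothetical wording).
NOTE = "Earlier milestones: see the project history (Tier 4)."

# Body of the Completed Milestones section, up to the next '### ' heading or end of text.
SECTION = re.compile(r"^### Completed Milestones\n(.*?)(?=^### |\Z)", re.MULTILINE | re.DOTALL)


def move_milestones_to_history(archive: str, history: str, stamp: str) -> tuple[str, str]:
    """Move milestone bullets from the Tier 3 archive into the Tier 4 history."""
    m = SECTION.search(archive)
    if not m:
        return archive, history
    lines = m.group(1).splitlines()
    moved = [line for line in lines if line.startswith("- ")]
    if not moved:
        return archive, history
    # Keep any other non-empty lines, but never duplicate the pointer.
    kept = [line for line in lines if line.strip() and not line.startswith("- ") and line != NOTE]
    body = "\n".join([NOTE, *kept]) + "\n\n"
    new_archive = archive[: m.start(1)] + body + archive[m.end(1):]
    new_history = (
        history.rstrip("\n")
        + f"\n\n## Milestones archived {stamp}\n"
        + "\n".join(moved)
        + "\n"
    )
    return new_archive, new_history
```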

The practical trigger for introducing Tier 4: when your Tier 3 archive, despite aggressive compression, is consistently above 2,500 tokens and you notice that more than half of it describes work completed more than 4 weeks ago.

Tip: Tier 4 is primarily for humans, not agents. Write it in clear, narrative prose rather than structured bullet points. A QA engineer joining the project 3 months in, or a product manager reviewing technical decisions made last quarter, will find it far more accessible than compressed structured data. Let the lower tiers handle agent efficiency; let Tier 4 handle human knowledge transfer.

Summary

Hierarchical context management is the capstone of the compression stack: three (or four) tiers of progressively abstracted context, each with a defined budget, update cadence, and injection pattern. The architecture scales agentic sessions from hours to days to months while keeping context costs bounded and agent performance reliable. The cascade from raw turns to rolling summary to project archive is designed so that critical decisions survive each compression step while stale detail is pruned. With this architecture in place, long-running agentic workflows become reliable, auditable, and team-scalable. The final topic puts everything together in a concrete, hands-on implementation for a multi-hour coding session.