Context layering is the practice of assembling context from multiple sources using different timing, triggering, and injection strategies. Rather than treating context as a single monolithic block that is either present or absent, you design it as a layered system where each layer activates at the right moment, carries the right content, and is retired when no longer needed.
This topic teaches you the three core layering strategies — static, dynamic, and on-demand — and how to combine them into a coherent context assembly architecture that minimizes token cost while maximizing model effectiveness.
The Three Layering Strategies Defined
Static context is context that is assembled once and remains constant for the entire duration of a session, an agent, or a deployment. It does not change based on the task, the conversation state, or the user's role. System prompts are the canonical example of static context, but CLAUDE.md files, fixed tool definitions, and pre-loaded configuration blocks are also static.
Static context has predictable cost (you can calculate its token contribution precisely) and zero assembly overhead (no runtime computation needed). Its weakness is rigidity: it pays the same token cost whether or not the information is relevant to the current task.
Dynamic context is context that is selected and assembled at runtime based on observable signals: the task type, the current state of the conversation, the user's identity, the files being worked on, or the results of previous steps. Dynamic context changes from call to call, adapting to deliver relevant information when it is needed and withhold it when it is not.
Dynamic context requires an assembly layer — code or configuration that evaluates signals and constructs the context accordingly. This adds engineering complexity but unlocks substantial token savings by making context proportional to need.
On-demand context is context that is never pre-loaded into the conversation — it is fetched by the model itself, through tool calls, when the model determines it needs additional information. RAG (Retrieval-Augmented Generation) lookups, file read tools, database queries, and API calls are all forms of on-demand context. The model pays a small tool-call overhead (tool definition tokens + function call tokens) but avoids paying for context that may never be needed.
On-demand context has the best potential token efficiency because the model only fetches what it actually needs, when it actually needs it. Its weaknesses are latency (each tool call adds a round-trip) and reliability (the model must correctly judge when it needs more context).
Tip: A well-designed context architecture uses all three strategies in combination: static for universal constraints and identity, dynamic for task-relevant background, and on-demand for specific data fetched as the work progresses. Relying entirely on static context is the most common token efficiency mistake — it means paying for information that is irrelevant to most tasks.
Designing Static Context: What Truly Never Changes
Static context deserves rigorous curation because every token in it is paid on every call, forever. The bar for static inclusion should be: "Is this information required on every single interaction this agent has?"
Categories that legitimately belong in static context:
Agent identity and purpose: The role definition, the primary task the agent performs, and any behavioral constraints that define who the agent is. These apply to every interaction by definition.
Hard constraints and safety rules: Instructions that must be followed on every call regardless of task — security constraints, never-do rules, output format requirements that apply universally.
Environment invariants: Technology stack, language version, external service names — facts about the world that are true for every interaction in this deployment.
Examples of what does NOT belong in static context:
- Information about a specific feature being built (belongs in dynamic or on-demand)
- Specific file contents (belongs in dynamic or on-demand)
- User preferences gathered at session start (belongs in dynamic, injected at session initialization)
- Examples of past work (belongs in on-demand, fetched when relevant)
Static context audit test: for each item in your static context, look at the last 10 interactions and ask whether the model actually needed it in each one. If the answer is "no" for more than 2 of the 10, the item is a candidate for moving to the dynamic or on-demand layer.
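A minimal sketch of this audit in code, assuming you already log, per interaction, whether each static item was needed (the log shape and item names here are hypothetical):

# Hypothetical usage log: static item -> whether the model needed it
# on each of the last 10 interactions (True = needed)
usage_log: dict[str, list[bool]] = {
    "role_definition": [True] * 10,
    "security_rules":  [True] * 10,
    "current_feature": [True, False, False, True, False, False, False, False, True, False],
}

def audit_static_items(log: dict[str, list[bool]], max_misses: int = 2) -> list[str]:
    # Flag items that were unneeded on more than `max_misses` of the logged interactions
    return [item for item, needed in log.items() if needed.count(False) > max_misses]

print(audit_static_items(usage_log))  # ['current_feature']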
Static context audit — before:
[650 tokens]
Role: Senior code reviewer
Stack: TypeScript, Node.js, PostgreSQL, Prisma
Style guide: Airbnb TypeScript
Security constraints: No secrets, no PII in logs, parameterized queries
Output format: Structured JSON review
Current feature: User authentication module ← dynamic, not static
Files to review pattern: src/auth/** ← dynamic, not static
Recent changes: JWT added in PR #142 ← dynamic, not static
Static context — after audit:
[320 tokens]
Role: Senior code reviewer
Stack: TypeScript, Node.js, PostgreSQL, Prisma
Style guide: Airbnb TypeScript
Constraints: No secrets/PII in logs, parameterized queries only
Output format: {"verdict": string, "issues": array, "summary": string}
Tip: Treat your static context as a product with a strict "feature freeze" policy. Adding something to static context requires a deliberate decision and evidence that it is truly universally needed. Removing something requires a test that confirms behavior is unchanged without it. This discipline prevents the slow accumulation of static context that afflicts most long-running agents.
Dynamic Context: Signal-Driven Assembly
Dynamic context assembly requires identifying the signals that determine which context is relevant, then building an assembly layer that translates those signals into context blocks.
Common signals for dynamic context selection:
| Signal | Context triggered |
|---|---|
| Task type classification | Task-specific instructions and examples |
| File extension of current task | Language-specific patterns and conventions |
| User role / persona | Role-appropriate output style and focus areas |
| Session turn number | Conversation state summary at high turn counts |
| Current sprint / feature flag | Feature-specific requirements and constraints |
| Error state detected | Debugging context and diagnostic instructions |
Implementing a dynamic context assembly layer:
from dataclasses import dataclass
from typing import Optional

import anthropic

@dataclass
class TaskSignals:
    task_type: str           # "code_review" | "bug_fix" | "feature_generation" | "test_writing"
    language: Optional[str]  # "typescript" | "python" | "java"
    user_role: str           # "engineer" | "qa" | "pm"
    turn_number: int
    has_error_context: bool

CONTEXT_BLOCKS = {
    "code_review": """
Review priorities:
1. Correctness and logic errors
2. Security: injection, auth, secrets
3. Performance: N+1 queries, blocking calls
4. Maintainability: naming, complexity, duplication
""",
    "bug_fix": """
Debugging approach:
1. Reproduce the error from the provided context
2. Identify root cause before proposing fix
3. Check for similar patterns elsewhere in the provided code
4. Propose fix with test to prevent regression
""",
    "test_writing": """
Test writing standards:
- Use Vitest describe/it structure matching existing tests
- Test happy path, error cases, and edge cases
- Mock external dependencies, not internal functions
- Assert on behavior, not implementation
""",
    "typescript": """
TypeScript conventions for this project:
- Strict mode enabled
- No any types without explicit comment justification
- Use Zod for runtime validation
- Prefer type over interface for unions
""",
    "pm_output_style": """
Output style for product context:
- Use plain English, avoid technical jargon
- Structure with: Impact, Approach, Risks, Questions
- Acceptance criteria in Given/When/Then format
""",
}

def assemble_dynamic_context(signals: TaskSignals) -> str:
    blocks = []
    # Task-type context
    if signals.task_type in CONTEXT_BLOCKS:
        blocks.append(CONTEXT_BLOCKS[signals.task_type])
    # Language context
    if signals.language and signals.language in CONTEXT_BLOCKS:
        blocks.append(CONTEXT_BLOCKS[signals.language])
    # Role-specific output style
    if signals.user_role == "pm":
        blocks.append(CONTEXT_BLOCKS["pm_output_style"])
    # Error-state context (per the signals table above): reuse the debugging
    # block when an error is present but the task was not classified as a bug fix
    if signals.has_error_context and signals.task_type != "bug_fix":
        blocks.append(CONTEXT_BLOCKS["bug_fix"])
    # Long session: remind to ask for more context if needed
    if signals.turn_number > 15:
        blocks.append("Note: This is a long session. If prior context is ambiguous, "
                      "ask for clarification rather than assuming.")
    return "\n".join(blocks).strip()

def call_agent(static_system_prompt: str, signals: TaskSignals, user_message: str,
               conversation_history: list) -> str:
    client = anthropic.Anthropic()
    dynamic_context = assemble_dynamic_context(signals)
    # Combine static + dynamic as the full system prompt
    full_system = f"{static_system_prompt}\n\n{dynamic_context}".strip()
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=2048,
        system=full_system,
        messages=conversation_history + [{"role": "user", "content": user_message}],
    )
    return response.content[0].text
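A call under this pattern might look like the following; the signal values and prompt are illustrative:

signals = TaskSignals(
    task_type="code_review",
    language="typescript",
    user_role="engineer",
    turn_number=3,
    has_error_context=False,
)
static_prompt = "Role: Senior code reviewer. Stack: TypeScript, Node.js, PostgreSQL, Prisma. ..."
reply = call_agent(static_prompt, signals, "Please review the attached diff.", conversation_history=[])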
This pattern saves tokens by paying for task-type-specific context only on calls where that task type is active. If 40% of calls are code reviews, 30% are bug fixes, and 30% are feature generation, each call pays for one task context block instead of all three, a roughly 66% reduction in task-context token cost (assuming the blocks are similar in size).
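The same arithmetic extends to unequal block sizes: the dynamic approach pays the usage-weighted average of the block sizes instead of their sum. A quick check with illustrative sizes:

# Illustrative task-context block sizes (tokens) and call mix
block_tokens = {"code_review": 90, "bug_fix": 110, "feature_generation": 100}
call_mix = {"code_review": 0.4, "bug_fix": 0.3, "feature_generation": 0.3}

all_static = sum(block_tokens.values())                              # 300: every call carries every block
dynamic = sum(call_mix[t] * block_tokens[t] for t in block_tokens)   # 99: each call carries only its own
print(f"reduction: {1 - dynamic / all_static:.0%}")                  # reduction: 67%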
Tip: Implement task type classification as a lightweight, fast step before your main LLM call. A simple keyword classifier, regex pattern matcher, or a very cheap small-model call can classify the task type in milliseconds with high accuracy. The token savings from dynamic context assembly quickly outpace the cost of the classification step.
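A keyword classifier along these lines is often enough to drive the assembly layer; the keyword patterns are illustrative and should be tuned on your own traffic:

import re

# Illustrative keyword patterns per task type; tune against real traffic
TASK_PATTERNS = {
    "code_review": re.compile(r"\b(review|lgtm|pull request|pr\b|diff)", re.I),
    "bug_fix": re.compile(r"\b(bug|fix|error|crash|broken|traceback|stack trace)", re.I),
    "test_writing": re.compile(r"\b(test|coverage|vitest|spec)\b", re.I),
}

def classify_task(user_message: str, default: str = "feature_generation") -> str:
    # Return the first task type whose pattern matches, else the default
    for task_type, pattern in TASK_PATTERNS.items():
        if pattern.search(user_message):
            return task_type
    return default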
On-Demand Context: Letting the Model Fetch What It Needs
On-demand context inverts the traditional approach. Instead of predicting what the model needs and injecting it ahead of time, you give the model tools and let it fetch information when it determines it needs more.
This pattern is powerful because it delegates context selection to the model itself — the entity with the best understanding of what is actually needed to complete the current step. The model only fetches context when it encounters genuine uncertainty or information gaps.
Designing effective on-demand context tools:
For on-demand context to work well, your tools must:
1. Have precise, specific names that tell the model exactly what they provide
2. Have descriptions that specify what the tool returns, not just what it accepts
3. Return only the relevant subset (tools should also do context selection internally)
4. Return a summary of what was not included (so the model knows what else is available)
tools = [
    {
        "name": "read_function_source",
        "description": "Read the source code of a specific function from the codebase. Returns only the function body, not the entire file. Use when you need to understand the implementation of a function before modifying it or writing tests for it.",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {"type": "string", "description": "Relative path to the file"},
                "function_name": {"type": "string", "description": "Exact name of the function to read"}
            },
            "required": ["file_path", "function_name"]
        }
    },
    {
        "name": "get_type_definition",
        "description": "Get the TypeScript type or interface definition for a named type. Returns the type definition and its direct dependencies. Use when you need to understand the shape of a data structure.",
        "input_schema": {
            "type": "object",
            "properties": {
                "type_name": {"type": "string", "description": "Name of the type or interface"}
            },
            "required": ["type_name"]
        }
    },
    {
        "name": "search_codebase_pattern",
        "description": "Search for usage patterns or examples in the codebase. Returns up to 5 matching code snippets with file locations. Use when you need to understand how a pattern is used elsewhere before applying it.",
        "input_schema": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string", "description": "What to search for (function name, pattern, or concept)"},
                "context_lines": {"type": "integer", "description": "Lines of context around each match", "default": 5}
            },
            "required": ["pattern"]
        }
    }
]
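Wiring these tools into a call requires a loop that executes tool requests until the model produces a final answer. A minimal sketch against the Anthropic Messages API, where execute_tool is assumed to be your own dispatcher that runs the named tool and returns its output as a string:

import anthropic

def run_with_on_demand_context(system_prompt: str, user_message: str) -> str:
    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=2048,
            system=system_prompt,
            tools=tools,  # the tool definitions above
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return response.content[0].text
        # Execute each requested tool and feed the results back to the model
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": execute_tool(block.name, block.input),  # your dispatcher
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})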
Comparison: pre-loaded vs. on-demand context for a code generation task
Pre-loaded approach (3,200 tokens per call):
- System prompt: 300 tokens
- Full module directory (5 files): 2,400 tokens
- Current task description: 150 tokens
- Conversation history: 350 tokens
On-demand approach (800 tokens for most calls):
- System prompt: 300 tokens
- Current task description: 150 tokens
- Conversation history: 350 tokens
- Read the 2 relevant functions when needed: 400 tokens (only on calls where it reads them)
For tasks where the model does not need to read files (e.g., simple refactoring within the provided code), the on-demand approach costs 800 tokens vs. 3,200; for tasks where it reads both functions, 1,200 vs. 3,200. Once the small fixed cost of the tool definitions is accounted for, the on-demand approach still comes out 60–75% cheaper in this scenario and rarely costs more in general.
Tip: Add explicit tool-call tracking to your observability layer. Log which on-demand context tools are called and how often. If a specific tool is called on >80% of interactions, consider promoting that tool's most common return value to dynamic context (it is clearly needed predictably). If a tool is called on <5% of interactions, it is working perfectly as on-demand context.
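A sketch of this promotion/demotion analysis, assuming you log the set of on-demand tools called on each interaction (the log shape is hypothetical):

from collections import Counter

def analyze_tool_usage(interaction_logs: list[set[str]]) -> None:
    # interaction_logs: one set of tool names called per interaction
    total = len(interaction_logs)
    calls = Counter(tool for log in interaction_logs for tool in log)
    for tool, count in calls.items():
        rate = count / total
        if rate > 0.80:
            print(f"{tool}: called on {rate:.0%} of interactions -> consider promoting to dynamic context")
        elif rate < 0.05:
            print(f"{tool}: called on {rate:.0%} of interactions -> working well as on-demand")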
Combining All Three Layers: A Reference Architecture
A production context layering architecture uses all three strategies in complementary roles:
┌─────────────────────────────────────────────────────────┐
│ STATIC LAYER (system prompt, always present) │
│ • Agent identity and role (40 tokens) │
│ • Hard constraints and safety rules (80 tokens) │
│ • Output format specification (60 tokens) │
│ Total: ~180 tokens, paid every call │
├─────────────────────────────────────────────────────────┤
│ DYNAMIC LAYER (assembled at request time) │
│ • Task-type specific instructions (0–100 tokens) │
│ • User role context (0–60 tokens) │
│ • Session state summary (0–200 tokens, turn > 10) │
│ • Feature-specific context (0–150 tokens) │
│ Total: 0–510 tokens, depends on signals │
├─────────────────────────────────────────────────────────┤
│ ON-DEMAND LAYER (fetched by model via tools) │
│ • File and function source code │
│ • Type definitions and schemas │
│ • Database query results │
│ • Documentation sections │
│ • API responses │
│ Total: 0–2,000 tokens, depends on task needs │
└─────────────────────────────────────────────────────────┘
Compare this to a typical naively implemented agent:
┌────────────────────────────────────────────────────────┐
│ SINGLE STATIC LAYER (system prompt, always present) │
│ • Everything above, plus: │
│ • Task-type instructions for ALL task types │
│ • Pre-loaded file contents "just in case" │
│ • Full conversation history (no summarization) │
│ Total: 3,000–8,000 tokens per call │
└────────────────────────────────────────────────────────┘
The layered architecture consistently delivers 50–75% token reduction compared to the naive all-static approach.
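Put together, a single request under the layered architecture might be assembled as follows (a sketch reusing the pieces defined earlier in this topic; STATIC_PROMPT is assumed to hold the curated static block):

def handle_request(signals: TaskSignals, user_message: str, history: list) -> str:
    client = anthropic.Anthropic()
    # Static layer: constant, curated system prompt (~180 tokens, every call)
    # Dynamic layer: signal-driven blocks appended at request time
    system = f"{STATIC_PROMPT}\n\n{assemble_dynamic_context(signals)}".strip()
    # On-demand layer: tools the model can call for everything else
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=2048,
        system=system,
        tools=tools,
        messages=history + [{"role": "user", "content": user_message}],
    )
    # Tool-use handling omitted here; see the loop in the on-demand section
    return response.content[0].text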
Tip: Document your context layering architecture in your team's engineering wiki, with a diagram showing what goes in each layer and why. This shared understanding prevents individual engineers from adding content to the wrong layer — which is the most common way layering architectures degrade over time. Assign an owner to review the layering design on a quarterly cadence.
Managing Layer Interactions and Conflicts
When you combine multiple context layers, you must design explicitly for how they interact. Context from different layers can conflict (contradictory instructions), duplicate (the same information repeated in two layers), or leave gaps (important information assumed to live in a layer that does not actually provide it in a given scenario).
Conflict prevention:
Establish a clear precedence hierarchy: static layer instructions override dynamic layer instructions, which override on-demand context. When the model might encounter seemingly contradictory instructions from different layers, include an explicit resolution rule in the static layer: "If task-specific instructions conflict with these general constraints, follow the general constraints."
Duplication prevention:
Build your dynamic context blocks with awareness of what the static layer already provides. If the static layer says "Language: TypeScript", the dynamic layer for TypeScript tasks should not re-state this — it should add the additional TypeScript-specific guidance that applies to the current task type.
Gap prevention:
Test each layer independently and in combination. For each task type your agent handles, verify that the combination of static + relevant dynamic context provides everything the model needs to make correct decisions without resorting to tool calls for basic information.
A simple automated check for duplication between the static and dynamic layers (the key-term extractor here is a naive placeholder; substitute a real tokenizer or keyword extractor):

import re

STOPWORDS = {"the", "and", "for", "with", "this", "that", "are", "not", "your"}

def extract_key_terms(text: str) -> set[str]:
    # Naive key-term extraction: lowercased words of 4+ characters, minus stopwords
    words = re.findall(r"[a-z][a-z0-9_-]{3,}", text.lower())
    return {w for w in words if w not in STOPWORDS}

def detect_context_overlap(static_prompt: str, dynamic_blocks: dict[str, str]) -> list[str]:
    """Identify information repeated between static and dynamic layers."""
    overlaps = []
    static_keywords = extract_key_terms(static_prompt)
    for block_name, block_content in dynamic_blocks.items():
        block_keywords = extract_key_terms(block_content)
        overlap = static_keywords.intersection(block_keywords)
        if len(overlap) > 3:  # Threshold: >3 shared key terms suggests duplication
            overlaps.append(f"Potential overlap between static prompt and '{block_name}': {sorted(overlap)}")
    return overlaps
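Gap prevention can be automated the same way: for each task type, assert that the combined static + dynamic context already contains the facts the model should not have to fetch. A sketch, with illustrative required phrases:

# For each task type, the assembled context must mention these topics
# (phrases are illustrative; derive yours from the layer design doc)
REQUIRED_PHRASES = {
    "code_review": ["Review priorities", "Security"],
    "bug_fix": ["root cause", "regression"],
}

def detect_context_gaps(static_prompt: str) -> list[str]:
    gaps = []
    for task_type, phrases in REQUIRED_PHRASES.items():
        signals = TaskSignals(task_type=task_type, language=None,
                              user_role="engineer", turn_number=1,
                              has_error_context=False)
        combined = f"{static_prompt}\n{assemble_dynamic_context(signals)}"
        for phrase in phrases:
            if phrase.lower() not in combined.lower():
                gaps.append(f"{task_type}: missing '{phrase}'")
    return gaps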
Tip: Run context conflict detection in your CI pipeline as part of system prompt changes. When a developer modifies the static prompt or adds a new dynamic block, an automated check that flags potential duplication or conflict catches issues before they reach production — where they would silently inflate token usage on every call.