
The Principle of Least Context

In security engineering, the principle of least privilege says: grant each process only the permissions it needs to do its job and no more. Scoped agent sessions apply the same philosophy to context: give each agent only the information it needs to complete its specific task.

This is not just a token-optimization principle; it is an accuracy principle. Agents with bloated context suffer from the "lost in the middle" effect documented by Liu et al. (2023) and subsequent work: models attend less reliably to relevant information buried in the middle of a long context than to the same information in a short, focused one. Scoped sessions are therefore both cheaper and more reliable.

A scoped agent session is an LLM conversation that is deliberately constructed with the minimum viable context for one specific subtask, run to completion, then terminated. The result is extracted as a structured artifact, and the session's raw context is discarded.


Architecture of a Scoped Session

A properly scoped session has five components, each intentionally minimized:

┌─────────────────────────────────────────────────────┐
│  SCOPED AGENT SESSION                               │
│                                                     │
│  1. Role System Prompt                              │
│     → Describes only this agent's specific role    │
│     → No generic "you can do anything" prompts     │
│     → Target: 200–500 tokens                       │
│                                                     │
│  2. Task Specification                              │
│     → Exactly what to do in this session           │
│     → Success criteria for early termination       │
│     → Target: 100–300 tokens                       │
│                                                     │
│  3. Scoped Context Payload                         │
│     → Only the data this agent needs               │
│     → Pre-filtered, pre-truncated, pre-summarized  │
│     → The most expensive component — optimize hard │
│                                                     │
│  4. Available Tools                                 │
│     → Only tools needed for this subtask           │
│     → Not the full tool registry                   │
│                                                     │
│  5. Output Schema                                   │
│     → Structured format for the result             │
│     → Forces compact, parseable output             │
└─────────────────────────────────────────────────────┘
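The token targets in the diagram can be checked mechanically before a session is launched. A minimal sketch, using a rough 4-characters-per-token heuristic (swap in a real tokenizer such as tiktoken for accurate counts); the budget numbers mirror the diagram, and the context-payload budget is an assumed figure you would tune per agent type:

```python
# Rough token budgets per session component, mirroring the diagram above.
COMPONENT_BUDGETS = {
    "role_system_prompt": 500,
    "task_specification": 300,
    "context_payload": 2000,   # assumed budget -- tune per agent type
}

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English prose."""
    return len(text) // 4

def check_budgets(components: dict[str, str]) -> dict[str, bool]:
    """Return, per component, whether it fits within its token budget."""
    return {
        name: estimate_tokens(text) <= COMPONENT_BUDGETS[name]
        for name, text in components.items()
    }

within_budget = check_budgets({
    "role_system_prompt": "You are a security scanner for Python files...",
    "task_specification": "Scan this file for SQL injection risks.",
    "context_payload": "def get_user(user_id): ...",
})
```

Running this check at session-build time turns the diagram's targets into an enforced contract rather than a guideline.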

The contrast with an unscoped session:

UNSCOPED SESSION (common anti-pattern):
  - Generic "helpful assistant" system prompt: 2,000 tokens
  - Full conversation history from parent agent: 15,000 tokens
  - All tools available to parent: 3,000 tokens
  - Actual task spec: 200 tokens
  - Actual needed context: 1,000 tokens
  Total: ~21,200 tokens per call

SCOPED SESSION (optimized):
  - Role-specific system prompt: 350 tokens
  - Task spec: 200 tokens
  - Filtered context payload: 1,000 tokens
  - Subtask tools only: 600 tokens
  Total: ~2,150 tokens per call

Reduction: ~90%

Tip: When designing scoped sessions, write the output schema first. Starting from "what structured data do I need this agent to produce?" forces you to think backward to "therefore, what is the minimum context it needs to produce that output?" This reverse-design process almost always results in a tighter context scope than starting from "what context does this agent need?"
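The reverse-design process in the tip can be made concrete. A minimal sketch, assuming a simple security-finding schema; the mapping from output fields back to context fields is illustrative, not a fixed rule:

```python
# Step 1: define the output schema first -- what must this agent produce?
SECURITY_FINDING_SCHEMA = {
    "severity": "str",
    "line_number": "int",
    "description": "str",
    "suggested_fix": "str",
}

# Step 2 (illustrative): record which context field each output field
# requires. Deriving context from output keeps the scope tight.
OUTPUT_TO_CONTEXT = {
    "severity": "scan_criteria",
    "line_number": "file_content",
    "description": "file_content",
    "suggested_fix": "file_content",
}

def derive_context_fields(schema: dict) -> set[str]:
    """Work backward from the output schema to the minimal context scope."""
    return {OUTPUT_TO_CONTEXT[field] for field in schema}
```

Here `derive_context_fields(SECURITY_FINDING_SCHEMA)` yields just two context fields, scan_criteria and file_content, rather than the entire incident record.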


Context Filtering: Delivering Only What the Sub-Agent Needs

The scoped context payload is the hardest component to get right. The orchestrator must filter the available information down to only what the sub-agent needs. Here are the filtering techniques:

Technique 1: Semantic Relevance Filtering

Use embedding similarity to select only the most relevant chunks of a larger knowledge base for each sub-agent invocation.

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('all-MiniLM-L6-v2')

def filter_context_for_subtask(
    available_context: list[str],
    subtask_description: str,
    top_k: int = 5
) -> list[str]:
    """Return only the top_k most relevant context chunks for this subtask."""
    task_embedding = model.encode(subtask_description)
    context_embeddings = model.encode(available_context)

    similarities = np.dot(context_embeddings, task_embedding) / (
        np.linalg.norm(context_embeddings, axis=1) * np.linalg.norm(task_embedding)
    )

    top_indices = np.argsort(similarities)[-top_k:][::-1]
    return [available_context[i] for i in top_indices]

subtask = "Fix N+1 query in UserListView that fetches user posts"
relevant_context = filter_context_for_subtask(
    available_context=all_code_chunks,
    subtask_description=subtask,
    top_k=4  # Only 4 code chunks, not all 50
)

Technique 2: Schema-Based Projection

When the available context is structured data (database records, API responses, configuration objects), project only the fields needed by the sub-agent.

def project_context(full_record: dict, needed_fields: list[str]) -> dict:
    """Return only the fields the sub-agent needs."""
    return {k: v for k, v in full_record.items() if k in needed_fields}

bug_fix_context = project_context(
    full_record=incident_data,
    needed_fields=["file_path", "line_number", "error_message", "stack_trace"]
)

Technique 3: Summarize-Before-Inject

When prior work (from parent agent or other sub-agents) must be communicated to a sub-agent, summarize it first rather than injecting the raw history.

from langchain_core.messages import HumanMessage  # assumes a LangChain chat model

def summarize_for_handoff(raw_agent_output: str, target_role: str) -> str:
    """Compress prior agent output into a handoff summary for the next agent."""
    summary_prompt = f"""
    The following is output from a prior agent step.
    Summarize it into structured bullet points that a {target_role} needs.
    Include only: key decisions made, artifacts produced (with locations),
    blockers identified, and open questions.
    Keep under 300 words.

    Prior output:
    {raw_agent_output}
    """
    # `llm` is any chat model instance (e.g. ChatOpenAI) initialized elsewhere
    return llm.invoke([HumanMessage(content=summary_prompt)]).content

Tip: For engineering teams: implement context filtering as a separate "context preparation" step in your orchestration pipeline, not inside the sub-agent itself. The sub-agent should receive already-filtered context. If the sub-agent has to figure out what's relevant in a large context dump, you have just shifted the cognitive (and token) burden to the most expensive step.


Scoped Sessions in Practice: Framework-Specific Approaches

LangGraph: Using Subgraphs with Isolated State

LangGraph supports subgraphs — nested state machines that run with their own state object. This is the natural mechanism for scoped sessions.

from langgraph.graph import StateGraph, END
from typing import TypedDict

class SecurityScanState(TypedDict):
    file_content: str        # The specific file to scan
    scan_criteria: list[str] # What to look for
    findings: list[dict]     # Output

def build_security_scan_subgraph():
    builder = StateGraph(SecurityScanState)

    builder.add_node("analyze", analyze_file_node)
    builder.add_node("classify", classify_findings_node)

    builder.set_entry_point("analyze")
    builder.add_edge("analyze", "classify")
    builder.add_edge("classify", END)

    return builder.compile()

def invoke_security_scan(file_path: str, file_content: str) -> list[dict]:
    subgraph = build_security_scan_subgraph()
    result = subgraph.invoke({
        "file_content": file_content,    # Scoped: just this file
        "scan_criteria": SECURITY_CRITERIA,
        "findings": []
    })
    # Only the structured findings cross the boundary; the subgraph's
    # full conversation history is NOT accessible to the parent.
    return result["findings"]

CrewAI: Agent-Level Context Scoping

In CrewAI, you control what context each agent receives through the context parameter of tasks.

from crewai import Agent, Task, Crew

file_analyzer = Agent(
    role="File Security Analyzer",
    goal="Analyze a single file for security vulnerabilities",
    backstory="Expert in code security patterns. Analyzes one file at a time.",
    # Note: no memory=True — we don't want cross-task accumulation here
    verbose=False,
    max_iter=5  # Hard cap on iterations
)

def create_file_scan_task(file_content: str, filename: str) -> Task:
    return Task(
        description=f"""
        Analyze this file for security vulnerabilities.
        File: {filename}

        Content:
        {file_content}

        Return a JSON array of findings with: severity, type, line_number, description, fix.
        Return [] if no issues found.
        """,
        agent=file_analyzer,
        expected_output="JSON array of security findings"
        # No context= parameter — this task is self-contained
    )

AutoGen: Using Conversation Termination and Fresh Instances

In AutoGen, resist the temptation to maintain long-running GroupChat sessions for multi-step tasks. Instead, use targeted two-agent conversations with termination conditions.

import autogen

def run_scoped_analysis(file_content: str, task_description: str) -> str:
    """Run a scoped two-agent conversation for a single focused task."""

    # Fresh assistant for this task — no memory of prior tasks
    assistant = autogen.AssistantAgent(
        name="analyzer",
        system_message=FOCUSED_ANALYST_PROMPT,  # Narrow role (top-level arg, not part of llm_config)
        llm_config={"config_list": config_list}
    )

    user_proxy = autogen.UserProxyAgent(
        name="user_proxy",
        human_input_mode="NEVER",
        max_consecutive_auto_reply=5,  # Hard iteration cap
        is_termination_msg=lambda msg: "ANALYSIS_COMPLETE" in msg.get("content", "")
    )

    # Only provide what's needed for this specific analysis
    user_proxy.initiate_chat(
        assistant,
        message=f"{task_description}\n\nFile content:\n{file_content}"
    )

    return extract_structured_result(user_proxy.chat_messages[assistant])

Tip: In AutoGen GroupChat scenarios, consider "conversation checkpointing": every 5 rounds, have a dedicated summarizer agent compress the conversation so far into a brief status update, then reset the conversation with only that summary as history. This prevents the quadratic accumulation problem while preserving important context.
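The checkpointing loop in the tip can be sketched framework-agnostically. In this sketch, `summarize` stands in for a call to the dedicated summarizer agent (here it just keeps the last message so the sketch runs standalone), and messages are plain strings:

```python
CHECKPOINT_EVERY = 5  # rounds between checkpoints, per the tip above

def summarize(messages: list[str]) -> str:
    """Placeholder for the summarizer-agent call; keeps only the last line."""
    return "SUMMARY: " + messages[-1]

def run_with_checkpointing(rounds: list[str]) -> list[str]:
    """Accumulate rounds, resetting history to a summary every N rounds."""
    history: list[str] = []
    for i, msg in enumerate(rounds, start=1):
        history.append(msg)
        if i % CHECKPOINT_EVERY == 0:
            # Reset: the compressed summary becomes the ONLY history
            history = [summarize(history)]
    return history
```

After seven rounds, the history holds one summary plus the two post-checkpoint messages, so per-round token cost grows linearly instead of quadratically.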


The Context Scope Registry Pattern

As your agentic system grows, managing what context each sub-agent type needs becomes a maintenance challenge. A Context Scope Registry makes this explicit and auditable:

from dataclasses import dataclass
from typing import Callable

@dataclass
class ContextScope:
    agent_role: str
    system_prompt: str
    required_context_fields: list[str]
    available_tools: list[str]
    max_iterations: int
    output_schema: dict

CONTEXT_REGISTRY = {
    "security_scanner": ContextScope(
        agent_role="Security Scanner",
        system_prompt=SECURITY_SCANNER_PROMPT,       # 350 tokens
        required_context_fields=["file_content", "filename", "language"],
        available_tools=["read_file_section", "search_patterns"],
        max_iterations=8,
        output_schema=SECURITY_FINDINGS_SCHEMA
    ),
    "test_writer": ContextScope(
        agent_role="Test Writer",
        system_prompt=TEST_WRITER_PROMPT,            # 400 tokens
        required_context_fields=["function_source", "existing_tests", "test_conventions"],
        available_tools=["read_file", "write_file", "run_tests"],
        max_iterations=6,
        output_schema=TEST_RESULTS_SCHEMA
    ),
    "doc_updater": ContextScope(
        agent_role="Documentation Updater",
        system_prompt=DOC_UPDATER_PROMPT,            # 300 tokens
        required_context_fields=["change_summary", "existing_doc_section"],
        available_tools=["read_file", "write_file"],
        max_iterations=4,
        output_schema=DOC_UPDATE_SCHEMA
    )
}

def invoke_scoped_agent(role: str, context_data: dict) -> dict:
    scope = CONTEXT_REGISTRY[role]

    # Project context_data down to only the fields this role requires
    scoped_context = {k: context_data[k] for k in scope.required_context_fields}

    # Build the session with the minimum viable context
    return run_session(
        system_prompt=scope.system_prompt,
        context=scoped_context,
        tools=get_tools(scope.available_tools),
        max_iter=scope.max_iterations,
        output_schema=scope.output_schema
    )

The registry serves as living documentation of what each agent type needs, makes context auditing straightforward, and prevents scope creep where "just one more field" gets added to agent context over time.

Tip: For product managers and QA engineers: the Context Scope Registry is a powerful communication artifact. During sprint planning or architecture reviews, present it as a table showing each agent type, its input fields, its tools, and its output schema. This makes the token cost implications of feature additions visible to the whole team — if someone asks to add a new capability to an existing agent, the registry makes it clear whether that requires expanding the context scope (and by how much).
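The table view described in the tip is a few lines of code. A minimal sketch; the rows here are plain-dict stand-ins so it runs standalone, but in practice you would derive them by iterating CONTEXT_REGISTRY and counting each entry's fields and tools:

```python
# Plain-dict stand-ins for registry entries (derived from CONTEXT_REGISTRY
# in the real system).
REGISTRY_VIEW = [
    {"role": "security_scanner", "fields": 3, "tools": 2, "max_iter": 8},
    {"role": "test_writer",      "fields": 3, "tools": 3, "max_iter": 6},
    {"role": "doc_updater",      "fields": 2, "tools": 2, "max_iter": 4},
]

def registry_table(rows: list[dict]) -> str:
    """Render the scope registry as a fixed-width table for reviews."""
    header = f"{'agent role':<18} {'ctx fields':>10} {'tools':>6} {'max iter':>9}"
    lines = [header, "-" * len(header)]
    for r in rows:
        lines.append(
            f"{r['role']:<18} {r['fields']:>10} {r['tools']:>6} {r['max_iter']:>9}"
        )
    return "\n".join(lines)

print(registry_table(REGISTRY_VIEW))
```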


Measuring Session Scope Quality

A well-scoped session should pass these three checks:

1. The Sufficiency Check: Can the agent complete its task using only the provided context, with no need to fetch additional information? If the agent frequently calls tools to retrieve more context during a session, the context payload was under-specified for the task.

2. The Minimality Check: Remove each field from the context payload one at a time. If removing a field does not cause the agent to fail or produce meaningfully different output, that field should not be in the scope.

3. The Independence Check: If you gave the same context to a different, equally capable model, would it produce an equivalent output? If the answer requires "well, the agent also implicitly knows from prior conversation that...", you have context leakage — information the agent relies on that is not in the defined scope.

from dataclasses import replace

def audit_session_scope(scope: ContextScope, test_cases: list[dict]) -> dict:
    """Run the sufficiency and minimality checks (independence needs a second model)."""
    results = {"sufficiency": [], "minimality": {}}

    for test_case in test_cases:
        context = {k: test_case[k] for k in scope.required_context_fields}

        # Sufficiency: count additional fetch tool calls
        result, tool_calls = run_session_with_tool_logging(scope, context)
        fetch_calls = [c for c in tool_calls if c["type"] == "fetch_additional_context"]
        results["sufficiency"].append(len(fetch_calls) == 0)

    # Minimality: ablation test each field
    for field in scope.required_context_fields:
        reduced_fields = [f for f in scope.required_context_fields if f != field]
        reduced_scope = replace(scope, required_context_fields=reduced_fields)
        # Run test suite with reduced scope — does success rate drop?
        results["minimality"][field] = measure_success_rate(reduced_scope, test_cases)

    return results
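The independence check requires comparing outputs across models, so it sits naturally in its own helper. A minimal sketch, assuming your session runner accepts a model identifier and returns a structured dict; `outputs_equivalent` is a hypothetical comparator, shown here as exact equality, for which you would substitute semantic or schema-level comparison:

```python
def outputs_equivalent(a: dict, b: dict) -> bool:
    """Placeholder comparator -- swap in semantic or schema-level diffing."""
    return a == b

def independence_check(scope, context: dict, models: list[str], run) -> bool:
    """True if every model, given only the defined scope, agrees on output.

    Disagreement suggests context leakage: one model was implicitly
    relying on information outside the declared scope.
    """
    results = [run(scope, context, model=m) for m in models]
    return all(outputs_equivalent(results[0], r) for r in results[1:])
```

If two equally capable models diverge on identical scoped input, the first model was likely depending on conversational context you never wrote down.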

Tip: Run scope audits whenever you add new capabilities or new task types to your agentic system. Context scope tends to expand over time as edge cases are handled by "just adding more context" rather than fixing the underlying design. A monthly scope audit — running the minimality check across your agent registry — keeps this under control.


Summary

Scoped agent sessions are the structural implementation of the principle of least context. The techniques — semantic filtering, schema projection, summarize-before-inject — are the mechanisms that make scoping practical. Framework-specific implementations in LangGraph (subgraphs), CrewAI (task context parameters), and AutoGen (fresh instances with termination conditions) give you concrete patterns to apply immediately.

The Context Scope Registry pattern makes scope management explicit, auditable, and maintainable as systems grow. The three-check audit (sufficiency, minimality, independence) provides a systematic way to validate and maintain scope quality.

Together, context scoping and task decomposition (from the previous topic) provide the foundational architecture for token-efficient agentic systems. The remaining topics address additional optimization layers: early termination, caching, and orchestration strategy.