Context selection is the discipline of choosing what to include in the model's context window at any given moment. It is arguably the highest-leverage skill in token optimization because no amount of format compression can compensate for sending the wrong context in the first place. Sending an entire codebase when the model needs two functions, or an entire user manual when the model needs one section, is a token efficiency failure at the source.
This topic gives you a systematic framework for context selection — the mental models, technical patterns, and practical decision-making processes that ensure every token you send is earning its place.
The Context Selection Mental Model
The fundamental question of context selection is: "What does the model need to know right now to complete this task correctly?"
Not: "What might be useful." Not: "What's related." Not: "What I have available." Exactly what it needs, right now, for this specific task.
This requires you to think from the model's perspective. Given the task, what information would the model be missing if it started from a blank slate? That missing information is the context to include. Everything already implicit in the task description, derivable from common knowledge, or irrelevant to the specific output requested is context you should leave out.
Consider a concrete example. A QA engineer asks the agent: "Write a test for the validateEmail function in src/validators/email.ts."
Context that is needed:
- The source code of validateEmail in email.ts (the function being tested)
- The test file structure/pattern from an adjacent test file (how tests are organized in this project)
- The test framework configuration (what framework, what assertion style)
Context that is not needed:
- The entire src/validators/ directory
- The full email.ts file if it contains 15 other functions besides validateEmail
- The CI pipeline configuration
- The database schema
- The README
Many developers send the entire src/validators/ directory "just to be safe." This provides the model with potentially 10x more tokens than necessary and dilutes the signal of the specific function being tested.
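To make this concrete, here is a minimal sketch of assembling only those three pieces (the file paths are hypothetical, and a production version would extract just the validateEmail function rather than the whole file):

```python
from pathlib import Path

# Hypothetical paths for the validateEmail example above.
NEEDED_CONTEXT = [
    "src/validators/email.ts",       # the function under test
    "src/validators/phone.test.ts",  # an adjacent test showing the project's pattern
    "jest.config.js",                # test framework configuration
]

def assemble_context(paths: list[str]) -> str:
    """Concatenate only the files the task actually needs, labeled by path."""
    parts = [f"--- {p} ---\n{Path(p).read_text()}" for p in paths]
    return "\n\n".join(parts)
```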
Tip: Before assembling context for any agent call, write down in one sentence what the model needs to do. Then list the minimum pieces of information it needs to do that task. Every context item that does not directly support one of those information needs is a candidate for exclusion. This deliberate step takes 30 seconds and consistently produces more focused, less bloated context.
Context Selection by Task Type
Different task types have different context requirements. Building a mental map of these requirements for your most common tasks lets you create context selection rules that can be automated.
Code generation tasks:
- Required: Relevant function signatures / interfaces the new code must interact with
- Required: 1–2 examples of adjacent code following the same pattern
- Required: Any type definitions or schemas the new code must conform to
- Often unnecessary: Full file contents when only the interfaces in them matter (send just the types/signatures)
- Often unnecessary: Implementation details of functions being called (the interface is enough)
Code review tasks:
- Required: The code being reviewed (full diff or relevant changed functions)
- Required: The requirements or ticket description the code is implementing
- Often valuable: Tests that exercise the changed code
- Often unnecessary: Unchanged portions of the file that are not affected by the change
- Often unnecessary: Other files in the module that the change does not interact with
Bug investigation tasks:
- Required: The error message and stack trace (trimmed to first 20–30 lines)
- Required: The specific function/code path where the error occurs
- Required: The input that triggered the error
- Often valuable: The last 1–2 relevant changes to the affected code (git diff)
- Often unnecessary: The full application logs (dozens of irrelevant requests)
- Often unnecessary: Unrelated modules even if they are in the same service
Product requirement analysis tasks (for PMs):
- Required: The specific user story or requirement being analyzed
- Required: The acceptance criteria
- Often valuable: 1–2 related user stories that share data or workflow
- Often unnecessary: The full product backlog
- Often unnecessary: Historical sprint data not related to the current story
Test planning tasks (for QA):
- Required: The feature specification or user story
- Required: The existing test coverage summary for adjacent features (not full test files)
- Often valuable: The risk classification for the feature (if available)
- Often unnecessary: Full test suite source code
Tip: Create a context selection checklist for each major task type your team performs. Document it in your team's engineering wiki or AI tooling runbook. When a new engineer starts, this checklist prevents the "include everything" default behavior that is the most common source of context bloat.
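One way to make these checklists operational rather than purely documentary is to encode them as data that your context assembly tooling can check against. A sketch, with the task types and items abbreviated from the lists above:

```python
# Checklists abbreviated from the task-type lists above; adapt to your team.
CONTEXT_CHECKLISTS: dict[str, dict[str, list[str]]] = {
    "code_generation": {
        "required": [
            "function signatures / interfaces the new code must interact with",
            "1-2 examples of adjacent code following the same pattern",
            "type definitions or schemas the new code must conform to",
        ],
        "often_unnecessary": [
            "full contents of files that only contribute interfaces",
            "implementation details of functions being called",
        ],
    },
    "bug_investigation": {
        "required": [
            "error message and trimmed stack trace",
            "the specific function/code path where the error occurs",
            "the input that triggered the error",
        ],
        "often_unnecessary": [
            "full application logs",
            "unrelated modules in the same service",
        ],
    },
}

def review_checklist(task_type: str) -> None:
    """Print the checklist for a task type before assembling context."""
    for category, items in CONTEXT_CHECKLISTS[task_type].items():
        print(f"{category}:")
        for item in items:
            print(f"  - {item}")
```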
Precision Context Extraction Techniques
Even when you know what to include, extracting exactly the right subset of a file or document requires deliberate technique. Here are the key methods:
Function-level extraction for code:
Instead of sending an entire file, extract only the relevant function(s). For Python, use the ast module. For TypeScript/JavaScript, use a parser such as @typescript-eslint/typescript-estree; a simple regex can work but breaks on nested functions and unconventional formatting. For most other languages, a brace-matching scan from the function signature to its balancing closing brace is usually sufficient.
```python
import ast

def extract_function(source_code: str, function_name: str) -> str | None:
    """Extract a single function's source code from a Python file."""
    tree = ast.parse(source_code)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.name == function_name:
                # Returns the exact source text span for the node (Python 3.8+).
                return ast.get_source_segment(source_code, node)
    return None

with open("src/validators/email.py") as f:
    source = f.read()

validate_email_code = extract_function(source, "validate_email")
```
Interface-only extraction:
When you need to tell the model about a function it will call but it does not need to understand the implementation, extract only the signature and docstring:
```typescript
// Full function (80 tokens):
async function hashPassword(password: string): Promise<string> {
  const salt = await bcrypt.genSalt(12);
  const hash = await bcrypt.hash(password, salt);
  return hash;
}

// Interface-only (22 tokens):
// hashPassword(password: string): Promise<string>
// Hashes a password using bcrypt with salt rounds 12
```
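For Python codebases, the same interface-only reduction can be automated with the standard ast module. A minimal sketch (Python 3.10+; extract_interface is a name invented here):

```python
import ast

def extract_interface(source_code: str, function_name: str) -> str | None:
    """Return only a function's signature and first docstring line."""
    tree = ast.parse(source_code)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.name == function_name:
                prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
                signature = f"{prefix} {node.name}({ast.unparse(node.args)})"
                if node.returns is not None:
                    signature += f" -> {ast.unparse(node.returns)}"
                doc = ast.get_docstring(node)
                summary = doc.split("\n")[0] if doc else ""
                return f"# {signature}\n# {summary}".rstrip()
    return None
```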
Document section extraction:
For long documents (PRDs, technical specs, runbooks), extract only the relevant section rather than including the full document:
```python
def extract_section(markdown_content: str, section_title: str) -> str:
    """Extract a specific section from a Markdown document."""
    lines = markdown_content.split('\n')
    in_section = False
    section_lines = []
    section_level = None
    for line in lines:
        if not in_section and line.startswith('#') and section_title.lower() in line.lower():
            in_section = True
            # Heading depth = number of leading '#' characters.
            section_level = len(line) - len(line.lstrip('#'))
            section_lines.append(line)
        elif in_section:
            current_level = len(line) - len(line.lstrip('#')) if line.startswith('#') else None
            if current_level and current_level <= section_level:
                break  # Hit a same-level or higher section
            section_lines.append(line)
    return '\n'.join(section_lines)

auth_section = extract_section(full_spec, "Authentication Flow")
```
Stack trace trimming:
Full stack traces are often 50–100 lines. The model needs the error type, message, and the top 5–10 frames (the application code frames, not framework internals):
```python
def trim_stack_trace(trace: str, max_frames: int = 8) -> str:
    """Keep the traceback header, the first N frames, and the final error line."""
    lines = trace.strip().split('\n')
    result = []
    frame_count = 0
    in_skipped_frame = False
    for line in lines:
        if 'File "' in line:
            frame_count += 1
            # Skip this frame (and its source lines) once past the limit.
            in_skipped_frame = frame_count > max_frames
        elif not line.startswith(' '):
            # Unindented line: the traceback header or the final error message.
            in_skipped_frame = False
        if not in_skipped_frame:
            result.append(line)
    if frame_count > max_frames:
        # Note the omission just before the error line.
        result.insert(-1, f"... ({frame_count - max_frames} more frame(s) omitted)")
    return '\n'.join(result)
```
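Applied to a hypothetical captured traceback:

```python
raw_trace = """Traceback (most recent call last):
  File "app/main.py", line 42, in handle_request
    user = create_user(payload)
  File "app/users.py", line 17, in create_user
    validate_email(payload["email"])
  File "app/validators.py", line 9, in validate_email
    raise ValueError(f"invalid email: {value!r}")
ValueError: invalid email: 'not-an-email'
"""

print(trim_stack_trace(raw_trace, max_frames=2))
# Keeps the header, the first two frames, an omission note, and the error line.
```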
Tip: Build a library of context extraction utilities specific to your tech stack and store them in a shared internal package. Common utilities: extract-function-by-name, extract-class-interface, trim-stack-trace, extract-markdown-section, extract-json-path. These utilities transform context selection from a manual judgment call into a repeatable, automatable process.
The Relevance Scoring Framework
When you have many potential context items and must choose which to include, relevance scoring provides a systematic method. This is particularly important in RAG (Retrieval-Augmented Generation) systems and complex agentic workflows.
Relevance scoring asks three questions about each candidate context item:
- Task relevance (0–3): How directly does this item relate to the current task? 3 = directly needed, 2 = background context, 1 = peripherally related, 0 = irrelevant
- Recency (0–2): How recently was this information established? 2 = established in this session, 1 = established in prior context, 0 = static/historical
- Uniqueness (0–2): Is this information derivable from other included context? 2 = unique information, 1 = partially overlaps, 0 = fully derivable from other items
Score = Task relevance × 2 + Recency + Uniqueness. Maximum score is 10. Items scoring 5+ are included. Items scoring 3–4 are included only if token budget allows. Items scoring 0–2 are excluded.
Example scoring for a bug fix task:
| Context item | Task rel. | Recency | Unique | Score | Decision |
|---|---|---|---|---|---|
| The buggy function source | 3 | 2 | 2 | 10 | Include |
| The error message + stack trace | 3 | 2 | 2 | 10 | Include |
| The test that caught the bug | 3 | 1 | 1 | 8 | Include |
| Adjacent function in same file | 2 | 1 | 2 | 7 | Include |
| The entire module directory | 1 | 1 | 1 | 4 | Conditional |
| The deployment architecture | 0 | 0 | 2 | 2 | Exclude |
| The full application logs | 0 | 2 | 0 | 2 | Exclude |
Tip: You do not need to score items numerically every time — after doing this exercise explicitly for a few task types, you will internalize the scoring framework and apply it intuitively. The formal scoring is most useful when training new team members, when building automated context selection pipelines, or when debugging why an agent is performing poorly and you suspect context relevance as the cause.
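For the automated-pipeline case, here is a sketch of a scorer implementing the formula and thresholds above (the ContextItem shape and its token counts are illustrative):

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    name: str
    task_relevance: int  # 0-3
    recency: int         # 0-2
    uniqueness: int      # 0-2
    tokens: int

def score(item: ContextItem) -> int:
    """Score = task relevance x 2 + recency + uniqueness (max 10)."""
    return item.task_relevance * 2 + item.recency + item.uniqueness

def select_context(items: list[ContextItem], token_budget: int) -> list[ContextItem]:
    """Always include 5+ scorers; include 3-4 scorers only while budget remains."""
    selected, used = [], 0
    for item in sorted(items, key=score, reverse=True):
        s = score(item)
        if s <= 2:
            continue  # 0-2: excluded outright
        if s >= 5 or used + item.tokens <= token_budget:
            selected.append(item)
            used += item.tokens
    return selected
```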
Context Freshness and the Recency Problem
Context selection is not only a question of what to include — it is also a question of how current the included context is. Stale context is a subtle and costly problem: the model makes decisions based on information that is no longer true, leading to incorrect outputs that are expensive to detect and fix.
Common stale context problems:
Stale code context: The model has a version of a function from 2 turns ago, before you refactored it. It generates code that calls the old function signature.
Stale requirement context: A PR was merged that changed the acceptance criteria, but the persistent context block still contains the old requirements.
Stale decision context: An architectural decision log entry says "use Redis for caching" but three turns ago the team decided to switch to DynamoDB. New code is generated using Redis.
Context freshness strategies:
- Source of truth anchoring. Always read context from its canonical source at the time of injection, never from memory or a cached copy. For code files, use file_read tools with the actual file path rather than injecting a cached copy stored in your orchestration layer.
- Versioned context injection. Tag each context item with the turn number or timestamp when it was read. Flag items older than N turns as potentially stale and re-read them before use.
- Decision log replacement semantics. When a decision changes, remove the old decision from the context block rather than appending the new one next to it. Conflicting context produces uncertain model behavior.
```python
class SessionContext:
    def __init__(self):
        self.decisions: dict[str, tuple[str, int]] = {}  # key -> (value, turn)

    def set_decision(self, key: str, value: str, turn: int):
        """Replace any existing decision with the new one."""
        self.decisions[key] = (value, turn)

    def get_stale_decisions(self, current_turn: int, max_age: int = 10) -> list[str]:
        """Return keys of decisions older than max_age turns."""
        return [
            key for key, (_, turn) in self.decisions.items()
            if (current_turn - turn) > max_age
        ]

    def to_context_block(self) -> str:
        lines = ["Decisions:"]
        for key, (value, _) in self.decisions.items():
            lines.append(f"- {key}: {value}")
        return '\n'.join(lines)
```
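Using the class above, the Redis-to-DynamoDB example from earlier plays out like this:

```python
ctx = SessionContext()
ctx.set_decision("cache_backend", "Redis", turn=3)
# The team changes course; the old value is replaced, not appended,
# so the context block never carries two conflicting decisions.
ctx.set_decision("cache_backend", "DynamoDB", turn=6)

print(ctx.to_context_block())
# Decisions:
# - cache_backend: DynamoDB

# Decisions untouched for more than max_age turns are flagged for re-confirmation.
print(ctx.get_stale_decisions(current_turn=20))  # ['cache_backend']
```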
Tip: Implement a "context staleness budget" in long-running agentic sessions. After N turns (typically 10–15 for coding sessions), trigger a context review step that re-reads all file-based context from disk, removes decisions that have been superseded, and reconfirms the session goal. This prevents the compounding error problem where stale context leads to incorrect work, which leads to correction steps, which add more context, all of which multiplies token cost.
Context Selection for Multi-Persona Workflows
Token optimization in teams that use AI across multiple roles (engineers, QA, PMs) requires persona-specific context selection. The same underlying project information is relevant to different personas in completely different ways.
Engineer context profile for a feature task:
- Relevant: Function interfaces being modified, test coverage for adjacent code, data models, API contracts
- Irrelevant: Business justification, market context, user persona descriptions
QA context profile for the same feature:
- Relevant: Acceptance criteria, edge cases in user stories, risk classification, existing test coverage gaps
- Irrelevant: Implementation details of unchanged code, developer's architectural rationale
PM context profile for the same feature:
- Relevant: User story, stakeholder notes, definition of done, sprint goal alignment
- Irrelevant: Function signatures, test file structures, deployment configuration
Building persona-specific context selection profiles — and enforcing them in your team's AI tooling configuration — prevents the common pattern of engineers sending architecture docs to a code-writing agent or PMs including full codebases in a user story refinement agent.
Tip: If your team uses a shared AI tooling platform (like a company-internal Claude or GPT integration), add persona detection to the context assembly pipeline. Based on the user's role (pulled from their identity provider), inject only the relevant context profile. This makes optimal context selection the default behavior rather than a discipline each person must remember to practice.
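A sketch of that persona-aware assembly step; the role names, category keys, and profile contents are illustrative and would come from your identity provider and team conventions:

```python
# Hypothetical persona profiles mapping each role to allowed context categories.
PERSONA_PROFILES: dict[str, set[str]] = {
    "engineer": {"interfaces", "adjacent_tests", "data_models", "api_contracts"},
    "qa": {"acceptance_criteria", "edge_cases", "risk_classification", "coverage_gaps"},
    "pm": {"user_story", "stakeholder_notes", "definition_of_done", "sprint_goal"},
}

def assemble_for_persona(role: str, candidates: dict[str, str]) -> str:
    """Keep only the context categories relevant to the caller's role."""
    allowed = PERSONA_PROFILES.get(role, set())
    parts = [
        f"## {category}\n{content}"
        for category, content in candidates.items()
        if category in allowed
    ]
    return "\n\n".join(parts)
```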