Once you understand that structural context (repo maps, summaries) is cheaper and often sufficient for orientation, the question becomes: when you do need to include actual file content, how do you select precisely the right files? Dependency analysis answers this question systematically. Instead of relying on human intuition or brute-force inclusion, it traces the actual graph of what a piece of code needs — and includes only that subgraph.
This topic covers dependency analysis techniques, call graph traversal, import tracing, test-to-code mapping, and change-impact analysis — all applied to the problem of selecting the minimum relevant file set for any AI task.
The Core Concept: Dependency Graphs as Context Selectors
Every codebase has an implicit dependency graph. When module A imports from module B, there is a directed edge from A to B. When function processOrder calls validateInventory, there is a call-graph edge. When a test file covers UserService, there is a coverage edge.
These graphs encode relevance. When you are working on UserService.ts, the files most relevant to that task are:
1. UserService.ts itself (the target)
2. Files that UserService.ts imports (its dependencies)
3. Files that import UserService.ts (its dependents — they constrain how you can change the interface)
4. Test files that cover UserService.ts (they define what "correct" means)
5. Shared types referenced by UserService.ts
This is far more precise than "all files in the users module" or "all TypeScript files in src/services". Dependency analysis operationalizes this precision.
The key insight: For any given AI task, you can derive the relevant file set mechanically from the dependency graph, without requiring manual selection or complete codebase knowledge.
Tip: Think of dependency analysis as building a "relevance radius" around your target file. Radius 0 is just the target file. Radius 1 adds direct imports and importers. Radius 2 adds their dependencies. Most tasks need radius 1–2; very few need radius 3 or higher. Enforcing a maximum radius prevents context explosion.
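In miniature, radius-1 selection is just a set union over the graph's forward and reverse edges. Here is a sketch with a hypothetical two-entry graph (the ImportTracer below builds these dictionaries for real):

# Hypothetical mini-graph: deps maps a file to what it imports,
# rdeps maps a file to what imports it (the reverse edges).
deps = {
    'services/UserService.ts': {'db/UserRepo.ts', 'types/User.ts'},
    'api/users.ts': {'services/UserService.ts'},
}
rdeps = {'services/UserService.ts': {'api/users.ts'}}

def radius_one(target: str) -> set[str]:
    """Radius-1 relevance: the target, its imports, and its importers."""
    return {target} | deps.get(target, set()) | rdeps.get(target, set())

print(sorted(radius_one('services/UserService.ts')))
# ['api/users.ts', 'db/UserRepo.ts', 'services/UserService.ts', 'types/User.ts']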
Static Import Analysis: Tracing Dependencies at Build Time
Static analysis reads source files without executing them and builds the dependency graph from import statements. It is the fastest and most reliable approach for compiled languages and for JavaScript/TypeScript codebases that use ES modules or CommonJS.
TypeScript/JavaScript import tracing:
import re
from pathlib import Path
from collections import defaultdict
class ImportTracer:
"""
Traces static import dependencies in a TypeScript/JavaScript codebase.
Builds a bidirectional dependency graph for context selection.
"""
# Matches: import ... from '...', require('...'), dynamic import('...')
IMPORT_PATTERN = re.compile(
r'''(?:import\s+.*?\s+from\s+['"]([^'"]+)['"]|'''
r'''require\s*\(\s*['"]([^'"]+)['"]\s*\)|'''
r'''import\s*\(\s*['"]([^'"]+)['"]\s*\))''',
re.MULTILINE | re.DOTALL
)
    def __init__(self, root: str):
        self.root = Path(root).resolve()  # resolve up front so relative_to() works on resolved import paths
        self.deps: dict[str, set[str]] = defaultdict(set)   # file → what it imports
        self.rdeps: dict[str, set[str]] = defaultdict(set)  # file → what imports it
        self._build_graph()
def _resolve_import(self, source_file: Path, import_path: str) -> str | None:
"""Resolve a relative import to an absolute path within the project."""
if import_path.startswith('.'):
# Relative import
candidate = (source_file.parent / import_path).resolve()
for ext in ['', '.ts', '.tsx', '.js', '/index.ts', '/index.js']:
full = Path(str(candidate) + ext)
if full.exists() and full.is_file():
return str(full.relative_to(self.root))
# Absolute/package imports are external — skip them
return None
    def _build_graph(self):
        # Scan .ts and .tsx sources, skipping build artifacts and vendored code
        for file in [*self.root.rglob('*.ts'), *self.root.rglob('*.tsx')]:
            if any(skip in str(file) for skip in ['node_modules', '.git', 'dist', '.next']):
                continue
            rel_path = str(file.relative_to(self.root))
            content = file.read_text(errors='ignore')
for match in self.IMPORT_PATTERN.finditer(content):
import_path = match.group(1) or match.group(2) or match.group(3)
if import_path:
resolved = self._resolve_import(file, import_path)
if resolved:
self.deps[rel_path].add(resolved)
self.rdeps[resolved].add(rel_path)
def get_context_files(
self,
target: str,
depth: int = 2,
include_dependents: bool = True,
max_files: int = 15
) -> list[str]:
"""
Get the minimal relevant file set for working on `target`.
Args:
target: The file being modified/analyzed
depth: How many dependency hops to follow
include_dependents: Whether to include files that import target
max_files: Hard cap on context size
"""
visited = set()
queue = [(target, 0)]
while queue:
current, d = queue.pop(0)
if current in visited or d > depth:
continue
visited.add(current)
# Add dependencies (what this file imports)
for dep in self.deps.get(current, set()):
if dep not in visited:
queue.append((dep, d + 1))
# Add dependents (what imports this file) — only from direct level
if include_dependents and d == 0:
for rdep in self.rdeps.get(current, set()):
if rdep not in visited:
queue.append((rdep, 1))
visited.discard(target) # target is given separately
return [target] + sorted(visited)[:max_files - 1]
tracer = ImportTracer('./src')
context_files = tracer.get_context_files(
'services/UserService.ts',
depth=1,
include_dependents=True
)
print("Files selected for context:")
for f in context_files:
print(f" {f}")
Integrating with aider:
TARGET="src/services/UserService.ts"
CONTEXT_FILES=$(python3 scripts/trace-deps.py "$TARGET" --depth 1 --max 8)
aider $CONTEXT_FILES --message "Refactor UserService to support soft deletes"
Tip: Cache your import graph after building it. For a 500-file TypeScript project, building the graph from scratch takes 2–5 seconds. Storing it as a JSON file and invalidating only on file changes reduces this to milliseconds. Use file modification timestamps to detect staleness.
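A minimal sketch of that cache, assuming the graph's two dictionaries serialize to JSON and the cache is treated as stale whenever any source file is newer than the stored build time (the cache path is illustrative):

import json
import time
from pathlib import Path

CACHE_PATH = Path('.cache/import-graph.json')  # illustrative location

def newest_mtime(root: Path) -> float:
    """Most recent modification time across tracked source files."""
    files = [*root.rglob('*.ts'), *root.rglob('*.tsx')]
    return max(
        (f.stat().st_mtime for f in files if 'node_modules' not in str(f)),
        default=0.0,
    )

def load_or_build_graph(root: str) -> 'ImportTracer':
    root_path = Path(root)
    if CACHE_PATH.exists():
        cached = json.loads(CACHE_PATH.read_text())
        if cached['built_at'] >= newest_mtime(root_path):
            # Cache is fresh: rehydrate without re-scanning the tree
            tracer = ImportTracer.__new__(ImportTracer)
            tracer.root = root_path.resolve()
            tracer.deps = {k: set(v) for k, v in cached['deps'].items()}
            tracer.rdeps = {k: set(v) for k, v in cached['rdeps'].items()}
            return tracer
    tracer = ImportTracer(root)  # cache miss: full rebuild
    CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
    CACHE_PATH.write_text(json.dumps({
        'built_at': time.time(),
        'deps': {k: sorted(v) for k, v in tracer.deps.items()},
        'rdeps': {k: sorted(v) for k, v in tracer.rdeps.items()},
    }))
    return tracer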
Call Graph Analysis: Function-Level Precision
Import tracing gives you file-level precision. Call graph analysis goes deeper to function-level precision — identifying not just which files are relevant, but which functions within those files are relevant to your task.
This is especially valuable when a file has 20 functions but the task only touches 3 of them, and you want to include only those 3 functions (plus their callees) rather than the entire file.
Python call graph with ast module:
import ast
from pathlib import Path
from collections import defaultdict
class CallGraphAnalyzer(ast.NodeVisitor):
"""Builds a function-level call graph for Python codebases."""
def __init__(self):
self.calls: dict[str, set[str]] = defaultdict(set)
self.current_function = None
def visit_FunctionDef(self, node):
old_func = self.current_function
self.current_function = node.name
self.generic_visit(node)
self.current_function = old_func
visit_AsyncFunctionDef = visit_FunctionDef
    def visit_Call(self, node):
        if self.current_function:
            if isinstance(node.func, ast.Name):
                self.calls[self.current_function].add(node.func.id)
            elif isinstance(node.func, ast.Attribute):
                # Record only the attribute name; the receiver's type is
                # unknown without type inference
                self.calls[self.current_function].add(node.func.attr)
        self.generic_visit(node)
def get_function_context(filepath: str, function_name: str, depth: int = 2) -> dict:
"""
Extract a function and all its direct/indirect callees from a Python file.
Returns a dict of {function_name: source_code}.
"""
source = Path(filepath).read_text()
tree = ast.parse(source)
# Build the call graph for this file
analyzer = CallGraphAnalyzer()
analyzer.visit(tree)
    # BFS outward from the target; each pass adds one more level of callees
    relevant = set()
    queue = [function_name]
    for _ in range(depth + 1):  # depth + 1 passes: the first adds the target itself
        next_queue = []
        for func in queue:
            if func not in relevant:
                relevant.add(func)
                next_queue.extend(analyzer.calls.get(func, set()))
        queue = next_queue
# Extract source for relevant functions
result = {}
for node in ast.walk(tree):
if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
if node.name in relevant:
# Get source lines for this function
func_lines = source.split('\n')[node.lineno - 1:node.end_lineno]
result[node.name] = '\n'.join(func_lines)
return result
context = get_function_context('services/order_service.py', 'process_checkout', depth=2)
total_tokens = sum(len(code.split()) * 1.3 for code in context.values())
print(f"Including {len(context)} functions, ~{total_tokens:.0f} tokens")
for func_name, code in context.items():
print(f"\n--- {func_name} ---\n{code[:200]}...")
Tip: For QA engineers writing test cases with AI assistance, call graph analysis is particularly powerful. Provide the function under test plus only the functions it calls directly — this gives the AI exactly the right context to write meaningful unit tests without exposing unrelated business logic.
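As a sketch, a focused test-writing prompt might be assembled like this (the file path, function name, and prompt wording are all illustrative):

# depth=1 yields the target function plus its direct callees only
context = get_function_context('services/order_service.py', 'process_checkout', depth=1)

prompt = "Write pytest unit tests for `process_checkout`.\n\n"
prompt += "Function under test and its direct callees:\n\n"
for name, code in context.items():
    prompt += f"### {name}\n{code}\n\n"
prompt += "Mock the direct callees; test only process_checkout's own logic."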
Test-to-Code Mapping: Including Tests That Define Correctness
When asking an AI to modify or refactor code, including the existing tests for that code is one of the highest-value context additions. Tests are a compact, high-precision specification of expected behavior — often more informative than comments or documentation.
But test suites can be large. Test-to-code mapping selects only the tests that cover your target file or function.
Using Jest coverage data:
npx jest --coverage --coverageReporters=json --coverageDirectory=./tmp/coverage
python3 << 'EOF'
import json
import subprocess
from pathlib import Path
def find_tests_for_file(target_source: str, coverage_dir: str = './tmp/coverage') -> list[str]:
"""Find test files that cover lines in target_source."""
    # Read Jest's per-file coverage data (coverage-final.json)
coverage_file = Path(coverage_dir) / 'coverage-final.json'
with open(coverage_file) as f:
coverage = json.load(f)
# Find the target file in coverage data
target_abs = str(Path(target_source).resolve())
if target_abs not in coverage:
print(f"No coverage data for {target_source}")
return []
    # --findRelatedTests asks Jest which test files exercise this source file
    result = subprocess.run(
        ['npx', 'jest', '--listTests', '--findRelatedTests', target_source],
        capture_output=True, text=True
    )
test_files = [line.strip() for line in result.stdout.split('\n') if line.strip()]
return test_files
tests = find_tests_for_file('src/services/UserService.ts')
print(f"Tests that cover UserService.ts:")
for t in tests:
print(f" {t}")
EOF
Integrating test context into AI prompts:
def build_refactoring_context(
target_file: str,
tracer: ImportTracer,
max_tokens: int = 10000
) -> str:
"""
Build context for a refactoring task:
- Target file (full content)
- Direct dependencies (signatures only)
- Tests covering the target (full content)
"""
sections = []
token_budget = max_tokens
# 1. Target file (full content)
target_content = Path(target_file).read_text()
sections.append(f"## Target File: {target_file}\n```typescript\n{target_content}\n```")
token_budget -= estimate_tokens(target_content)
# 2. Tests for this file
test_files = find_tests_for_file(target_file)
for test_file in test_files[:2]: # limit to 2 test files
test_content = Path(test_file).read_text()
test_tokens = estimate_tokens(test_content)
if test_tokens <= token_budget:
sections.append(
f"## Test File: {test_file}\n```typescript\n{test_content}\n```"
)
token_budget -= test_tokens
# 3. Direct dependency signatures only (not full content)
deps = tracer.deps.get(target_file, set())
dep_signatures = []
for dep in list(deps)[:5]:
sigs = extract_typescript_signatures(str(Path('src') / dep))
dep_signatures.append(f"\n{dep}:\n" + "\n".join(sigs))
if dep_signatures:
sections.append("## Dependency Signatures\n" + "\n".join(dep_signatures))
return "\n\n".join(sections)
Tip: When asking AI to help write or update tests, always include the implementation file in full AND the existing test file if one exists. Never include other test files for different components — they pollute the testing patterns the model will use. Focused test context is the difference between getting idiomatic, consistent tests and getting tests that mix patterns from across the codebase.
Change-Impact Analysis: Finding What Will Break
A specialized form of dependency analysis, change-impact analysis works in reverse: given a change you are about to make, which files are likely to be affected? This is crucial for:
- Knowing which tests to run
- Knowing which files to include when asking "does my change break anything?"
- Generating comprehensive PR descriptions
import subprocess
from typing import NamedTuple
class ImpactedFile(NamedTuple):
    path: str
    relationship: str  # 'direct_importer' or 'indirect_importer' ('test' reserved for test mapping)
    confidence: float  # 0.0 to 1.0

def analyze_change_impact(
    changed_files: list[str],
    tracer: ImportTracer
) -> list[ImpactedFile]:
"""
Given a list of changed files, find all files that may be impacted.
Useful for building targeted test runs and review contexts.
"""
impacted = []
seen = set(changed_files)
for target in changed_files:
# Direct importers — highest confidence of impact
for importer in tracer.rdeps.get(target, set()):
if importer not in seen:
impacted.append(ImpactedFile(importer, 'direct_importer', 0.9))
seen.add(importer)
# Indirect importers — medium confidence
indirect = set()
for importer in list(tracer.rdeps.get(target, set())):
for indirect_importer in tracer.rdeps.get(importer, set()):
if indirect_importer not in seen:
indirect.add(indirect_importer)
seen.add(indirect_importer)
for f in indirect:
impacted.append(ImpactedFile(f, 'indirect_importer', 0.6))
# Sort by confidence descending
return sorted(impacted, key=lambda x: x.confidence, reverse=True)
git_diff = subprocess.run(
['git', 'diff', '--name-only', 'HEAD~1'],
capture_output=True, text=True
).stdout.strip().split('\n')
impact = analyze_change_impact(git_diff, tracer)
high_confidence = [f for f in impact if f.confidence >= 0.8]
print(f"High-impact files to review ({len(high_confidence)}):")
for f in high_confidence:
print(f" {f.path} ({f.relationship})")
Using this with AI code review:
def build_pr_review_context(changed_files: list[str], tracer: ImportTracer) -> str:
"""Build context for an AI PR review that includes impacted files."""
impact = analyze_change_impact(changed_files, tracer)
high_impact = [f for f in impact if f.confidence >= 0.8]
context = "## Changed Files\n"
for f in changed_files:
        diff = get_git_diff(f)  # get the file's diff (see the helper sketch below)
context += f"\n### {f}\n```diff\n{diff}\n```\n"
if high_impact:
context += "\n## Potentially Impacted Files (review for breakage)\n"
for impacted in high_impact[:5]:
context += f"- `{impacted.path}` ({impacted.relationship})\n"
return context
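get_git_diff is assumed rather than defined above; a minimal version wrapping the git CLI might be:

import subprocess

def get_git_diff(filepath: str, base: str = 'HEAD~1') -> str:
    """Return the diff for a single file against a base revision (the default base is an assumption)."""
    result = subprocess.run(
        ['git', 'diff', base, '--', filepath],
        capture_output=True, text=True,
    )
    return result.stdout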
Tip: For QA engineers, change-impact analysis is a powerful tool for generating targeted regression test plans. Feed the output of analyze_change_impact into a prompt like: "Given these changed and impacted files, which test scenarios should be prioritized in regression testing?" The AI will produce a focused, evidence-based test plan rather than a generic one.
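A sketch of that hand-off, reusing the functions defined above (the prompt wording follows the tip):

impact = analyze_change_impact(changed_files, tracer)
impact_summary = "\n".join(
    f"- {f.path} ({f.relationship}, confidence {f.confidence:.1f})" for f in impact
)
changed_summary = "\n".join(f"- {f}" for f in changed_files)
prompt = (
    "Given these changed and impacted files, which test scenarios "
    "should be prioritized in regression testing?\n\n"
    f"Changed files:\n{changed_summary}\n\n"
    f"Potentially impacted:\n{impact_summary}"
)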
Practical Integration: Dependency-Aware Context in Your Workflow
Bringing together import tracing, call graphs, and test mapping into a cohesive daily workflow:
For engineers in Cursor:
Create a .cursor/context-builder.py script that you run before starting work on a file:
#!/usr/bin/env python3
"""
Usage: python .cursor/context-builder.py src/services/UserService.ts
Generates a focused context file for the given target.
"""
import sys
import subprocess
from pathlib import Path

# Assumes the ImportTracer and find_tests_for_file helpers from earlier in
# this topic live in an importable module; the module path here is illustrative.
from context_tools import ImportTracer, find_tests_for_file
target = sys.argv[1]
tracer = ImportTracer('./src')
files = tracer.get_context_files(target, depth=1, max_files=8)
tests = find_tests_for_file(target)
output = f"""# Context for {target}
## Files to include in Cursor context:
{chr(10).join(f'- @{f}' for f in files)}
## Test files:
{chr(10).join(f'- @{t}' for t in tests[:2])}
## Dependency summary:
- Direct imports: {len(tracer.deps.get(target, []))} files
- Direct importers: {len(tracer.rdeps.get(target, []))} files
"""
print(output)
# Copy to the clipboard for one-step pasting (pbcopy is macOS-only)
subprocess.run(['pbcopy'], input=output.encode())
print("\n(Copied to clipboard — paste into Cursor context)")
For engineers using Claude Code:
TARGET_FILE=$1
CONTEXT_FILES=$(python3 scripts/trace-deps.py "$TARGET_FILE" --format claude)
echo "Suggested @-mentions for this session:"
echo "$CONTEXT_FILES"
For product managers using AI for codebase analysis:
Product managers often need to understand which parts of the codebase will be affected by a requested feature. A change-impact analysis tool surfaced through a simple CLI gives PMs visibility without requiring deep technical knowledge:
echo "Analyzing impact of changes to: $1"
python3 scripts/change-impact.py "$1" \
--format summary \
--output "docs/impact-analysis-$(date +%Y%m%d).md"
echo "Impact analysis written to docs/"
Tip: Package your dependency analysis tools as a team-shared CLI utility (e.g., npm run context -- src/services/UserService.ts) that any team member can run without understanding the underlying implementation. When everyone on the team — engineers, QA, and technically-curious PMs — can generate targeted context with one command, the whole team's AI usage quality improves.