
This hands-on topic puts everything from Module 4 into practice. You will build a complete, production-ready codebase context pipeline from scratch — one that intelligently selects and assembles context for any AI task, enforces a token budget, and delivers consistently higher-quality AI responses than naive file inclusion.

By the end of this hands-on, you will have a working CLI tool called ctx that any engineer, QA analyst, or product manager on your team can run to generate optimized context for their AI sessions. The tool will combine repo maps, dependency tracing, RAG retrieval, and static context files into a unified pipeline.


Architecture Overview: What We Are Building

The pipeline has five components that execute in sequence:

Query / Task Description
        │
        ▼
┌─────────────────────┐
│  1. Static Layer    │  Always included: CLAUDE.md, architecture summary
│     (~200 tokens)   │  Token cost: fixed, minimal
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  2. Structural Layer│  Repo map filtered to relevant modules
│     (~500 tokens)   │  Token cost: bounded
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  3. RAG Layer       │  Semantically retrieved code chunks
│     (~2000 tokens)  │  Token cost: dynamic, query-driven
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  4. Dependency Layer│  Direct deps/dependents of target file
│     (~1500 tokens)  │  Token cost: dynamic, graph-driven
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  5. Target Layer    │  The specific file(s) being modified
│     (~2000 tokens)  │  Token cost: varies, but bounded
└────────┬────────────┘
         │
         ▼
    Budget Enforcer: Total ≤ 8000 tokens (configurable)
         │
         ▼
    Assembled Context (markdown) → AI Tool

Total budget: 8,000 tokens (adjustable to 4K for speed or 16K for complex tasks).
This compares to a typical naive inclusion of 50,000–100,000 tokens — a 6–12x reduction.

Tip: Design your pipeline's token budget as a configurable constant, not a hardcoded value. Different tasks legitimately need different budgets: a quick bug fix might only need 3K tokens, while reviewing an architectural change might warrant 20K. Name your budget presets (e.g., quick, standard, deep) and document what each is appropriate for.


Step 1: Project Setup

Create the project structure:

mkdir codebase-context-pipeline
cd codebase-context-pipeline

python3 -m venv .venv
source .venv/bin/activate

pip install \
    anthropic \
    openai \
    tiktoken \
    chromadb \
    langchain \
    langchain-openai \
    langchain-chroma \
    llama-index-core \
    llama-index-embeddings-openai \
    click \
    rich \
    pydantic \
    pydantic-settings

mkdir -p src/{layers,utils,config}
touch src/__init__.py
touch src/layers/__init__.py
touch src/utils/__init__.py
touch src/config/__init__.py
touch src/pipeline.py
touch src/cli.py

Project configuration:

from pydantic_settings import BaseSettings, SettingsConfigDict  # pydantic v2 moved BaseSettings here

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    # Model settings
    openai_api_key: str = ""
    anthropic_api_key: str = ""
    embedding_model: str = "text-embedding-3-large"
    embedding_dimensions: int = 1024
    llm_model: str = "claude-sonnet-4-5"

    # Token budgets
    budget_quick: int = 3000
    budget_standard: int = 8000
    budget_deep: int = 20000

    # Layer token allocations (as fractions of total budget; they sum to 1.0)
    fraction_static: float = 0.05      # 5% — architecture overview
    fraction_structural: float = 0.15  # 15% — repo map
    fraction_rag: float = 0.35         # 35% — semantic retrieval
    fraction_deps: float = 0.20        # 20% — dependency graph
    fraction_target: float = 0.25      # 25% — target file content

    # Index settings
    index_dir: str = ".ctx-index"
    ignore_patterns: list[str] = [
        "node_modules", ".git", "dist", "build", ".next",
        "coverage", "__pycache__", "*.lock", "*.min.*",
        "*.generated.*", "migrations"
    ]

settings = Settings()
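
The five layer fractions sum to 1.0, so the per-layer allocations tile the total budget exactly. A standalone check with the standard 8,000-token budget:

```python
# Per-layer token budgets derived from the fractions above (standard preset).
total_budget = 8000
fractions = {
    "static": 0.05, "structural": 0.15, "rag": 0.35,
    "deps": 0.20, "target": 0.25,
}

budgets = {layer: int(total_budget * frac) for layer, frac in fractions.items()}
print(budgets)
print(sum(budgets.values()))  # → 8000
```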

Tip: Store your pipeline's configuration in a .ctx.yaml or .env file at the project root and add it to .gitignore. Team-wide settings (budget presets, model choice) belong in a version-controlled ctx.config.yaml at the repo root. Personal overrides (API keys, personal budget preferences) belong in the gitignored local config.
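
One lightweight way to realize this split is to treat the version-controlled file as a defaults dict and let gitignored personal settings override it via the environment. A minimal sketch (the `CTX_*` env-var scheme and key names here are hypothetical):

```python
import os

# Team-wide defaults — in practice these would come from the
# version-controlled ctx.config.yaml.
TEAM_DEFAULTS = {"budget_standard": 8000, "llm_model": "claude-sonnet-4-5"}

def load_settings() -> dict:
    """Merge team defaults with personal overrides from the environment."""
    merged = dict(TEAM_DEFAULTS)
    for key, default in TEAM_DEFAULTS.items():
        override = os.environ.get(f"CTX_{key.upper()}")
        if override is not None:
            merged[key] = type(default)(override)  # cast to the default's type
    return merged

settings_dict = load_settings()
```

Personal overrides then never touch version control: `CTX_BUDGET_STANDARD=4000 ctx get ...`.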


Step 2: The Static Layer

The static layer always contributes CLAUDE.md and any pinned architecture documents. It is cheap (fixed, small) and provides the persistent foundation:

from pathlib import Path
from src.config.settings import settings
from src.utils.tokens import count_tokens, truncate_to_budget

STATIC_FILES = [
    "CLAUDE.md",
    "docs/architecture.md",
    "docs/conventions.md",
    ".cursorrules",           # also included if present (covers repos without CLAUDE.md)
]

def build_static_layer(repo_root: str, token_budget: int) -> tuple[str, int]:
    """
    Build the static context layer from project context files.

    Returns:
        (content, tokens_used)
    """
    root = Path(repo_root)
    sections = []
    tokens_used = 0

    for filename in STATIC_FILES:
        filepath = root / filename
        if not filepath.exists():
            continue

        content = filepath.read_text()
        content_tokens = count_tokens(content)

        if tokens_used + content_tokens <= token_budget:
            sections.append(f"## Project Context ({filename})\n{content}")
            tokens_used += content_tokens
        else:
            # Truncate to fit within budget
            remaining = token_budget - tokens_used
            if remaining > 100:
                truncated = truncate_to_budget(content, remaining)
                sections.append(f"## Project Context ({filename}) [truncated]\n{truncated}")
                tokens_used += remaining
            break  # no more budget

    if not sections:
        # Minimal fallback if no context files exist
        return "## Project Context\nNo CLAUDE.md or context files found.\n", 20

    return "\n\n".join(sections), tokens_used

Step 3: The Structural Layer

The structural layer provides a filtered repo map focused on modules relevant to the current query:

import subprocess
from pathlib import Path
from src.utils.tokens import count_tokens, truncate_to_budget

def get_repo_map_from_aider(repo_root: str) -> str:
    """Generate repo map using aider's built-in map generator."""
    try:
        result = subprocess.run(
            ["aider", "--show-repo-map", "--no-git"],
            capture_output=True, text=True, cwd=repo_root, timeout=30
        )
        return result.stdout
    except (subprocess.SubprocessError, FileNotFoundError):
        return get_repo_map_from_ctags(repo_root)

def get_repo_map_from_ctags(repo_root: str) -> str:
    """Fallback: generate basic repo map using ctags."""
    try:
        result = subprocess.run(
            ["ctags", "-R", "--fields=+n", "--output-format=json", "-o", "-", "."],
            capture_output=True, text=True, cwd=repo_root, timeout=30
        )

        import json
        from collections import defaultdict

        by_file = defaultdict(list)
        for line in result.stdout.strip().split('\n'):
            try:
                tag = json.loads(line)
                if tag.get('kind') in ('function', 'class', 'method', 'interface'):
                    by_file[tag['path']].append(f"  {tag['kind']}: {tag['name']}")
            except json.JSONDecodeError:
                continue

        lines = []
        for path, symbols in sorted(by_file.items()):
            lines.append(f"\n{path}")
            lines.extend(symbols[:15])

        return '\n'.join(lines)
    except (subprocess.SubprocessError, FileNotFoundError):
        return get_repo_map_from_filesystem(repo_root)

def get_repo_map_from_filesystem(repo_root: str) -> str:
    """Last resort: generate a simple directory/file tree."""
    root = Path(repo_root)
    lines = ["Repository Structure:"]

    # pathlib glob patterns have no brace expansion ('*.{ts,py}' matches nothing),
    # so walk everything and filter by suffix instead
    for path in sorted(root.rglob('*')):
        if path.suffix not in ('.ts', '.tsx', '.py', '.java', '.go'):
            continue
        skip = any(p in str(path) for p in ['node_modules', '.git', 'dist', '__pycache__'])
        if not skip:
            lines.append(f"  {path.relative_to(root)}")

    return '\n'.join(lines[:200])  # cap at 200 lines

def filter_map_by_relevance(repo_map: str, query: str) -> str:
    """
    Filter a repo map to show only files/symbols relevant to the query.
    Uses simple keyword matching — replace with embedding similarity for better results.
    """
    query_terms = set(query.lower().split())

    # Remove common stop words that add noise
    stop_words = {'the', 'a', 'an', 'is', 'in', 'to', 'of', 'and', 'or', 'how', 'what', 'why'}
    query_terms -= stop_words

    lines = repo_map.split('\n')
    relevant_lines = []
    current_file_relevant = False

    for line in lines:
        line_lower = line.lower()
        is_file_header = not line.startswith(' ') and line.strip()

        if is_file_header:
            # Check if this file is relevant to query
            current_file_relevant = any(term in line_lower for term in query_terms)
            if current_file_relevant:
                relevant_lines.append(line)
        elif current_file_relevant:
            relevant_lines.append(line)

    # If filtering was too aggressive, return original map truncated
    if len(relevant_lines) < 10:
        return '\n'.join(lines[:100])

    return '\n'.join(relevant_lines)

def build_structural_layer(repo_root: str, query: str, token_budget: int) -> tuple[str, int]:
    """Build the structural context layer."""

    repo_map = get_repo_map_from_aider(repo_root)
    filtered_map = filter_map_by_relevance(repo_map, query)

    content = f"## Codebase Structure (relevant to query)\n```\n{filtered_map}\n```"
    content = truncate_to_budget(content, token_budget)

    return content, count_tokens(content)
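
The keyword-matching heuristic can be exercised in isolation on a toy repo map. This is a minimal self-contained restatement of the relevance test above, not the full function:

```python
STOP_WORDS = {'the', 'a', 'an', 'is', 'in', 'to', 'of', 'and', 'or', 'how', 'what', 'why'}

def is_relevant(line: str, query: str) -> bool:
    # Same keyword test used by filter_map_by_relevance, restated standalone.
    terms = set(query.lower().split()) - STOP_WORDS
    return any(term in line.lower() for term in terms)

toy_map = ["src/auth/login.ts", "src/billing/invoice.ts", "src/auth/session.ts"]
hits = [f for f in toy_map if is_relevant(f, "how does the login auth flow work")]
print(hits)  # → ['src/auth/login.ts', 'src/auth/session.ts']
```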

Step 4: The RAG Layer

The RAG layer retrieves semantically relevant code chunks using the index we build from the codebase:

from pathlib import Path
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter, Language
from src.config.settings import settings
from src.utils.tokens import count_tokens

EXTENSION_TO_LANGUAGE = {
    '.py': Language.PYTHON,
    '.ts': Language.TS,
    '.tsx': Language.TS,
    '.js': Language.JS,
    '.jsx': Language.JS,
    '.java': Language.JAVA,
    '.go': Language.GO,
}

def build_or_load_index(repo_root: str, force_rebuild: bool = False) -> Chroma:
    """Build the vector index or load it from disk if it exists."""

    embeddings = OpenAIEmbeddings(
        model=settings.embedding_model,
        dimensions=settings.embedding_dimensions
    )

    index_path = str(Path(repo_root) / settings.index_dir)

    if not force_rebuild:
        try:
            db = Chroma(persist_directory=index_path, embedding_function=embeddings)
            count = db._collection.count()  # private attribute, but fine for a local tool
            if count > 0:
                return db
        except Exception:
            pass

    print("Building codebase index (first run)...")
    all_docs = []

    for filepath in Path(repo_root).rglob('*'):
        if filepath.suffix not in EXTENSION_TO_LANGUAGE:
            continue
        if any(skip in str(filepath) for skip in settings.ignore_patterns):
            continue

        try:
            content = filepath.read_text(encoding='utf-8', errors='ignore')
            if len(content) < 50:  # skip nearly empty files
                continue

            language = EXTENSION_TO_LANGUAGE[filepath.suffix]
            splitter = RecursiveCharacterTextSplitter.from_language(
                language=language,
                chunk_size=1200,
                chunk_overlap=80
            )

            docs = splitter.create_documents(
                texts=[content],
                metadatas=[{"source": str(filepath.relative_to(repo_root))}]
            )
            all_docs.extend(docs)

        except Exception:
            continue  # skip unreadable files

    print(f"Indexing {len(all_docs)} chunks...")
    db = Chroma.from_documents(
        documents=all_docs,
        embedding=embeddings,
        persist_directory=index_path
    )

    print(f"Index ready: {len(all_docs)} chunks.")
    return db

def build_rag_layer(
    query: str,
    target_file: str | None,
    repo_root: str,
    token_budget: int
) -> tuple[str, int]:
    """
    Retrieve semantically relevant code chunks for the query.
    Excludes chunks from the target file (it will be included in the target layer).
    """

    db = build_or_load_index(repo_root)

    # Calculate how many chunks we can fit in the budget
    avg_chunk_tokens = 350  # typical chunk size
    max_chunks = max(2, token_budget // avg_chunk_tokens)

    # Retrieve using MMR for diversity
    retriever = db.as_retriever(
        search_type="mmr",
        search_kwargs={"k": max_chunks, "fetch_k": max_chunks * 4}
    )

    results = retriever.invoke(query)

    # Filter out chunks from the target file (already included elsewhere)
    if target_file:
        try:
            target_rel = str(Path(target_file).relative_to(repo_root))
        except ValueError:
            target_rel = target_file  # target outside repo root; compare as-is
        results = [r for r in results if r.metadata.get('source') != target_rel]

    sections = ["## Semantically Relevant Code\n"]
    tokens_used = 20

    for doc in results:
        chunk_section = f"\n### {doc.metadata.get('source', 'unknown')}\n```\n{doc.page_content}\n```\n"
        chunk_tokens = count_tokens(chunk_section)

        if tokens_used + chunk_tokens <= token_budget:
            sections.append(chunk_section)
            tokens_used += chunk_tokens
        else:
            break

    return ''.join(sections), tokens_used
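
The `max_chunks` arithmetic in `build_rag_layer` maps the layer's token budget to a retrieval depth. Restated standalone:

```python
def max_chunks(token_budget: int, avg_chunk_tokens: int = 350) -> int:
    # Always retrieve at least 2 chunks; otherwise as many average-sized chunks as fit.
    return max(2, token_budget // avg_chunk_tokens)

for budget in (500, 2800, 7000):
    print(budget, "->", max_chunks(budget))
# 500 -> 2, 2800 -> 8 (the standard preset's RAG slice), 7000 -> 20 (deep preset)
```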

Step 5: The Dependency and Target Layers

from pathlib import Path
import re
from src.utils.tokens import count_tokens, truncate_to_budget

def extract_imports(file_path: str, repo_root: str) -> list[str]:
    """Extract relative imports from a file and resolve them to actual paths."""
    content = Path(file_path).read_text()
    suffix = Path(file_path).suffix
    base = Path(file_path).parent
    resolved = []

    if suffix in ('.ts', '.tsx', '.js', '.jsx'):
        # Matches './x' and '../x'; bare package imports are deliberately skipped
        pattern = re.compile(r"from\s+['\"](\.\.?/[^'\"]+)['\"]")
    elif suffix == '.py':
        # Matches 'from .module import ...' relative imports
        pattern = re.compile(r"from\s+\.(\w+)\s+import")
    else:
        return []

    for match in pattern.finditer(content):
        import_path = match.group(1).strip()
        candidate = (base / import_path).resolve()
        for ext in ['', '.ts', '.tsx', '.js', '/index.ts', '/index.js', '.py']:
            full = Path(str(candidate) + ext)
            if full.exists() and full.is_file():
                try:
                    resolved.append(str(full.relative_to(repo_root)))
                except ValueError:
                    pass
                break

    return resolved
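
The TypeScript branch hinges on a relative-import regex; a standalone variant of that pattern (written here to accept both `./` and `../` paths) can be checked in isolation:

```python
import re

# Relative-import pattern: matches './x' and '../x', ignores bare package imports.
pattern = re.compile(r"from\s+['\"](\.\.?/[^'\"]+)['\"]")

source = (
    "import React from 'react'\n"
    "import { formatDate } from './utils/dates'\n"
    "import { Invoice } from '../billing/invoice'\n"
)
print(pattern.findall(source))  # → ['./utils/dates', '../billing/invoice']
```

Note that `react` is not matched: only imports the codebase itself can resolve are worth tracing.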

def build_dependency_layer(
    target_file: str,
    repo_root: str,
    token_budget: int
) -> tuple[str, int]:
    """Include direct imports of the target file as signature-only context."""

    if not target_file or not Path(target_file).exists():
        return "", 0

    imports = extract_imports(target_file, repo_root)
    sections = ["## Direct Dependencies (signatures)\n"]
    tokens_used = 30

    for import_path in imports[:8]:  # cap at 8 direct imports
        full_path = Path(repo_root) / import_path
        if not full_path.exists():
            continue

        content = full_path.read_text()
        # Include only the first 40 lines (signatures and class definitions)
        sig_lines = content.split('\n')[:40]
        sig_content = '\n'.join(sig_lines)

        section = f"\n### {import_path} (signatures)\n```\n{sig_content}\n```\n"
        section_tokens = count_tokens(section)

        if tokens_used + section_tokens <= token_budget:
            sections.append(section)
            tokens_used += section_tokens

    return ''.join(sections), tokens_used


def build_target_layer(
    target_file: str | None,
    repo_root: str,
    token_budget: int
) -> tuple[str, int]:
    """Include the full content of the target file."""

    if not target_file:
        return "", 0

    path = Path(target_file)
    if not path.exists():
        return f"## Target File\nFile not found: {target_file}\n", 50

    content = path.read_text()
    file_tokens = count_tokens(content)

    if file_tokens <= token_budget:
        section = f"## Target File: {path.name}\n```\n{content}\n```\n"
        return section, file_tokens
    else:
        # Truncate intelligently: keep the beginning (imports and early
        # definitions) plus the tail, then enforce the budget as a hard cap
        lines = content.split('\n')

        head_lines = lines[:len(lines) // 3]
        tail_lines = lines[-20:]  # always keep the last 20 lines

        truncated = '\n'.join(head_lines) + '\n\n... [truncated for token budget] ...\n\n' + '\n'.join(tail_lines)
        truncated = truncate_to_budget(truncated, token_budget)
        section = f"## Target File: {path.name} [truncated]\n```\n{truncated}\n```\n"
        return section, count_tokens(section)

Step 6: Token Utilities

import tiktoken

_encoder = None

def get_encoder():
    global _encoder
    if _encoder is None:
        _encoder = tiktoken.get_encoding("cl100k_base")  # GPT-4 tokenizer; a close approximation for Claude
    return _encoder

def count_tokens(text: str) -> int:
    return len(get_encoder().encode(text))

def truncate_to_budget(text: str, token_budget: int) -> str:
    """Truncate text to fit within a token budget."""
    encoder = get_encoder()
    tokens = encoder.encode(text)
    if len(tokens) <= token_budget:
        return text
    truncated_tokens = tokens[:max(0, token_budget - 5)]  # small safety margin
    return encoder.decode(truncated_tokens) + "\n... [truncated]"

def tokens_to_cost_usd(input_tokens: int, output_tokens: int, model: str = "gpt-4o") -> float:
    """Estimate API cost in USD."""
    pricing = {  # USD per 1M tokens (illustrative; verify against current price lists)
        "gpt-4o": {"input": 2.50, "output": 10.00},
        "gpt-4o-mini": {"input": 0.15, "output": 0.60},
        "claude-sonnet-4-5": {"input": 3.00, "output": 15.00},
        "claude-haiku-3-5": {"input": 0.80, "output": 4.00},
    }
    rates = pricing.get(model, {"input": 3.00, "output": 15.00})
    return (input_tokens / 1_000_000 * rates["input"]) + (output_tokens / 1_000_000 * rates["output"])
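
As a worked example against the rates above: a full 8,000-token standard-preset context plus a ~1,000-token response at the claude-sonnet-4-5 rates works out as follows (same arithmetic as `tokens_to_cost_usd`, restated standalone):

```python
input_tokens, output_tokens = 8000, 1000
rate_in, rate_out = 3.00, 15.00  # claude-sonnet-4-5 rates from the table above

cost = input_tokens / 1_000_000 * rate_in + output_tokens / 1_000_000 * rate_out
print(f"${cost:.4f}")  # → $0.0390
```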

Step 7: The Pipeline Orchestrator

from pathlib import Path
from rich.console import Console
from rich.table import Table

from src.config.settings import settings
from src.layers.static_layer import build_static_layer
from src.layers.structural_layer import build_structural_layer
from src.layers.rag_layer import build_rag_layer, build_or_load_index
from src.layers.dependency_layer import build_dependency_layer
from src.layers.target_layer import build_target_layer
from src.utils.tokens import count_tokens, tokens_to_cost_usd

console = Console()

BUDGET_PRESETS = {
    "quick":    settings.budget_quick,
    "standard": settings.budget_standard,
    "deep":     settings.budget_deep,
}

def build_context(
    query: str,
    repo_root: str,
    target_file: str | None = None,
    budget_preset: str = "standard",
    output_format: str = "markdown"
) -> dict:
    """
    Build an optimized context for the given query and target file.

    Returns:
        dict with 'context' (str), 'token_counts' (dict), 'total_tokens' (int)
    """

    total_budget = BUDGET_PRESETS.get(budget_preset, settings.budget_standard)

    # Allocate budgets per layer
    budgets = {
        "static":     int(total_budget * settings.fraction_static),
        "structural": int(total_budget * settings.fraction_structural),
        "rag":        int(total_budget * settings.fraction_rag),
        "deps":       int(total_budget * settings.fraction_deps),
        "target":     int(total_budget * settings.fraction_target),
    }

    console.print(f"\n[bold cyan]Building context[/bold cyan] (budget: {total_budget} tokens, preset: {budget_preset})")

    layers = []
    token_counts = {}

    # Layer 1: Static context
    with console.status("Loading static context (CLAUDE.md)..."):
        static_content, static_tokens = build_static_layer(repo_root, budgets["static"])
        layers.append(static_content)
        token_counts["static"] = static_tokens

    # Layer 2: Structural context (repo map)
    with console.status("Building structural context (repo map)..."):
        structural_content, structural_tokens = build_structural_layer(
            repo_root, query, budgets["structural"]
        )
        layers.append(structural_content)
        token_counts["structural"] = structural_tokens

    # Layer 3: RAG retrieval
    with console.status("Retrieving semantically relevant code..."):
        rag_content, rag_tokens = build_rag_layer(
            query, target_file, repo_root, budgets["rag"]
        )
        layers.append(rag_content)
        token_counts["rag"] = rag_tokens

    # Layer 4: Dependencies
    if target_file:
        with console.status("Tracing dependencies..."):
            dep_content, dep_tokens = build_dependency_layer(
                target_file, repo_root, budgets["deps"]
            )
            layers.append(dep_content)
            token_counts["deps"] = dep_tokens
    else:
        token_counts["deps"] = 0

    # Layer 5: Target file
    if target_file:
        with console.status("Loading target file..."):
            target_content, target_tokens = build_target_layer(
                target_file, repo_root, budgets["target"]
            )
            layers.append(target_content)
            token_counts["target"] = target_tokens
    else:
        token_counts["target"] = 0

    # Assemble final context
    header = f"# AI Context — {query[:80]}...\n\n" if len(query) > 80 else f"# AI Context — {query}\n\n"
    context = header + "\n\n---\n\n".join(filter(None, layers))

    total_tokens = count_tokens(context)
    estimated_cost = tokens_to_cost_usd(total_tokens, 1000, settings.llm_model)  # assume ~1K output tokens

    # Display summary table
    table = Table(title="Context Summary", show_header=True)
    table.add_column("Layer", style="cyan")
    table.add_column("Tokens", justify="right")
    table.add_column("Budget", justify="right")
    table.add_column("Usage", justify="right")

    for layer, tokens in token_counts.items():
        budget = budgets.get(layer, 0)
        usage = f"{tokens/budget*100:.0f}%" if budget > 0 else "N/A"
        table.add_row(layer, str(tokens), str(budget), usage)

    table.add_row("[bold]TOTAL[/bold]", f"[bold]{total_tokens}[/bold]", 
                  f"[bold]{total_budget}[/bold]", 
                  f"[bold]{total_tokens/total_budget*100:.0f}%[/bold]")

    console.print(table)
    console.print(f"Estimated cost per query: [green]${estimated_cost:.4f}[/green]")

    return {
        "context": context,
        "token_counts": token_counts,
        "total_tokens": total_tokens,
        "estimated_cost_usd": estimated_cost
    }
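
One detail in the assembly step worth calling out: `filter(None, layers)` drops layers that returned an empty string (for example, the dependency and target layers when no target file was given), so no stray `---` separators appear between empty sections:

```python
layers = ["## Static\n...", "## RAG\n...", "", ""]  # last two layers returned empty
context = "\n\n---\n\n".join(filter(None, layers))
print(context)  # only one separator, between the two non-empty layers
```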

Step 8: The CLI Interface

import click
import subprocess
import sys
from pathlib import Path
from rich.console import Console
from src.pipeline import build_context, BUDGET_PRESETS
from src.layers.rag_layer import build_or_load_index

console = Console()

@click.group()
def cli():
    """ctx — Codebase Context Pipeline for AI-assisted development."""
    pass

@cli.command()
@click.argument('query')
@click.option('--file', '-f', default=None, help='Target file you are working on')
@click.option('--repo', '-r', default='.', help='Repository root (default: current directory)')
@click.option('--budget', '-b', default='standard', 
              type=click.Choice(['quick', 'standard', 'deep']),
              help='Token budget preset')
@click.option('--output', '-o', default=None, help='Write context to file instead of stdout')
@click.option('--copy', '-c', is_flag=True, help='Copy to clipboard (macOS/Linux)')
def get(query, file, repo, budget, output, copy):
    """Get optimized context for a query.

    Examples:

    \b
    # Engineer: refactoring a specific file
    ctx get "refactor UserService to support soft deletes" -f src/services/UserService.ts

    \b  
    # QA: writing tests for a feature
    ctx get "write integration tests for the checkout payment flow" -b deep

    \b
    # PM: understanding a feature
    ctx get "how does order fulfillment work end to end" -b quick
    """

    repo_root = str(Path(repo).resolve())
    target_file = str(Path(file).resolve()) if file else None

    result = build_context(
        query=query,
        repo_root=repo_root,
        target_file=target_file,
        budget_preset=budget
    )

    context = result["context"]

    if output:
        Path(output).write_text(context)
        console.print(f"\n[green]Context written to {output}[/green]")
    elif copy:
        try:
            subprocess.run(['pbcopy'], input=context.encode(), check=True)  # macOS
            console.print(f"\n[green]Context copied to clipboard ({result['total_tokens']} tokens)[/green]")
        except FileNotFoundError:
            try:
                subprocess.run(['xclip', '-selection', 'clipboard'], 
                              input=context.encode(), check=True)  # Linux
                console.print(f"\n[green]Context copied to clipboard[/green]")
            except FileNotFoundError:
                console.print("[yellow]Clipboard copy not available — printing to stdout[/yellow]")
                click.echo(context)
    else:
        click.echo(context)

@cli.command()
@click.option('--repo', '-r', default='.', help='Repository root')
@click.option('--force', is_flag=True, help='Force rebuild even if index exists')
def index(repo, force):
    """Build or rebuild the codebase vector index."""
    repo_root = str(Path(repo).resolve())
    console.print(f"Building index for: {repo_root}")
    build_or_load_index(repo_root, force_rebuild=force)
    console.print("[green]Index ready.[/green]")

@cli.command()
@click.option('--repo', '-r', default='.', help='Repository root')
def benchmark(repo):
    """Compare token usage: naive inclusion vs. optimized pipeline."""
    from src.utils.tokens import count_tokens

    repo_root = str(Path(repo).resolve())

    # Count naive inclusion (all source files)
    naive_tokens = 0
    file_count = 0
    for path in Path(repo_root).rglob('*'):  # rglob has no brace expansion; filter by suffix
        if path.suffix not in ('.ts', '.tsx', '.py', '.js', '.java'):
            continue
        if any(skip in str(path) for skip in ['node_modules', '.git', 'dist']):
            continue
        try:
            naive_tokens += count_tokens(path.read_text())
            file_count += 1
        except Exception:
            pass

    # Run optimized pipeline
    result = build_context(
        query="refactor the main service class",
        repo_root=repo_root,
        budget_preset="standard"
    )

    optimized_tokens = result["total_tokens"]
    reduction = (1 - optimized_tokens / naive_tokens) * 100 if naive_tokens > 0 else 0

    console.print(f"\n[bold]Benchmark Results[/bold]")
    console.print(f"Naive (all {file_count} source files): {naive_tokens:,} tokens")
    console.print(f"Optimized pipeline:                    {optimized_tokens:,} tokens")
    console.print(f"Reduction:                             [green]{reduction:.1f}%[/green]")
    console.print(f"Cost savings per query (approx):       [green]${(naive_tokens - optimized_tokens) / 1_000_000 * 3:.4f}[/green]")

def main():
    cli()

if __name__ == '__main__':
    main()

Step 9: Running the Pipeline and Validating Results

Install and test the tool:
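
For `pip install -e .` to expose a `ctx` command, the project needs packaging metadata declaring a console-script entry point. A minimal `pyproject.toml` of roughly this shape would do (the module path assumes the Step 1 layout, where `src` is itself a package):

```toml
[project]
name = "codebase-context-pipeline"
version = "0.1.0"

[project.scripts]
ctx = "src.cli:main"

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
```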

# Requires packaging metadata (e.g. a pyproject.toml) declaring a `ctx`
# console-script entry point; the editable install puts `ctx` on the venv's PATH
pip install -e .

ctx index --repo /path/to/your/project

ctx get "add input validation to the user registration endpoint" \
    --file src/controllers/users.ts \
    --budget standard \
    --copy

ctx get "what are the edge cases in payment processing I should test" \
    --budget deep \
    --output /tmp/qa-context.md

ctx get "explain how the discount code system works" \
    --budget quick

ctx benchmark --repo /path/to/your/project

Expected benchmark output for a medium-sized project:

Building context (budget: 8000 tokens, preset: standard)

                Context Summary
┌────────────┬────────┬────────┬───────┐
│ Layer      │ Tokens │ Budget │ Usage │
├────────────┼────────┼────────┼───────┤
│ static     │    387 │    400 │   97% │
│ structural │   1102 │   1200 │   92% │
│ rag        │   2687 │   2800 │   96% │
│ deps       │   1554 │   1600 │   97% │
│ target     │   1892 │   2000 │   95% │
│ TOTAL      │   7622 │   8000 │   95% │
└────────────┴────────┴────────┴───────┘
Estimated cost per query: $0.0379

Benchmark Results
Naive (all 312 source files):    187,450 tokens
Optimized pipeline:                7,622 tokens
Reduction:                         95.9%
Cost savings per query (approx):   $0.5395

Tip: Run the benchmark at sprint boundaries to track how your optimization improves as you tune the pipeline. The first run establishes your baseline. Over several iterations of tuning (adjusting layer fractions, improving the RAG index, refining CLAUDE.md), you will see both the token reduction percentage and the response quality improve simultaneously — confirming that the quality-cost relationship is genuinely virtuous, not a trade-off.


Step 10: Integrating with Your Team's AI Workflow

With the tool working, integrate it into everyday workflows:

Git hook integration:

#!/bin/bash
CHANGED_FILES=$(git diff --cached --name-only | head -5 | tr '\n' ' ')
if [ -n "$CHANGED_FILES" ]; then
    echo "# Changed files: $CHANGED_FILES" >> "$1"
    echo "# Run: ctx get 'describe these changes' --copy" >> "$1"
fi

VS Code task integration:

// .vscode/tasks.json
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Get AI Context (current file)",
      "type": "shell",
      "command": "ctx get '${input:query}' --file ${file} --copy",
      "presentation": { "reveal": "always", "panel": "shared" }
    }
  ],
  "inputs": [
    {
      "id": "query",
      "description": "What do you want to do?",
      "type": "promptString"
    }
  ]
}

Shell alias for quick access:

alias ai-ctx='ctx get'
alias ai-idx='ctx index'

function ai-edit() {
    local query="${1:-implement the TODO comments}"
    local file="${2:-$(git diff --name-only HEAD | head -1)}"
    ctx get "$query" --file "$file" --copy
    echo "Context for '$file' copied — paste into your AI tool."
}

Tip: The most successful adoption pattern for this tool is starting with one high-frequency pain point — usually the query type that most often produces wrong or incomplete AI responses. Get that one query type working excellently with the pipeline, then let word of mouth within the team drive broader adoption. Trying to roll out the tool for all use cases simultaneously leads to inconsistent experiences and slower adoption.