This hands-on topic puts everything from Module 4 into practice. You will build a complete, production-ready codebase context pipeline from scratch — one that intelligently selects and assembles context for any AI task, enforces a token budget, and delivers consistently higher-quality AI responses than naive file inclusion.
By the end of this hands-on, you will have a working CLI tool called ctx that any engineer, QA analyst, or product manager on your team can run to generate optimized context for their AI sessions. The tool will combine repo maps, dependency tracing, RAG retrieval, and static context files into a unified pipeline.
Architecture Overview: What We Are Building
The pipeline has five components that execute in sequence:
Query / Task Description
         │
         ▼
┌─────────────────────┐
│ 1. Static Layer     │  Always included: CLAUDE.md, architecture summary
│    (~400 tokens)    │  Token cost: fixed, minimal
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ 2. Structural Layer │  Repo map filtered to relevant modules
│    (~1200 tokens)   │  Token cost: bounded
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ 3. RAG Layer        │  Semantically retrieved code chunks
│    (~2800 tokens)   │  Token cost: dynamic, query-driven
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ 4. Dependency Layer │  Direct deps/dependents of target file
│    (~1600 tokens)   │  Token cost: dynamic, graph-driven
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ 5. Target Layer     │  The specific file(s) being modified
│    (~2000 tokens)   │  Token cost: varies, but bounded
└────────┬────────────┘
         │
         ▼
Budget Enforcer: Total ≤ 8000 tokens (configurable)
         │
         ▼
Assembled Context (markdown) → AI Tool
(Per-layer figures are the allocations at the standard 8,000-token budget, using the layer fractions configured in Step 1.)
Total budget: 8,000 tokens by default. The presets configured in Step 1 range from 3,000 tokens (quick) to 20,000 tokens (deep).
Naive inclusion of whole files typically sends 50,000–100,000 tokens, so the standard budget is roughly a 6–12x reduction.
Tip: Design your pipeline's token budget as a configurable constant, not a hardcoded value. Different tasks legitimately need different budgets: a quick bug fix might only need 3K tokens, while reviewing an architectural change might warrant 20K. Name your budget presets (e.g., quick, standard, deep) and document what each is appropriate for.
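As a rough sketch of how named presets and per-layer fractions could turn into concrete token allowances: the `PRESETS` and `FRACTIONS` dictionaries and the `allocate` helper below are illustrative names, not part of the tool, and their values simply mirror the settings defined in Step 1.

```python
# Illustrative only: preset totals and layer fractions mirror the Step 1 settings.
PRESETS = {"quick": 3000, "standard": 8000, "deep": 20000}
FRACTIONS = {
    "static": 0.05,
    "structural": 0.15,
    "rag": 0.35,
    "deps": 0.20,
    "target": 0.25,
}


def allocate(preset: str) -> dict[str, int]:
    """Split a preset's total budget across the layers by fixed fractions."""
    if abs(sum(FRACTIONS.values()) - 1.0) > 1e-9:
        raise ValueError("layer fractions must sum to 1.0")
    total = PRESETS[preset]
    # round() rather than int() so float noise (e.g. 2799.999...) cannot lose a token
    return {layer: round(total * share) for layer, share in FRACTIONS.items()}


if __name__ == "__main__":
    for name in PRESETS:
        print(name, allocate(name))
```

Keeping the fractions separate from the totals means a new preset only needs one number, and the check on the sum catches a mistyped fraction early.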
Step 1: Project Setup
Create the project structure:
mkdir codebase-context-pipeline
cd codebase-context-pipeline
python3 -m venv .venv
source .venv/bin/activate
pip install \
    anthropic \
    openai \
    tiktoken \
    chromadb \
    langchain \
    langchain-openai \
    langchain-chroma \
    langchain-text-splitters \
    llama-index-core \
    llama-index-embeddings-openai \
    click \
    rich \
    pydantic \
    pydantic-settings
mkdir -p src/{layers,utils,config}
touch src/__init__.py
touch src/layers/__init__.py
touch src/utils/__init__.py
touch src/config/__init__.py
touch src/pipeline.py
touch src/cli.py
Project configuration:
# src/config/settings.py
# BaseSettings moved out of pydantic itself in Pydantic v2; it now lives in the
# separate pydantic-settings package (pip install pydantic-settings).
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Read overrides from a local .env file and ignore unrelated keys found there
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    # Model settings
    openai_api_key: str = ""
    anthropic_api_key: str = ""
    embedding_model: str = "text-embedding-3-large"
    embedding_dimensions: int = 1024
    llm_model: str = "claude-sonnet-4-5"

    # Token budgets
    budget_quick: int = 3000
    budget_standard: int = 8000
    budget_deep: int = 20000

    # Layer token allocations (as fractions of total budget; they sum to 1.0)
    fraction_static: float = 0.05      # 5%: architecture overview
    fraction_structural: float = 0.15  # 15%: repo map
    fraction_rag: float = 0.35         # 35%: semantic retrieval
    fraction_deps: float = 0.20        # 20%: dependency graph
    fraction_target: float = 0.25      # 25%: target file content

    # Index settings
    index_dir: str = ".ctx-index"
    ignore_patterns: list[str] = [
        "node_modules", ".git", "dist", "build", ".next",
        "coverage", "__pycache__", "*.lock", "*.min.*",
        "*.generated.*", "migrations"
    ]


settings = Settings()
Tip: Store your pipeline's configuration in a .ctx.yaml or .env file at the project root and add it to .gitignore. Team-wide settings (budget presets, model choice) belong in a version-controlled ctx.config.yaml at the repo root. Personal overrides (API keys, personal budget preferences) belong in the gitignored local config.
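One way to combine the two config levels is to load the team file first and apply personal overrides on top. The sketch below assumes both files have already been parsed into dictionaries (for example with a YAML parser); the `merge_config` helper and the key names are hypothetical.

```python
def merge_config(team: dict, local: dict) -> dict:
    """Return team settings with personal overrides applied (one level of nesting)."""
    merged = dict(team)
    for key, value in local.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them wholesale
            merged[key] = {**merged[key], **value}
        else:
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Hypothetical contents of the version-controlled team config
    team = {
        "llm_model": "claude-sonnet-4-5",
        "budgets": {"quick": 3000, "standard": 8000, "deep": 20000},
    }
    # Hypothetical contents of the gitignored personal config
    local = {"budgets": {"standard": 6000}}
    print(merge_config(team, local))
```

Because the merge builds new dictionaries, the team defaults stay untouched and can be reused for other users or for tests.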
Step 2: The Static Layer
The static layer always contributes CLAUDE.md and any pinned architecture documents. It is cheap (fixed, small) and provides the persistent foundation:
# src/layers/static_layer.py
from pathlib import Path

from src.utils.tokens import count_tokens, truncate_to_budget

STATIC_FILES = [
    "CLAUDE.md",
    "docs/architecture.md",
    "docs/conventions.md",
    ".cursorrules",  # fallback, used only if CLAUDE.md is not present
]


def build_static_layer(repo_root: str, token_budget: int) -> tuple[str, int]:
    """
    Build the static context layer from project context files.

    Returns:
        (content, tokens_used)
    """
    root = Path(repo_root)
    sections = []
    tokens_used = 0
    for filename in STATIC_FILES:
        filepath = root / filename
        if not filepath.exists():
            continue
        if filename == ".cursorrules" and (root / "CLAUDE.md").exists():
            continue  # CLAUDE.md already fills this role
        content = filepath.read_text(encoding="utf-8", errors="ignore")
        content_tokens = count_tokens(content)
        if tokens_used + content_tokens <= token_budget:
            sections.append(f"## Project Context ({filename})\n{content}")
            tokens_used += content_tokens
        else:
            # Truncate to fit within the remaining budget
            remaining = token_budget - tokens_used
            if remaining > 100:
                truncated = truncate_to_budget(content, remaining)
                sections.append(f"## Project Context ({filename}) [truncated]\n{truncated}")
                tokens_used += remaining
            break  # no more budget
    if not sections:
        # Minimal fallback if no context files exist
        return "## Project Context\nNo CLAUDE.md or context files found.\n", 20
    return "\n\n".join(sections), tokens_used
Step 3: The Structural Layer
The structural layer provides a filtered repo map focused on modules relevant to the current query:
# src/layers/structural_layer.py
import json
import subprocess
from collections import defaultdict
from pathlib import Path

from src.utils.tokens import count_tokens, truncate_to_budget

SOURCE_SUFFIXES = {'.ts', '.py', '.java', '.go'}
SKIP_DIRS = {'node_modules', '.git', 'dist', '__pycache__'}


def get_repo_map_from_aider(repo_root: str) -> str:
    """Generate a repo map with aider's map generator, falling back to ctags."""
    try:
        result = subprocess.run(
            ["aider", "--show-repo-map", "--no-git"],
            capture_output=True, text=True, cwd=repo_root, timeout=30
        )
    except (subprocess.SubprocessError, FileNotFoundError):
        return get_repo_map_from_ctags(repo_root)
    # A non-zero exit or empty output also means aider produced no usable map
    if result.returncode != 0 or not result.stdout.strip():
        return get_repo_map_from_ctags(repo_root)
    return result.stdout


def get_repo_map_from_ctags(repo_root: str) -> str:
    """Fallback: generate a basic repo map using ctags (JSON output format)."""
    try:
        result = subprocess.run(
            ["ctags", "-R", "--fields=+n", "--output-format=json", "-o", "-", "."],
            capture_output=True, text=True, cwd=repo_root, timeout=30
        )
    except (subprocess.SubprocessError, FileNotFoundError):
        return get_repo_map_from_filesystem(repo_root)
    by_file = defaultdict(list)
    for line in result.stdout.strip().split('\n'):
        try:
            tag = json.loads(line)
        except json.JSONDecodeError:
            continue
        if tag.get('kind') in ('function', 'class', 'method', 'interface'):
            by_file[tag['path']].append(f"  {tag['kind']}: {tag['name']}")
    if not by_file:
        return get_repo_map_from_filesystem(repo_root)
    lines = []
    for path, symbols in sorted(by_file.items()):
        lines.append(f"\n{path}")
        lines.extend(symbols[:15])  # cap symbols listed per file
    return '\n'.join(lines)


def get_repo_map_from_filesystem(repo_root: str) -> str:
    """Last resort: generate a simple list of source files."""
    root = Path(repo_root)
    lines = ["Repository Structure:"]
    for path in sorted(root.rglob('*')):
        # pathlib globs do not expand braces like '*.{ts,py}', so filter on the suffix
        if path.suffix not in SOURCE_SUFFIXES or not path.is_file():
            continue
        if SKIP_DIRS.intersection(path.relative_to(root).parts):
            continue
        lines.append(f"  {path.relative_to(root)}")
    return '\n'.join(lines[:200])  # cap at 200 lines


def filter_map_by_relevance(repo_map: str, query: str) -> str:
    """
    Filter a repo map to show only files/symbols relevant to the query.
    Uses simple keyword matching; replace with embedding similarity for better results.
    """
    query_terms = set(query.lower().split())
    # Remove common stop words that add noise
    stop_words = {'the', 'a', 'an', 'is', 'in', 'to', 'of', 'and', 'or', 'how', 'what', 'why'}
    query_terms -= stop_words
    lines = repo_map.split('\n')
    relevant_lines = []
    current_file_relevant = False
    for line in lines:
        line_lower = line.lower()
        # File headers are unindented; symbol lines under them are indented
        is_file_header = bool(line.strip()) and not line.startswith(' ')
        if is_file_header:
            # Check if this file is relevant to the query
            current_file_relevant = any(term in line_lower for term in query_terms)
            if current_file_relevant:
                relevant_lines.append(line)
        elif current_file_relevant:
            relevant_lines.append(line)
    # If filtering was too aggressive, return the original map truncated
    if len(relevant_lines) < 10:
        return '\n'.join(lines[:100])
    return '\n'.join(relevant_lines)


def build_structural_layer(repo_root: str, query: str, token_budget: int) -> tuple[str, int]:
    """Build the structural context layer."""
    repo_map = get_repo_map_from_aider(repo_root)
    filtered_map = filter_map_by_relevance(repo_map, query)
    content = f"## Codebase Structure (relevant to query)\n```\n{filtered_map}\n```"
    content = truncate_to_budget(content, token_budget)
    return content, count_tokens(content)
Step 4: The RAG Layer
The RAG layer retrieves semantically relevant code chunks using the index we build from the codebase:
# src/layers/rag_layer.py
import shutil
from fnmatch import fnmatch
from pathlib import Path

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
# Provided by the langchain-text-splitters package (install it if it is not already
# present); older LangChain releases exposed the same names as langchain.text_splitter.
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

from src.config.settings import settings
from src.utils.tokens import count_tokens

EXTENSION_TO_LANGUAGE = {
    '.py': Language.PYTHON,
    '.ts': Language.TS,
    '.tsx': Language.TS,
    '.js': Language.JS,
    '.jsx': Language.JS,
    '.java': Language.JAVA,
    '.go': Language.GO,
}


def _is_ignored(rel_path: Path) -> bool:
    """True if any path component matches an ignore pattern (plain name or glob)."""
    return any(
        fnmatch(part, pattern)
        for part in rel_path.parts
        for pattern in settings.ignore_patterns
    )


def build_or_load_index(repo_root: str, force_rebuild: bool = False) -> Chroma:
    """Build the vector index, or load it from disk if it already exists."""
    embeddings = OpenAIEmbeddings(
        model=settings.embedding_model,
        dimensions=settings.embedding_dimensions
    )
    root = Path(repo_root)
    index_path = str(root / settings.index_dir)
    if force_rebuild:
        # Start clean so a rebuild does not append duplicate chunks
        shutil.rmtree(index_path, ignore_errors=True)
    else:
        try:
            db = Chroma(persist_directory=index_path, embedding_function=embeddings)
            if db._collection.count() > 0:
                return db
        except Exception:
            pass  # missing or unreadable index: fall through and rebuild
    print("Building codebase index (first run)...")
    all_docs = []
    for filepath in root.rglob('*'):
        if filepath.suffix not in EXTENSION_TO_LANGUAGE or not filepath.is_file():
            continue
        rel_path = filepath.relative_to(root)
        if _is_ignored(rel_path):
            continue
        try:
            content = filepath.read_text(encoding='utf-8', errors='ignore')
            if len(content) < 50:  # skip nearly empty files
                continue
            splitter = RecursiveCharacterTextSplitter.from_language(
                language=EXTENSION_TO_LANGUAGE[filepath.suffix],
                chunk_size=1200,
                chunk_overlap=80
            )
            docs = splitter.create_documents(
                texts=[content],
                metadatas=[{"source": str(rel_path)}]
            )
            all_docs.extend(docs)
        except Exception as e:
            print(f"Skipping {rel_path}: {e}")
    if not all_docs:
        raise RuntimeError(f"No indexable source files found under {repo_root}")
    print(f"Indexing {len(all_docs)} chunks...")
    db = Chroma.from_documents(
        documents=all_docs,
        embedding=embeddings,
        persist_directory=index_path
    )
    print(f"Index ready: {len(all_docs)} chunks.")
    return db


def build_rag_layer(
    query: str,
    target_file: str | None,
    repo_root: str,
    token_budget: int
) -> tuple[str, int]:
    """
    Retrieve semantically relevant code chunks for the query.
    Excludes chunks from the target file (it will be included in the target layer).
    """
    db = build_or_load_index(repo_root)
    # Estimate how many chunks fit in the budget
    avg_chunk_tokens = 350  # rough size of a 1200-character chunk
    max_chunks = max(2, token_budget // avg_chunk_tokens)
    # Retrieve using MMR (maximal marginal relevance) for diversity
    retriever = db.as_retriever(
        search_type="mmr",
        search_kwargs={"k": max_chunks, "fetch_k": max_chunks * 4}
    )
    results = retriever.invoke(query)
    # Filter out chunks from the target file (already included elsewhere)
    if target_file:
        try:
            target_rel = str(Path(target_file).relative_to(repo_root))
        except ValueError:
            target_rel = None  # target lies outside the repo, so nothing to exclude
        results = [r for r in results if r.metadata.get('source') != target_rel]
    sections = ["## Semantically Relevant Code\n"]
    tokens_used = 20  # allowance for the section heading
    for doc in results:
        chunk_section = f"\n### {doc.metadata.get('source', 'unknown')}\n```\n{doc.page_content}\n```\n"
        chunk_tokens = count_tokens(chunk_section)
        if tokens_used + chunk_tokens <= token_budget:
            sections.append(chunk_section)
            tokens_used += chunk_tokens
        else:
            break
    return ''.join(sections), tokens_used
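The retriever above asks for MMR (maximal marginal relevance) rather than plain top-k similarity. To show what that buys, here is a minimal pure-Python sketch of the selection rule on toy 2-D vectors. It is not LangChain's implementation, and `mmr_select` and `cosine` are illustrative names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, candidates, k, lambda_mult=0.5):
    """Pick k candidate indices, trading query relevance against redundancy."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query_vec, candidates[i])
            # Redundancy: similarity to the closest chunk already chosen
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 1.0]
    # Chunks 0 and 1 are near-duplicates; chunk 2 is less similar but different
    chunks = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]
    print(mmr_select(query, chunks, k=2))                    # diverse pick
    print(mmr_select(query, chunks, k=2, lambda_mult=1.0))   # pure relevance
```

With `lambda_mult=1.0` the rule reduces to ordinary top-k, so the two near-duplicate chunks are both returned. At 0.5 the second slot goes to the less similar but non-redundant chunk, which is usually the better use of a fixed token budget.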
Step 5: The Dependency and Target Layers
# src/layers/dependency_layer.py
# (the orchestrator imports the two builders from separate modules, so the
# target layer goes in its own file further down)
import re
from pathlib import Path

from src.utils.tokens import count_tokens

# Relative JS/TS imports ('./x' or '../x'). Alias imports such as '@/x' depend on the
# project's path mapping and are not resolved here.
TS_IMPORT = re.compile(r"from\s+['\"](\.{1,2}/[^'\"]+)['\"]")
# Relative Python imports: 'from .a.b import x', 'from .. import y'
PY_IMPORT = re.compile(r"^\s*from\s+(\.+)([\w.]*)\s+import", re.MULTILINE)
CANDIDATE_SUFFIXES = ['', '.ts', '.tsx', '.js', '/index.ts', '/index.js', '.py', '/__init__.py']


def extract_imports(file_path: str, repo_root: str) -> list[str]:
    """Extract relative imports from a file and resolve them to repo-relative paths."""
    source = Path(file_path)
    content = source.read_text(encoding='utf-8', errors='ignore')
    if source.suffix in ('.ts', '.tsx', '.js', '.jsx'):
        import_paths = [m.group(1) for m in TS_IMPORT.finditer(content)]
    elif source.suffix == '.py':
        import_paths = []
        for m in PY_IMPORT.finditer(content):
            # 'from .a.b import x' -> './a/b'; each extra leading dot climbs one level
            dots, module = m.group(1), m.group(2)
            prefix = './' if len(dots) == 1 else '../' * (len(dots) - 1)
            import_paths.append(prefix + module.replace('.', '/'))
    else:
        return []
    root = Path(repo_root).resolve()
    resolved = []
    for import_path in import_paths:
        candidate = (source.parent / import_path).resolve()
        for ext in CANDIDATE_SUFFIXES:
            full = Path(str(candidate) + ext)
            if full.exists() and full.is_file():
                try:
                    resolved.append(str(full.relative_to(root)))
                except ValueError:
                    pass  # resolved to a path outside the repository
                break
    return resolved


def build_dependency_layer(
    target_file: str,
    repo_root: str,
    token_budget: int
) -> tuple[str, int]:
    """Include the opening lines of the target file's direct imports as context."""
    if not target_file or not Path(target_file).exists():
        return "", 0
    imports = extract_imports(target_file, repo_root)
    sections = ["## Direct Dependencies (signatures)\n"]
    tokens_used = 30  # allowance for the section heading
    for import_path in imports[:8]:  # cap at 8 direct imports
        full_path = Path(repo_root) / import_path
        if not full_path.exists():
            continue
        content = full_path.read_text(encoding='utf-8', errors='ignore')
        # The first 40 lines approximate "signatures": imports, class headers and
        # early definitions. This is a heuristic, not real signature extraction.
        sig_content = '\n'.join(content.split('\n')[:40])
        section = f"\n### {import_path} (signatures)\n```\n{sig_content}\n```\n"
        section_tokens = count_tokens(section)
        if tokens_used + section_tokens <= token_budget:
            sections.append(section)
            tokens_used += section_tokens
    if len(sections) == 1:
        return "", 0  # nothing resolved, so skip the empty heading
    return ''.join(sections), tokens_used


# src/layers/target_layer.py
from pathlib import Path

from src.utils.tokens import count_tokens, truncate_to_budget


def build_target_layer(
    target_file: str | None,
    repo_root: str,
    token_budget: int
) -> tuple[str, int]:
    """Include the target file in full, or its head and tail if it exceeds the budget."""
    if not target_file:
        return "", 0
    path = Path(target_file)
    if not path.exists():
        return f"## Target File\nFile not found: {target_file}\n", 50
    content = path.read_text(encoding='utf-8', errors='ignore')
    section = f"## Target File: {path.name}\n```\n{content}\n```\n"
    if count_tokens(section) <= token_budget:
        return section, count_tokens(section)
    # Too large: keep the beginning (imports, first definitions) and the last 20 lines,
    # giving each about half the budget so the result actually respects it.
    half_budget = token_budget // 2
    head = truncate_to_budget(content, half_budget)  # ends with a "[truncated]" marker
    tail = truncate_to_budget('\n'.join(content.split('\n')[-20:]), max(half_budget - 40, 0))
    truncated = head + '\n\n' + tail
    section = f"## Target File: {path.name} [truncated]\n```\n{truncated}\n```\n"
    return section, count_tokens(section)
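The dependency layer labels its output "signatures" but takes the first 40 lines of each file, which is only an approximation. For Python dependencies, a closer match is to parse the file and emit just the class and function headers. The sketch below uses the standard-library `ast` module (Python 3.9+ for `ast.unparse`); `python_signatures` is a hypothetical helper, and other languages would need their own parser.

```python
import ast


def _format_def(node) -> str:
    """Render a function node as a one-line signature with an elided body."""
    prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
    args = ast.unparse(node.args)
    returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
    return f"{prefix} {node.name}({args}){returns}: ..."


def python_signatures(source: str) -> list[str]:
    """Return top-level function signatures and class headers with their methods."""
    tree = ast.parse(source)
    out: list[str] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            out.append(_format_def(node))
        elif isinstance(node, ast.ClassDef):
            out.append(f"class {node.name}:")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    out.append("    " + _format_def(item))
    return out


if __name__ == "__main__":
    sample = (
        "class UserService:\n"
        "    def get(self, user_id: int) -> dict:\n"
        "        return {}\n"
        "\n"
        "def helper(x, y):\n"
        "    return x + y\n"
    )
    print("\n".join(python_signatures(sample)))
```

Signature-only output is typically a small fraction of the full file, so more dependencies fit into the same layer budget. The trade-off is that module-level constants and docstrings are dropped.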
Step 6: Token Utilities
# src/utils/tokens.py
import tiktoken

_encoder = None


def get_encoder():
    global _encoder
    if _encoder is None:
        # cl100k_base is the GPT-4 / GPT-3.5 encoding. GPT-4o uses o200k_base and Claude
        # has its own tokenizer, so counts for those models are approximations.
        _encoder = tiktoken.get_encoding("cl100k_base")
    return _encoder


def count_tokens(text: str) -> int:
    return len(get_encoder().encode(text))


def truncate_to_budget(text: str, token_budget: int) -> str:
    """Truncate text to fit within a token budget."""
    encoder = get_encoder()
    tokens = encoder.encode(text)
    if len(tokens) <= token_budget:
        return text
    # Reserve ~5 tokens for the marker; never let the slice bound go negative
    keep = max(token_budget - 5, 0)
    return encoder.decode(tokens[:keep]) + "\n... [truncated]"


def tokens_to_cost_usd(input_tokens: int, output_tokens: int, model: str = "gpt-4o") -> float:
    """Estimate API cost in USD."""
    # USD per million tokens. List prices change, so check the providers' pricing
    # pages before relying on these figures.
    pricing = {
        "gpt-4o": {"input": 2.50, "output": 10.00},
        "gpt-4o-mini": {"input": 0.15, "output": 0.60},
        "claude-sonnet-4-5": {"input": 3.00, "output": 15.00},
        "claude-haiku-3-5": {"input": 0.80, "output": 4.00},
    }
    rates = pricing.get(model, {"input": 3.00, "output": 15.00})
    return (input_tokens / 1_000_000 * rates["input"]) + (output_tokens / 1_000_000 * rates["output"])
Step 7: The Pipeline Orchestrator
# src/pipeline.py
from pathlib import Path

from rich.console import Console
from rich.table import Table

from src.config.settings import settings
from src.layers.static_layer import build_static_layer
from src.layers.structural_layer import build_structural_layer
from src.layers.rag_layer import build_rag_layer, build_or_load_index
from src.layers.dependency_layer import build_dependency_layer
from src.layers.target_layer import build_target_layer
from src.utils.tokens import count_tokens, tokens_to_cost_usd

console = Console()

BUDGET_PRESETS = {
    "quick": settings.budget_quick,
    "standard": settings.budget_standard,
    "deep": settings.budget_deep,
}


def build_context(
    query: str,
    repo_root: str,
    target_file: str | None = None,
    budget_preset: str = "standard",
    output_format: str = "markdown"
) -> dict:
    """
    Build an optimized context for the given query and target file.

    Returns:
        dict with 'context' (str), 'token_counts' (dict), 'total_tokens' (int),
        'estimated_cost_usd' (float)
    """
    total_budget = BUDGET_PRESETS.get(budget_preset, settings.budget_standard)
    # Allocate budgets per layer
    budgets = {
        "static": int(total_budget * settings.fraction_static),
        "structural": int(total_budget * settings.fraction_structural),
        "rag": int(total_budget * settings.fraction_rag),
        "deps": int(total_budget * settings.fraction_deps),
        "target": int(total_budget * settings.fraction_target),
    }
    console.print(f"\n[bold cyan]Building context[/bold cyan] (budget: {total_budget} tokens, preset: {budget_preset})")
    layers = []
    token_counts = {}

    # Layer 1: Static context
    with console.status("Loading static context (CLAUDE.md)..."):
        static_content, static_tokens = build_static_layer(repo_root, budgets["static"])
    layers.append(static_content)
    token_counts["static"] = static_tokens

    # Layer 2: Structural context (repo map)
    with console.status("Building structural context (repo map)..."):
        structural_content, structural_tokens = build_structural_layer(
            repo_root, query, budgets["structural"]
        )
    layers.append(structural_content)
    token_counts["structural"] = structural_tokens

    # Layer 3: RAG retrieval
    with console.status("Retrieving semantically relevant code..."):
        rag_content, rag_tokens = build_rag_layer(
            query, target_file, repo_root, budgets["rag"]
        )
    layers.append(rag_content)
    token_counts["rag"] = rag_tokens

    # Layer 4: Dependencies
    if target_file:
        with console.status("Tracing dependencies..."):
            dep_content, dep_tokens = build_dependency_layer(
                target_file, repo_root, budgets["deps"]
            )
        layers.append(dep_content)
        token_counts["deps"] = dep_tokens
    else:
        token_counts["deps"] = 0

    # Layer 5: Target file
    if target_file:
        with console.status("Loading target file..."):
            target_content, target_tokens = build_target_layer(
                target_file, repo_root, budgets["target"]
            )
        layers.append(target_content)
        token_counts["target"] = target_tokens
    else:
        token_counts["target"] = 0

    # Assemble final context (empty layers are skipped)
    title = f"{query[:80]}..." if len(query) > 80 else query
    header = f"# AI Context: {title}\n\n"
    context = header + "\n\n---\n\n".join(filter(None, layers))
    total_tokens = count_tokens(context)
    # Assumes ~1K output tokens; uses tokens_to_cost_usd's default (gpt-4o) pricing
    estimated_cost = tokens_to_cost_usd(total_tokens, 1000)

    # Display summary table
    table = Table(title="Context Summary", show_header=True)
    table.add_column("Layer", style="cyan")
    table.add_column("Tokens", justify="right")
    table.add_column("Budget", justify="right")
    table.add_column("Usage", justify="right")
    for layer, tokens in token_counts.items():
        budget = budgets.get(layer, 0)
        usage = f"{tokens/budget*100:.0f}%" if budget > 0 else "N/A"
        table.add_row(layer, str(tokens), str(budget), usage)
    table.add_row("[bold]TOTAL[/bold]", f"[bold]{total_tokens}[/bold]",
                  f"[bold]{total_budget}[/bold]",
                  f"[bold]{total_tokens/total_budget*100:.0f}%[/bold]")
    console.print(table)
    console.print(f"Estimated cost per query: [green]${estimated_cost:.4f}[/green]")
    return {
        "context": context,
        "token_counts": token_counts,
        "total_tokens": total_tokens,
        "estimated_cost_usd": estimated_cost
    }
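The architecture diagram includes a Budget Enforcer step, while `build_context` as shown budgets each layer separately. Headers and separators are added afterwards, so the assembled total can still end up above the preset. One possible final check is sketched below: it drops whole layers, lowest priority first, until the total fits. The `enforce_total_budget` name, the drop order, and the crude `rough_token_count` stand-in are assumptions; in the pipeline you would pass the real `count_tokens` function instead.

```python
from typing import Callable


def rough_token_count(text: str) -> int:
    """Crude stand-in tokenizer (assumption): roughly 4 characters per token."""
    return (len(text) + 3) // 4


def enforce_total_budget(
    layers: list[tuple[str, str]],
    total_budget: int,
    drop_order: list[str],
    count: Callable[[str], int] = rough_token_count,
) -> list[tuple[str, str]]:
    """Drop whole layers in drop_order until the combined size fits the budget.

    Layers not named in drop_order are never removed, so the result can still
    exceed the budget if those layers alone are too large.
    """
    kept = list(layers)
    for name in drop_order:
        if sum(count(text) for _, text in kept) <= total_budget:
            break
        kept = [(n, t) for n, t in kept if n != name]
    return kept


if __name__ == "__main__":
    layers = [
        ("static", "a" * 400),    # ~100 tokens
        ("rag", "b" * 4000),      # ~1000 tokens
        ("target", "c" * 2000),   # ~500 tokens
    ]
    kept = enforce_total_budget(layers, 700, drop_order=["rag", "structural", "static"])
    print([name for name, _ in kept])
```

Dropping whole layers keeps the remaining context coherent. A gentler variant would trim the offending layer with `truncate_to_budget` before removing it outright.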
Step 8: The CLI Interface
# src/cli.py
import click
import subprocess
import sys
from pathlib import Path

from rich.console import Console

from src.pipeline import build_context, BUDGET_PRESETS
from src.layers.rag_layer import build_or_load_index

console = Console()


@click.group()
def cli():
    """ctx: Codebase Context Pipeline for AI-assisted development."""
    pass


@cli.command()
@click.argument('query')
@click.option('--file', '-f', default=None, help='Target file you are working on')
@click.option('--repo', '-r', default='.', help='Repository root (default: current directory)')
@click.option('--budget', '-b', default='standard',
              type=click.Choice(['quick', 'standard', 'deep']),
              help='Token budget preset')
@click.option('--output', '-o', default=None, help='Write context to file instead of stdout')
@click.option('--copy', '-c', is_flag=True, help='Copy to clipboard (macOS/Linux)')
def get(query, file, repo, budget, output, copy):
    """Get optimized context for a query.

    Examples:

    \b
    # Engineer: refactoring a specific file
    ctx get "refactor UserService to support soft deletes" -f src/services/UserService.ts

    \b
    # QA: writing tests for a feature
    ctx get "write integration tests for the checkout payment flow" -b deep

    \b
    # PM: understanding a feature
    ctx get "how does order fulfillment work end to end" -b quick
    """
    repo_root = str(Path(repo).resolve())
    target_file = str(Path(file).resolve()) if file else None
    result = build_context(
        query=query,
        repo_root=repo_root,
        target_file=target_file,
        budget_preset=budget
    )
    context = result["context"]
    if output:
        Path(output).write_text(context)
        console.print(f"\n[green]Context written to {output}[/green]")
    elif copy:
        try:
            subprocess.run(['pbcopy'], input=context.encode(), check=True)  # macOS
            console.print(f"\n[green]Context copied to clipboard ({result['total_tokens']} tokens)[/green]")
        except FileNotFoundError:
            try:
                subprocess.run(['xclip', '-selection', 'clipboard'],
                               input=context.encode(), check=True)  # Linux
                console.print("\n[green]Context copied to clipboard[/green]")
            except FileNotFoundError:
                console.print("[yellow]Clipboard copy not available; printing to stdout[/yellow]")
                click.echo(context)
    else:
        click.echo(context)


@cli.command()
@click.option('--repo', '-r', default='.', help='Repository root')
@click.option('--force', is_flag=True, help='Force rebuild even if index exists')
def index(repo, force):
    """Build or rebuild the codebase vector index."""
    repo_root = str(Path(repo).resolve())
    console.print(f"Building index for: {repo_root}")
    build_or_load_index(repo_root, force_rebuild=force)
    console.print("[green]Index ready.[/green]")


@cli.command()
@click.option('--repo', '-r', default='.', help='Repository root')
def benchmark(repo):
    """Compare token usage: naive inclusion vs. optimized pipeline."""
    from src.utils.tokens import count_tokens
    repo_root = str(Path(repo).resolve())
    # Count naive inclusion (all source files)
    source_suffixes = {'.ts', '.py', '.js', '.java'}
    naive_tokens = 0
    file_count = 0
    for path in Path(repo_root).rglob('*'):
        # pathlib globs do not expand braces like '*.{ts,py}', so filter on the suffix
        if path.suffix not in source_suffixes or not path.is_file():
            continue
        if any(skip in str(path) for skip in ['node_modules', '.git', 'dist']):
            continue
        try:
            naive_tokens += count_tokens(path.read_text())
            file_count += 1
        except Exception:
            pass  # unreadable or non-text file
    # Run optimized pipeline
    result = build_context(
        query="refactor the main service class",
        repo_root=repo_root,
        budget_preset="standard"
    )
    optimized_tokens = result["total_tokens"]
    reduction = (1 - optimized_tokens / naive_tokens) * 100 if naive_tokens > 0 else 0
    console.print("\n[bold]Benchmark Results[/bold]")
    console.print(f"Naive (all {file_count} source files): {naive_tokens:,} tokens")
    console.print(f"Optimized pipeline: {optimized_tokens:,} tokens")
    console.print(f"Reduction: [green]{reduction:.1f}%[/green]")
    # Savings priced at $3 per million input tokens
    console.print(f"Cost savings per query (approx): [green]${(naive_tokens - optimized_tokens) / 1_000_000 * 3:.4f}[/green]")


def main():
    cli()


if __name__ == '__main__':
    main()
Step 9: Running the Pipeline and Validating Results
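Install and test the tool. Note that pip install -e . only creates the ctx command if the project root contains a pyproject.toml (or setup.py) that declares a console-script entry point for it, for example ctx = "src.cli:main". That file is not shown in this topic, so add one before running the commands below: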
pip install -e .
export PATH="$PATH:$(pwd)"
ctx index --repo /path/to/your/project
ctx get "add input validation to the user registration endpoint" \
    --file src/controllers/users.ts \
    --budget standard \
    --copy
ctx get "what are the edge cases in payment processing I should test" \
    --budget deep \
    --output /tmp/qa-context.md
ctx get "explain how the discount code system works" \
    --budget quick
ctx benchmark --repo /path/to/your/project
Example output for a medium-sized project (illustrative; exact figures depend on your repository):
Building context (budget: 8000 tokens, preset: standard)
╭───────────────────────────────────────────────────────╮
│                    Context Summary                    │
├────────────┬────────┬────────┬────────────────────────┤
│ Layer      │ Tokens │ Budget │ Usage                  │
├────────────┼────────┼────────┼────────────────────────┤
│ static     │    487 │    400 │ 122% (CLAUDE.md is big)│
│ structural │   1102 │   1200 │ 92%                    │
│ rag        │   2687 │   2800 │ 96%                    │
│ deps       │   1554 │   1600 │ 97%                    │
│ target     │   1892 │   2000 │ 95%                    │
│ TOTAL      │   7722 │   8000 │ 97%                    │
╰────────────┴────────┴────────┴────────────────────────╯
Estimated cost per query: $0.0293
Benchmark Results
Naive (all 312 source files): 187,450 tokens
Optimized pipeline: 7,722 tokens
Reduction: 95.9%
Cost savings per query (approx): $0.5392
Tip: Run the benchmark at sprint boundaries to track how your optimization changes as you tune the pipeline. The first run establishes your baseline. Over several iterations of tuning (adjusting layer fractions, improving the RAG index, refining CLAUDE.md), you should see the token reduction percentage improve. The benchmark command measures token counts only, so check response quality separately, for example by reviewing AI answers to a fixed set of representative queries. If both improve together, that is evidence the quality-cost relationship is virtuous rather than a trade-off.
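To compare runs across sprints you need to keep the numbers somewhere. A minimal sketch, assuming you copy the two token counts from the benchmark output by hand: the `record_benchmark` helper and the JSON Lines history file are hypothetical and not part of the ctx tool.

```python
import json
from datetime import date
from pathlib import Path


def record_benchmark(
    history_file: str,
    naive_tokens: int,
    optimized_tokens: int,
    label: str = "",
) -> dict:
    """Append one benchmark result to a JSON Lines file and return the entry."""
    reduction = (1 - optimized_tokens / naive_tokens) * 100 if naive_tokens else 0.0
    entry = {
        # Default label is today's date, so one run per sprint sorts naturally
        "label": label or date.today().isoformat(),
        "naive_tokens": naive_tokens,
        "optimized_tokens": optimized_tokens,
        "reduction_pct": round(reduction, 1),
    }
    with Path(history_file).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


if __name__ == "__main__":
    # Figures taken from the example benchmark output
    print(record_benchmark("ctx-benchmarks.jsonl", 187_450, 7_722, label="baseline"))
```

An append-only file keeps the baseline visible next to later runs, and each line can carry extra fields (layer fractions, index settings) if you want to record what changed between runs.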
Step 10: Integrating with Your Team's AI Workflow
With the tool working, integrate it into everyday workflows:
Git hook integration (the script writes to $1, the commit message file Git passes to a prepare-commit-msg hook):
#!/bin/bash
CHANGED_FILES=$(git diff --cached --name-only | head -5 | tr '\n' ' ')
if [ -n "$CHANGED_FILES" ]; then
    echo "# Changed files: $CHANGED_FILES" >> "$1"
    echo "# Run: ctx get 'describe these changes' --copy" >> "$1"
fi
VS Code task integration:
// .vscode/tasks.json
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Get AI Context (current file)",
      "type": "shell",
      "command": "ctx get '${input:query}' --file ${file} --copy",
      "presentation": { "reveal": "always", "panel": "shared" }
    }
  ],
  "inputs": [
    {
      "id": "query",
      "description": "What do you want to do?",
      "type": "promptString"
    }
  ]
}
Shell alias for quick access:
alias ai-ctx='ctx get'
alias ai-idx='ctx index'
function ai-edit() {
    local query="${1:-implement the TODO comments}"
    local file="${2:-$(git diff --name-only HEAD | head -1)}"
    ctx get "$query" --file "$file" --copy
    echo "Context for '$file' copied. Paste it into your AI tool."
}
Tip: The most successful adoption pattern for this tool is starting with one high-frequency pain point — usually the query type that most often produces wrong or incomplete AI responses. Get that one query type working excellently with the pipeline, then let word of mouth within the team drive broader adoption. Trying to roll out the tool for all use cases simultaneously leads to inconsistent experiences and slower adoption.