
When developers first start using AI coding assistants, the instinct is understandable: if the AI needs context, give it everything. Dump the whole repo. Paste every file. Include all the imports, tests, configs, and generated artifacts. The logic seems sound — more context means better answers. In practice, this instinct is one of the most expensive and counterproductive patterns in AI-assisted development.

This topic dissects the real costs of naive code inclusion: not just the financial cost of tokens, but the subtler degradation of output quality that comes from overwhelming your AI agent with irrelevant information.


Understanding What "Naive Code Inclusion" Looks Like in Practice

Naive inclusion describes any strategy where code is fed to an AI model without deliberate selection — the developer pastes or pipes in files because they are available, not because they are relevant.

The most common manifestations across different tools:

In Cursor: Opening a large codebase and using @codebase or selecting all files in a folder without filtering. Every open tab bleeds into context.

In Claude Code / aider: Running aider src/**/*.ts from a large monorepo root, which might include thousands of files spanning unrelated services.

In GitHub Copilot: Working in a large IDE workspace where all open files contribute to the implicit context window. Having 15 unrelated files open dilutes the signal.

In custom LLM pipelines: Feeding a raw git diff HEAD~10 or an entire package.json plus all source files into every prompt, regardless of what the prompt is actually asking.

In chatbot workflows: Pasting entire 500-line files into Claude.ai or ChatGPT to "give full context" when answering a single function-level question.

A concrete example of what naive inclusion looks like in an aider session:

aider $(find ./src -name "*.ts" | head -n 200)

Tip: Before adding any file to an AI session, ask yourself: "If I were explaining this problem to a colleague over Slack, would I send them this file?" If the answer is no, the AI probably does not need it either.


The Token Math: What Naive Inclusion Actually Costs

Tokens are the unit of currency for AI agents. Understanding the math makes the problem concrete.

Typical token densities for code:

File Type                                    Approximate Tokens per 100 Lines
TypeScript/JavaScript                        400–600
Python                                       350–500
Java/C#                                      500–700
YAML/JSON config                             200–400
Generated files (e.g., package-lock.json)    600–900
Test files (verbose assertions)              450–650

A medium-sized React application with 150 components (~100 lines each) represents roughly 60,000–90,000 tokens of TypeScript alone, before adding styles, configs, tests, or generated files.
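
Token density varies with formatting and naming conventions, so it is worth spot-checking your own files. A minimal sketch using the tiktoken package (the file path is hypothetical; substitute one of your own):

import tiktoken

# "o200k_base" is the encoding used by GPT-4o-class models
enc = tiktoken.get_encoding("o200k_base")

def tokens_per_100_lines(path: str) -> float:
    with open(path, encoding="utf-8") as f:
        text = f.read()
    line_count = max(len(text.splitlines()), 1)
    return len(enc.encode(text)) / line_count * 100

# Hypothetical path -- point this at a real file in your repo
print(f"{tokens_per_100_lines('src/services/UserService.ts'):.0f} tokens per 100 lines")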

Context window limits and their practical ceiling:

  • GPT-4o: 128K tokens
  • Claude Sonnet: 200K tokens
  • Gemini 1.5 Pro: 1M tokens (but cost scales accordingly)

Even with 200K-token windows, dumping a medium codebase into context consumes roughly 30–70% of the available space, leaving minimal room for the actual conversation, error messages, proposed diffs, and response.

Cost by volume (approximate, mid-2024 pricing tiers):

A developer running 50 queries per day with naive 80K-token contexts incurs approximately:
- Input: 50 × 80,000 = 4,000,000 tokens/day
- At $3 per million input tokens: $12/day, $360/month

The same developer with targeted 5,000-token contexts:
- Input: 50 × 5,000 = 250,000 tokens/day
- Cost: ~$0.75/day, ~$22.50/month

That is a 16x cost reduction from context discipline alone, before any output optimization.
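
The same arithmetic, parameterized so you can plug in your own query volume and pricing (a sketch; the $3-per-million rate is illustrative):

def monthly_cost(queries_per_day: int, tokens_per_query: int,
                 dollars_per_million_input: float = 3.0, days: int = 30) -> float:
    # Daily input volume times the per-million rate, projected over a month
    return queries_per_day * tokens_per_query / 1_000_000 * dollars_per_million_input * days

print(f"Naive 80K contexts:   ${monthly_cost(50, 80_000):.2f}/month")  # $360.00
print(f"Targeted 5K contexts: ${monthly_cost(50, 5_000):.2f}/month")   # $22.50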

Tip: Track your actual token usage for one week using your provider's dashboard or a tool like LangSmith. Most developers are shocked to find that 60–80% of their token spend is redundant context, not the actual conversation.


How Irrelevant Context Degrades Model Quality

The financial cost is the visible problem. The hidden cost — degraded output quality — is often more damaging to productivity.

Large language models are statistical systems. When you provide a 90,000-token codebase and ask "fix the null pointer exception in UserService.java", the model must:

  1. Process the entire context to find UserService.java
  2. Weigh every other file as potential evidence about the codebase's conventions
  3. Produce a response that is coherent with that entire context

The result is a phenomenon researchers call "lost in the middle" — a well-documented finding showing that LLMs perform significantly worse at utilizing information positioned in the middle of a long context compared to information near the beginning or end. A seminal paper from Stanford (Liu et al., 2023) demonstrated retrieval accuracy dropping from ~90% to ~50% when relevant information was buried in a 20-document context window.

In practical terms this means:

Convention drift: The model picks up coding patterns from unrelated modules (e.g., an older service using var instead of const) and applies them inconsistently to the new code it generates.

Hallucinated dependencies: When surrounded by many import statements, models sometimes generate code that imports modules visible in the context but irrelevant to the current function, or worse, modules that do not exist.

Reduced instruction adherence: Shorter, focused prompts produce more reliable instruction-following. A prompt with 80K tokens of code preamble is far more likely to violate a constraint ("do not use lodash") stated early in the context than an identical prompt with 3K tokens of targeted context.

Slower feedback loops: Large-context calls often take 15–30 seconds to process, versus 2–5 seconds for a focused context, so every round-trip with the AI is slower and iteration suffers.
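
The effect is easy to demonstrate with an informal benchmark: run the same set of prompts at different context sizes and score the answers. The snippet below shows the shape such results typically take (the numbers are illustrative, not from a published study):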


test_cases = [
    {"context_size": "full_repo",   "tokens": 85000, "correct": 14, "total": 20},
    {"context_size": "module_only", "tokens": 12000, "correct": 18, "total": 20},
    {"context_size": "file_only",   "tokens": 2500,  "correct": 19, "total": 20},
]

for tc in test_cases:
    accuracy = tc["correct"] / tc["total"] * 100
    print(f"{tc['context_size']:15s} | {tc['tokens']:6d} tokens | {accuracy:.0f}% accuracy")

Tip: When you notice an AI giving you off-brand suggestions (using the wrong framework pattern, violating your team's style guide), the first diagnostic step is to reduce context, not to add more instructions. Irrelevant context is almost always the cause.


Common Anti-Patterns and Why They Feel Justified

Each naive inclusion pattern comes with a rationalization that feels compelling until you understand the cost:

Anti-pattern 1: "More context = fewer follow-ups"

The belief: if I dump everything in upfront, the AI will have all it needs and I will not have to answer clarifying questions.

The reality: AI agents do not ask clarifying questions when given massive context — they make assumptions. Those assumptions are drawn from statistically prominent patterns in the context, which may not represent the specific area you are working on.

Anti-pattern 2: "I don't know what's relevant, so I'll include everything"

The belief: uncertainty justifies completeness.

The reality: this is exactly the problem RAG, repo maps, and dependency analysis solve (covered in later topics). The solution to not knowing what is relevant is a better selection strategy, not exhaustive inclusion.

Anti-pattern 3: "It's a big context window, so I should use it"

The belief: paying for a 200K token window means I should fill it.

The reality: context windows are ceilings, not targets. Using 200K tokens when 8K would suffice costs 25x more and produces worse results. The window exists for genuinely large tasks (e.g., reviewing an entire PR diff, analyzing a migration), not for routine queries.

Anti-pattern 4: Including generated files

package-lock.json, yarn.lock, *.min.js, build outputs, and migration history files are among the most token-dense, least informative files in any repository. Including them is almost always counterproductive.
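
A baseline exclusion list that covers the worst offenders in most JavaScript/TypeScript projects: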


package-lock.json
yarn.lock
pnpm-lock.yaml
*.min.js
*.min.css
dist/
build/
.next/
coverage/
*.snap          # Jest snapshots
migrations/     # unless specifically debugging migrations
*.generated.ts  # GraphQL codegen output

Tip: Create a .aiderignore (for aider), configure Cursor's .cursorignore, or use your tool's equivalent exclusion setting to permanently exclude generated, lock, and build files. This one-time setup prevents thousands of wasted tokens per day.


Measuring Your Current Waste: A Diagnostic Audit

Before optimizing, measure the baseline. This audit works for any AI coding tool:

Step 1: Log a session

Enable logging in your tool of choice. With aider, for example, you can persist the prompt history to a file and check context size at any point with the in-chat /tokens command:

aider --model gpt-4o --input-history-file ~/.aider_history.txt
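
For custom pipelines, a minimal token-logging wrapper looks like the sketch below (it assumes the tiktoken package; call_llm is a placeholder for your actual client call):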

import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))

def logged_llm_call(prompt: str, context: str, model: str) -> dict:
    full_input = context + "\n\n" + prompt
    input_tokens = count_tokens(full_input, model)

    response = call_llm(full_input, model)  # your actual LLM call
    output_tokens = count_tokens(response, model)

    print(f"Input tokens: {input_tokens} | Output tokens: {output_tokens}")
    return {"response": response, "input_tokens": input_tokens, "output_tokens": output_tokens}

Step 2: Categorize your context

For each logged query, manually categorize what portion of the input tokens were:
- Directly relevant (the file/function being modified)
- Indirectly relevant (imports, called functions)
- Marginally relevant (same module but unrelated functionality)
- Irrelevant (completely unrelated files, generated content, lock files)

Most developers find that 50–75% of their context tokens fall into the "marginally relevant" or "irrelevant" categories.
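
To turn that manual categorization into the relevant_fraction used in Step 3, tally the per-category token counts (a sketch with hypothetical numbers):

# Hypothetical category totals from a day of logged queries (Step 2)
categorized_tokens = {
    "directly_relevant":   12_000,
    "indirectly_relevant":  8_000,
    "marginally_relevant": 25_000,
    "irrelevant":          35_000,
}

total = sum(categorized_tokens.values())
relevant = categorized_tokens["directly_relevant"] + categorized_tokens["indirectly_relevant"]
print(f"Relevant fraction: {relevant / total:.2f}")  # 0.25 -- plugs into Step 3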

Step 3: Calculate the opportunity

total_daily_tokens = 4_000_000  # your measured baseline
relevant_fraction = 0.25         # only 25% is genuinely relevant

wasted_tokens = total_daily_tokens * (1 - relevant_fraction)

cost_per_million = 3.0  # dollars, adjust for your model
daily_waste = (wasted_tokens / 1_000_000) * cost_per_million
monthly_waste = daily_waste * 30
print(f"Estimated monthly waste: ${monthly_waste:.2f}")

Tip: Run this audit as a team exercise. When engineers and QA analysts see their own token waste visualized, adoption of context discipline improves dramatically. Framing it as "we could run 10x more AI queries for the same budget" is more motivating than framing it as "stop wasting money."


The Quality-Cost Relationship: Why Less is More

The final concept to internalize is that token optimization is not a trade-off between quality and cost — it is a virtuous cycle where less context typically produces better quality at lower cost.

The intuition: a well-selected 5K token context containing only the relevant service, its dependencies, and a minimal project conventions summary gives the model a clean, unambiguous signal. A 90K token context containing the entire repo gives the model a noisy, ambiguous signal full of contradictory patterns and irrelevant structures.

This means:
- Precision improves: The model generates code that fits the actual component it is working on
- Consistency improves: The model applies the right patterns from the right part of the codebase
- Speed improves: Smaller contexts mean faster API responses
- Cost drops: Directly proportional to context reduction
- Iteration velocity improves: Cheaper, faster, more accurate responses encourage more frequent AI use

The remaining topics in this module build the toolkit to achieve this: repo maps for structural awareness, dependency analysis for targeted selection, CLAUDE.md for persistent conventions, and RAG pipelines for dynamic relevance retrieval. Each is a layer of the answer to the question this topic posed: what do you include instead of everything?

Tip: Commit to a "context budget" for your most common AI tasks. For example: bug fixes get 3K tokens of context, feature additions get 8K tokens, architecture questions get 15K. Enforcing budgets forces deliberate selection and quickly builds the habit of precision over volume.
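
In a custom pipeline, one lightweight way to enforce such budgets is to count tokens before each call. A sketch assuming tiktoken (the budget values and the check_budget helper are illustrative, not part of any tool's API):

import tiktoken

# Per-task context budgets, mirroring the examples in the tip above
BUDGETS = {"bug_fix": 3_000, "feature": 8_000, "architecture": 15_000}
enc = tiktoken.get_encoding("o200k_base")

def check_budget(task_type: str, context: str) -> int:
    used = len(enc.encode(context))
    budget = BUDGETS[task_type]
    if used > budget:
        raise ValueError(f"{task_type} context is {used} tokens; budget is {budget}. Trim before sending.")
    return used

# Usage: check_budget("bug_fix", context_text) raises if the context is too large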