Why Decomposition Is a Token Strategy, Not Just an Architecture Pattern
Task decomposition is often taught as a software engineering principle — break complex problems into manageable units, improve modularity, enable parallelism. All of that is true. But in agentic AI systems, decomposition is also your most powerful token-reduction lever.
When an agent handles a monolithic task in a single long-running loop, the full context of the entire task must remain in memory throughout execution. Every intermediate result, every tool output, every planning step stays in context until the task is done. By contrast, when the same task is split into bounded subtasks, each subtask runs with only the context relevant to it, completes, and releases that context. The next subtask starts fresh with a much smaller footprint.
This topic teaches you to decompose tasks with token efficiency as a first-class concern alongside correctness and performance, whether you are a software engineer designing agentic pipelines, a QA engineer defining test automation agents, or a product manager specifying agent workflows.
The Token Cost of Monolithic vs. Decomposed Tasks
Consider a monolithic agent tasked with: "Review our entire codebase, identify all security vulnerabilities, write a report, and create GitHub issues for each finding."
Running this as a single agentic loop:
Monolithic approach:
Iteration 1: Read file 1 → context +2,000 tokens
Iteration 2: Read file 2 → context +1,800 tokens
Iteration 3: Read file 3 → context +2,200 tokens
...
Iteration 20: Full codebase in context → ~45,000 tokens per call
Iteration 21: Writing report with full codebase context: 45,000 + output
Total input tokens: ~500,000+
Now decompose the same task:
Decomposed approach:
Subtask A: "Scan file group 1 for security issues. Return JSON summary only."
→ Peak context: ~8,000 tokens, 5 iterations, ~25,000 total tokens
Subtask B: "Scan file group 2 for security issues. Return JSON summary only."
→ Peak context: ~8,000 tokens, 5 iterations, ~25,000 total tokens
Subtask C: "Given these JSON summaries [structured data, ~500 tokens],
write a security report."
→ Peak context: ~4,000 tokens, 2 iterations, ~8,000 total tokens
Subtask D: "Given these JSON summaries, create GitHub issues."
→ Peak context: ~5,000 tokens, 6 iterations, ~15,000 total tokens
Total: ~73,000 tokens (vs. 500,000+)
Reduction: ~85%
The decomposition wins because each subtask's context is bounded and independent. The key mechanism: subtask outputs are summarized or structured before being passed to the next subtask, preventing raw intermediate data from accumulating.
Tip: When estimating the token benefit of decomposition, use this rule of thumb: if a task requires reading N pieces of information, a monolithic approach costs approximately O(N²) tokens, because every piece already read is re-sent as context on each subsequent call, while a properly decomposed approach costs O(N), linear in the amount of information processed.
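To make the rule of thumb concrete, here is a back-of-the-envelope estimator (a sketch: the per-piece token count and summary size are illustrative assumptions, not measurements):

import math  # stdlib only; no external dependencies

def monolithic_cost(n_pieces: int, tokens_per_piece: int) -> int:
    """Each call re-sends everything read so far, so input cost grows quadratically."""
    return sum(i * tokens_per_piece for i in range(1, n_pieces + 1))  # O(N^2)

def decomposed_cost(n_pieces: int, tokens_per_piece: int, summary_tokens: int = 200) -> int:
    """Each piece is read exactly once in isolation; only compact summaries aggregate."""
    return n_pieces * tokens_per_piece + n_pieces * summary_tokens  # O(N)

print(monolithic_cost(20, 2000))  # 420000
print(decomposed_cost(20, 2000))  # 44000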
Decomposition Patterns: A Taxonomy
Different task types call for different decomposition strategies. Here are four patterns with their token implications:
Pattern 1: Sequential Pipeline (Map-then-Reduce)
Best for: Tasks where you process many items independently and then aggregate.
Input: [item₁, item₂, ..., itemₙ]
↓
Stage 1 (Map): Agent instance per item, each with minimal context
→ [result₁, result₂, ..., resultₙ] (structured/compressed)
↓
Stage 2 (Reduce): Single agent receives only the structured results
→ Final output
Example (LangGraph), written as a minimal sketch: `llm` and the two system prompt constants are assumed to be defined elsewhere in your project.

import json
from typing import TypedDict, List

from langchain_core.messages import SystemMessage, HumanMessage
from langgraph.graph import StateGraph, END

class PipelineState(TypedDict):
    items: List[str]
    item_results: List[dict]  # compressed summaries, not raw outputs
    final_report: str

def parse_json_result(text: str) -> dict:
    """Parse the model's reply; assumes the system prompt enforces JSON-only output."""
    return json.loads(text)

def map_node(state: PipelineState) -> PipelineState:
    """Process one item at a time with minimal context."""
    results = []
    for item in state["items"]:
        # Each call is isolated: no shared context between items
        response = llm.invoke([
            SystemMessage(content=ITEM_PROCESSOR_SYSTEM_PROMPT),  # small, focused
            HumanMessage(content=f"Analyze this item and return JSON: {item}"),
        ])
        results.append(parse_json_result(response.content))
    state["item_results"] = results
    return state

def reduce_node(state: PipelineState) -> PipelineState:
    """Aggregate only the structured results, not raw items."""
    summary_input = json.dumps(state["item_results"])  # structured, compact
    response = llm.invoke([
        SystemMessage(content=AGGREGATOR_SYSTEM_PROMPT),
        HumanMessage(content=f"Aggregate these results: {summary_input}"),
    ])
    state["final_report"] = response.content
    return state

# Wire the two stages into a graph
graph = StateGraph(PipelineState)
graph.add_node("map", map_node)
graph.add_node("reduce", reduce_node)
graph.set_entry_point("map")
graph.add_edge("map", "reduce")
graph.add_edge("reduce", END)
pipeline = graph.compile()
Pattern 2: Hierarchical Decomposition (Tree)
Best for: Complex tasks where subtasks themselves may have subtasks.
Root Task
├── Subtask A
│ ├── Subtask A1 (minimal context: only what A1 needs)
│ └── Subtask A2 (minimal context: only what A2 needs)
│ [A summarizes A1+A2 results before returning to Root]
└── Subtask B
└── Subtask B1
[B summarizes B1 result before returning to Root]
[Root receives only summaries from A and B]
Token efficiency: each node in the tree operates with context proportional to its subtask, not the entire tree.
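A recursive sketch of the tree pattern, assuming `run_leaf_agent` executes one isolated agent call and `summarize` compresses child results into a compact artifact (both are hypothetical helpers):

def run_tree(task: dict) -> dict:
    """Execute a task tree; each node sees only its own inputs plus child summaries."""
    if not task.get("subtasks"):
        # Leaf: isolated call carrying only this subtask's context
        return run_leaf_agent(task)
    child_summaries = [run_tree(sub) for sub in task["subtasks"]]
    # Compress before returning, so the parent never sees raw child context
    return summarize(task, child_summaries)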
Pattern 3: Parallel Decomposition (Fan-Out)
Best for: Tasks where subtasks can be executed concurrently and their results merged.
A sketch, assuming `isolated_agent_call` is your own async wrapper around a single, context-isolated LLM call:

import asyncio

async def parallel_subtasks(subtask_specs: list[dict]) -> list[dict]:
    """Run independent subtasks concurrently, each with only its own context."""
    async def run_subtask(spec: dict) -> dict:
        return await isolated_agent_call(
            system_prompt=spec["system_prompt"],
            task=spec["task"],
            context=spec["context"],  # only the context needed for THIS subtask
        )
    return await asyncio.gather(*(run_subtask(s) for s in subtask_specs))
This pattern is especially effective for QA agents running multiple test scenarios, where each scenario is fully independent.
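For instance, a QA orchestrator could fan out independent scenarios like this (the prompt constant and context variables are illustrative):

specs = [
    {"system_prompt": QA_RUNNER_PROMPT, "task": "Exercise the checkout flow", "context": checkout_flow_doc},
    {"system_prompt": QA_RUNNER_PROMPT, "task": "Exercise the signup flow", "context": signup_flow_doc},
]
results = asyncio.run(parallel_subtasks(specs))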
Pattern 4: Checkpoint-and-Handoff
Best for: Long multi-day or multi-session agentic workflows (common in product management agents running planning cycles).
Session 1: Gather requirements
→ Output: structured_requirements.json (compact artifact)
→ Context TERMINATED and released
Session 2: Design solution
→ Input: structured_requirements.json only
→ Output: design_doc.json (compact artifact)
→ Context TERMINATED and released
Session 3: Implement
→ Input: design_doc.json only
→ (no accumulated history from sessions 1 and 2)
Tip: Use the checkpoint-and-handoff pattern whenever a task spans more than approximately 15 agentic iterations. Produce a structured JSON artifact at each checkpoint rather than relying on conversational context. The JSON artifact becomes the "compressed memory" of prior work, and it is typically 10–50x smaller than the equivalent conversational history.
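A minimal sketch of the checkpoint mechanics, assuming each session is a fresh agent invocation that receives only the artifact (the artifact contents below are illustrative):

import json
from pathlib import Path

def checkpoint(path: str, artifact: dict) -> None:
    """Persist the compact artifact; the session's conversational context can now be discarded."""
    Path(path).write_text(json.dumps(artifact, indent=2))

def resume(path: str) -> dict:
    """A new session starts from the artifact alone, not from prior chat history."""
    return json.loads(Path(path).read_text())

# End of Session 1:
checkpoint("structured_requirements.json",
           {"goals": ["reduce checkout latency"], "constraints": ["no schema changes"]})
# Start of Session 2, with zero carried-over history:
requirements = resume("structured_requirements.json")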
Defining Subtask Boundaries: The Context Minimization Principle
A subtask boundary is correctly placed when crossing it allows you to discard context. The question to ask for every proposed boundary: "Can I start the next phase knowing only the output of this phase — without carrying forward any of the raw data this phase processed?"
If the answer is no, the boundary is in the wrong place or the output artifact is under-specified.
Designing Output Artifacts for Handoff Efficiency
When a subtask's output becomes the next subtask's input, the output format determines the token cost of the handoff. Compare:
Verbose handoff (anti-pattern):
"I reviewed the authentication module. The main file is auth.py which is
847 lines long. I found several issues. The first issue is on line 234
where the password is being compared using == rather than a constant-time
comparison function like hmac.compare_digest. This is a timing attack
vulnerability. The second issue is on line 456 where the JWT secret is
being read from an environment variable named SECRET_KEY but there is no
validation that this variable is set, which means..."
[800 tokens of prose]
Structured handoff (optimized):
{
"module": "auth",
"issues": [
{"id": "SEC-001", "severity": "high", "type": "timing_attack",
"location": "auth.py:234", "fix": "use hmac.compare_digest"},
{"id": "SEC-002", "severity": "medium", "type": "missing_env_validation",
"location": "auth.py:456", "fix": "add SECRET_KEY presence check"}
]
}
[120 tokens of JSON]
The structured handoff carries the same actionable information at 15% of the token cost.
Tip: Define your subtask output schemas before you write the agent prompts. This forces you to think clearly about what information must be preserved vs. what can be discarded at each boundary. Use JSON Schema or Pydantic models for structured outputs — they also make agent outputs more reliable and parseable.
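As a sketch, the structured handoff above could be enforced with Pydantic models like these (assuming Pydantic v2; the severity levels and the `agent_reply_text` variable are assumptions):

from typing import List, Literal
from pydantic import BaseModel

class Issue(BaseModel):
    id: str
    severity: Literal["high", "medium", "low"]
    type: str
    location: str  # e.g. "auth.py:234"
    fix: str

class ModuleFindings(BaseModel):
    module: str
    issues: List[Issue]

# Validate the agent's raw reply before handing it to the next subtask
findings = ModuleFindings.model_validate_json(agent_reply_text)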
Practical: Decomposing a Software Engineering Agent Task
Here is a worked example of decomposing a complex agentic coding task for a senior software engineer persona.
Original monolithic prompt:
"Look at our entire Django codebase, understand the data models, identify N+1 query problems, fix them, write tests for the fixes, and update the documentation."
Step 1: Identify information domains
- Domain A: Data model understanding (schema files only)
- Domain B: Query pattern analysis (view files + ORM calls)
- Domain C: Fix generation (specific files with N+1 issues + schema context)
- Domain D: Test writing (fixed files only)
- Domain E: Documentation update (existing docs + fix summaries)
Step 2: Design minimal context for each domain
Subtask 1 — Schema extraction:
Input: models.py files only
Output: JSON schema summary (entity names, key relationships, ~200 tokens; see the example after this list)
Releases: raw model files
Subtask 2 — N+1 detection:
Input: view files + schema summary from Subtask 1
Output: JSON list of {file, line, query_pattern, suggested_fix}
Releases: view files + schema summary
Subtask 3 — Fix implementation:
Input: For each finding: specific file content (one at a time) + fix spec
Output: Patched file content
Pattern: Map pattern (one file per agent call)
Subtask 4 — Test writing:
Input: Patched file content (one at a time) + testing conventions doc
Output: Test file content
Pattern: Map pattern
Subtask 5 — Documentation update:
Input: Structured summary of all fixes (from Subtask 2 findings)
Output: Updated doc sections
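For instance, Subtask 1's schema summary, the handoff artifact referenced above, might look like this (entity names are illustrative):

{
  "entities": ["User", "Order", "OrderItem"],
  "relationships": [
    {"from": "Order", "to": "User", "type": "ForeignKey"},
    {"from": "OrderItem", "to": "Order", "type": "ForeignKey"}
  ]
}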
Step 3: Token budget comparison
Monolithic (estimated):
20 model files × 200 tokens = 4,000
50 view files × 400 tokens = 20,000
Accumulated across 30 iterations of compounding context
Estimated total: 350,000+ tokens
Decomposed (estimated):
Subtask 1: 5 iterations × 5,000 avg = 25,000 tokens
Subtask 2: 8 iterations × 8,000 avg = 64,000 tokens
Subtask 3: 8 files × 3 iterations × 4,000 avg = 96,000 tokens
Subtask 4: 8 files × 2 iterations × 3,000 avg = 48,000 tokens
Subtask 5: 3 iterations × 4,000 avg = 12,000 tokens
Total: 245,000 tokens (~30% reduction)
With structured handoffs (not passing raw files forward): ~120,000 tokens
Reduction vs. monolithic: ~65%
Tip: For QA engineers designing test automation agents: decompose by test layer (unit → integration → e2e) rather than by feature. Each layer has a different context requirement. Unit test agents need only the function under test and its direct dependencies. Integration test agents need module interfaces. E2E test agents need user flow descriptions — not source code. Respecting these boundaries produces much leaner agents than trying to build one agent that handles all test layers.
Decomposition Anti-Patterns to Avoid
Anti-Pattern 1: Shallow Decomposition
Splitting "Write a full application" into "Write the backend" and "Write the frontend" looks like decomposition but does nothing if the backend agent reads the entire codebase.
Anti-Pattern 2: Passing Full Context Forward
Subtask A produces a 3,000-token output. Subtask B receives that full output instead of a structured 200-token summary. The subtask boundary exists but provides no token benefit.
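The fix is a compression step at the boundary. A sketch, reusing the `llm` handle and message classes assumed in the Pattern 1 example:

def compress_handoff(raw_output: str) -> str:
    """Summarize a subtask's raw output into a compact structured artifact."""
    response = llm.invoke([
        SystemMessage(content="Summarize as JSON, keeping only decision-relevant fields."),
        HumanMessage(content=raw_output),
    ])
    return response.content  # ~200 tokens instead of the full 3,000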
Anti-Pattern 3: Too-Fine Decomposition
Creating 50 subtasks for a task that could be handled in 5 adds orchestration overhead (each subtask requires a planning call, tool schema transmission, etc.). Find the granularity where each subtask is "just large enough" to produce a meaningful artifact.
Anti-Pattern 4: Context Leakage Through Shared Memory
In CrewAI, if all agents share a common memory store and each agent writes its full output to that store, every subsequent agent will read all prior outputs. The shared memory becomes an unbounded accumulator. Scope memory access: agents should read only what they need, not everything in the store.
Tip: Apply the "fresh start test" to each subtask boundary: if a new agent instance with zero prior context could execute the subtask given only the defined inputs, the boundary is correctly scoped. If the agent would "need to know" things not in the defined inputs, either the inputs are under-specified or the boundary is in the wrong place.
Summary
Task decomposition is simultaneously a software architecture pattern and a token optimization technique. The principles behind good decomposition are unchanged (minimal interfaces, bounded context, composable outputs), but the motivation now includes explicit token cost reduction.
The four decomposition patterns (sequential pipeline, hierarchical, parallel, checkpoint-and-handoff) cover the majority of real-world agentic tasks. Applying structured output artifacts at subtask boundaries is the mechanism that converts architectural decomposition into actual token savings. Without compact handoff artifacts, decomposition provides structural clarity but limited token efficiency.
In the next topic, we extend these ideas to scoped agent sessions — where the context boundary is enforced at the session level, not just at the task decomposition level.