The Token Cost of Running Too Long
One of the most preventable sources of token waste in agentic systems is continuation past the point of value. An agent that has effectively completed its task but keeps running — verifying, re-verifying, hedging, exploring tangential possibilities — burns tokens without producing better outcomes.
This problem has two faces: under-specification (the agent has no clear exit criteria, so it continues indefinitely) and over-caution (the agent has exit criteria but keeps running "just to make sure"). Both are architectural failures, and both have systematic solutions.
Early termination is not about cutting agents off prematurely. It is about defining, upfront, the conditions under which a task is done — and enforcing those conditions programmatically so the agent does not have to decide for itself when to stop.
This topic addresses: how to define exit conditions, how to implement them in code across major frameworks, and how to handle the edge cases that make early termination tricky in practice.
Taxonomy of Exit Conditions
Exit conditions fall into four categories, each serving a different purpose:
1. Success Conditions
The agent has achieved the specified goal. This should be the most common exit path.
Examples:
- All files in the target list have been processed
- The test suite passes with >= 90% coverage
- A valid JSON artifact matching the output schema has been produced
- The generated code compiles and all existing tests still pass
2. Convergence Conditions
The agent is hitting diminishing returns: each new iteration produces less new information or smaller changes than the previous one.
Examples:
- Consecutive iterations produce outputs that differ by less than N%
- The agent has proposed and rejected the same approach twice
- The last 3 tool calls returned no new information
3. Budget Conditions
Hard limits on token consumption, iteration count, or time. These are safety nets that prevent runaway agents.
Examples:
- Maximum N iterations reached
- Context window at 80% capacity
- Total tokens consumed > X
- Wall clock time > Y minutes
4. Failure Conditions
The agent has encountered an unrecoverable state. Continuing would not help; escalation is needed.
Examples:
- Tool has returned the same error 3 consecutive times
- Required input resource is unavailable
- Agent output cannot be parsed for N consecutive iterations
- Agent is generating content that contradicts its own prior output (diverging)
Tip: For every agentic task you design, write out exit conditions in all four categories before writing the system prompt or tool definitions. This is the equivalent of defining acceptance criteria before writing code — it clarifies what "done" means, prevents gold-plating, and gives you the vocabulary to implement programmatic stop signals.
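To make this concrete, the four categories can be captured in one place so the orchestrator can check them on every iteration. The sketch below is framework-agnostic and illustrative only; the ExitConditions class and check_exit function are invented names, not part of any library.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ExitConditions:
    """Illustrative container for the four exit-condition categories."""
    success_check: Callable[[str], bool]   # success: inspects the latest output
    min_change_ratio: float = 0.05         # convergence: minimum change between iterations
    max_iterations: int = 20               # budget: iteration cap
    max_tokens: int = 100_000              # budget: token cap
    max_repeated_errors: int = 3           # failure: identical errors before escalating

def check_exit(c: ExitConditions, output: str, iteration: int,
               tokens_used: int, repeated_errors: int, change_ratio: float) -> str | None:
    """Return an exit reason, or None to keep iterating."""
    if c.success_check(output):
        return "success"
    if repeated_errors >= c.max_repeated_errors:
        return "failure"
    if iteration >= c.max_iterations or tokens_used > c.max_tokens:
        return "budget_exceeded"
    if change_ratio < c.min_change_ratio:
        return "converged"
    return None
The implementation methods below are different ways of feeding a loop like this with the signals it needs.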
Implementing Exit Conditions: From Concept to Code
Method 1: Stop Signal in Agent Output
The simplest approach is instructing the agent to include a structured stop signal in its output when it believes the task is complete. The orchestrator checks for this signal after each iteration.
SYSTEM_PROMPT = """
You are a code review agent. When you have reviewed all files and produced
your complete findings report, end your response with exactly:
REVIEW_COMPLETE: {"status": "success", "files_reviewed": N, "issues_found": M}
Do not continue after producing REVIEW_COMPLETE. Do not add caveats or
offer to do additional work after this signal.
"""
def run_agent_loop(initial_message: str, max_iterations: int = 20) -> dict:
messages = [SystemMessage(content=SYSTEM_PROMPT)]
messages.append(HumanMessage(content=initial_message))
for iteration in range(max_iterations):
response = llm.invoke(messages, tools=TOOLS)
messages.append(response)
# Check for success exit condition
if "REVIEW_COMPLETE:" in response.content:
signal = parse_stop_signal(response.content, "REVIEW_COMPLETE")
return {"status": "success", "signal": signal, "iterations": iteration + 1}
# Check for budget exit condition
total_tokens = count_tokens_in_messages(messages)
if total_tokens > TOKEN_BUDGET * 0.85:
return {"status": "budget_exceeded", "iterations": iteration + 1,
"partial_result": extract_partial_result(messages)}
# Process tool calls if present
if response.tool_calls:
tool_results = execute_tool_calls(response.tool_calls)
messages.extend(tool_results)
else:
# No tool calls and no stop signal — agent appears stuck
return {"status": "stalled", "iterations": iteration + 1}
return {"status": "max_iterations_reached", "iterations": max_iterations}
Method 2: Convergence Detection
Track the "delta" between consecutive iterations to detect when an agent is no longer making progress.
import difflib
from collections import deque
class ConvergenceDetector:
def __init__(self, window_size: int = 3, threshold: float = 0.05):
self.window = deque(maxlen=window_size)
self.threshold = threshold # 5% change threshold
def record(self, iteration_output: str) -> None:
self.window.append(iteration_output)
def is_converged(self) -> bool:
if len(self.window) < 2:
return False
# Compare last two outputs
a, b = self.window[-2], self.window[-1]
matcher = difflib.SequenceMatcher(None, a, b)
similarity = matcher.ratio()
change_ratio = 1.0 - similarity
return change_ratio < self.threshold
def is_oscillating(self) -> bool:
"""Detect if the agent is cycling between the same outputs."""
if len(self.window) < 3:
return False
        # If the oldest and newest outputs in the window are highly similar,
        # the agent is likely cycling back to an earlier output
        oldest, newest = self.window[0], self.window[-1]
matcher_oe = difflib.SequenceMatcher(None, oldest, newest)
return matcher_oe.ratio() > 0.85
detector = ConvergenceDetector(window_size=3, threshold=0.05)
for iteration in range(max_iterations):
response = llm.invoke(messages, tools=TOOLS)
detector.record(response.content)
if detector.is_converged():
print(f"Agent converged at iteration {iteration + 1} — exiting")
break
if detector.is_oscillating():
print(f"Agent oscillating — breaking cycle at iteration {iteration + 1}")
break
Method 3: Tool-Call-Based Exit Triggers
For agents that primarily operate through tool calls, define a dedicated "finish" tool that the agent must call to signal completion. This is cleaner than parsing text output for stop signals.
FINISH_TOOL = {
"name": "task_complete",
"description": "Call this tool when you have fully completed the assigned task. "
"You MUST call this tool to signal completion — do not just stop responding.",
"parameters": {
"type": "object",
"properties": {
"summary": {
"type": "string",
"description": "One-paragraph summary of what was accomplished"
},
"artifacts": {
"type": "array",
"items": {"type": "string"},
"description": "List of files created or modified"
},
"confidence": {
"type": "number",
"description": "Confidence that task is fully complete (0.0-1.0)"
}
},
"required": ["summary", "artifacts", "confidence"]
}
}
def handle_tool_call(tool_name: str, tool_args: dict) -> tuple[str, bool]:
"""Returns (result, should_terminate)."""
if tool_name == "task_complete":
confidence = tool_args.get("confidence", 0)
if confidence >= 0.8:
return "Task marked complete.", True # EXIT SIGNAL
else:
# Agent isn't confident — provide guidance and continue
return (
f"You reported low confidence ({confidence}). "
"Identify what specific uncertainty remains and address it.",
False
)
# Handle other tools normally
result = execute_tool(tool_name, tool_args)
return result, False
for iteration in range(max_iterations):
    response = llm.invoke(messages, tools=[FINISH_TOOL, *DOMAIN_TOOLS])
    messages.append(response)
    should_terminate = False
    for tool_call in response.tool_calls or []:
        result, should_terminate = handle_tool_call(
            tool_call.name, tool_call.args
        )
        if should_terminate:
            break
        # Append `result` to messages as a tool-result message here
        # (as in Method 1) so the next iteration can build on it
    if should_terminate:
        break
Tip: Prefer tool-based exit signals over text-based stop signals for agents that use function calling. Text-based signals can fail if the model decides to include the stop phrase mid-response or paraphrase it. A structured tool call is unambiguous and parseable. The task_complete tool pattern also works well because it forces the model to explicitly commit to a confidence level — low confidence results are a useful signal for human review queues.
Framework-Specific Early Termination
LangGraph: Conditional Edges and State-Based Exits
LangGraph's conditional edges are the natural home for exit logic.
from langgraph.graph import StateGraph, END
from typing import TypedDict, Literal
class AgentState(TypedDict):
messages: list
iterations: int
token_count: int
exit_reason: str | None
def should_continue(state: AgentState) -> Literal["continue", "end"]:
"""Route to END or continue based on exit conditions."""
# Budget condition: 80% of context window
if state["token_count"] > MAX_TOKENS * 0.8:
state["exit_reason"] = "budget_limit"
return "end"
# Iteration budget
if state["iterations"] >= MAX_ITERATIONS:
state["exit_reason"] = "max_iterations"
return "end"
# Success condition: check last message for finish tool call
last_message = state["messages"][-1]
if hasattr(last_message, "tool_calls"):
for tc in last_message.tool_calls:
if tc["name"] == "task_complete":
state["exit_reason"] = "success"
return "end"
# Check for stall: last N messages contain no tool calls
recent_messages = state["messages"][-4:]
has_tool_activity = any(
hasattr(m, "tool_calls") and m.tool_calls
for m in recent_messages
)
if not has_tool_activity and state["iterations"] > 3:
state["exit_reason"] = "stalled"
return "end"
return "continue"
builder = StateGraph(AgentState)
builder.add_node("agent", agent_node)
builder.add_node("tools", tool_node)
builder.set_entry_point("agent")
builder.add_conditional_edges(
"agent",
should_continue,
{"continue": "tools", "end": END}
)
builder.add_edge("tools", "agent")
CrewAI: Max Iterations and Task Callbacks
from crewai import Agent, Task, Crew
focused_agent = Agent(
role="Code Analyzer",
goal="Analyze the provided code and return findings",
backstory="Expert code reviewer",
max_iter=8, # Hard iteration cap
max_rpm=10, # Rate limit — also prevents runaway loops
verbose=False
)
def task_callback(output):
if "INSUFFICIENT_DATA" in output.raw_output:
raise ValueError("Agent cannot complete task with provided context")
# Log token usage for monitoring
log_token_usage(output.token_usage)
analysis_task = Task(
description="Analyze the code for issues. Call task_complete when done.",
agent=focused_agent,
expected_output="JSON findings array",
callback=task_callback
)
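Assembling and running the crew is then standard; a brief sketch using the agent and task defined above:
crew = Crew(
    agents=[focused_agent],
    tasks=[analysis_task],
    verbose=False,
)

result = crew.kickoff()  # max_iter and the task callback above govern termination
print(result)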
OpenAI Assistants API: Run Control
from openai import OpenAI
import time
client = OpenAI()
def run_with_exit_control(
assistant_id: str,
thread_id: str,
max_steps: int = 15
) -> dict:
run = client.beta.threads.runs.create(
thread_id=thread_id,
assistant_id=assistant_id,
max_prompt_tokens=50000, # Hard token budget
max_completion_tokens=4000,
truncation_strategy={
"type": "last_messages", # Keep recent context if truncation needed
"last_messages": 10
}
)
steps_taken = 0
while run.status in ("queued", "in_progress", "requires_action"):
time.sleep(0.5)
run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)
steps_taken += 1
if steps_taken > max_steps:
# Cancel the run — don't let it continue burning tokens
client.beta.threads.runs.cancel(thread_id=thread_id, run_id=run.id)
return {"status": "cancelled", "reason": "max_steps"}
if run.status == "requires_action":
# Handle tool calls; check for task_complete tool
tool_outputs = []
for tc in run.required_action.submit_tool_outputs.tool_calls:
if tc.function.name == "task_complete":
return {"status": "success", "steps": steps_taken}
output = execute_tool(tc.function.name, tc.function.arguments)
tool_outputs.append({"tool_call_id": tc.id, "output": output})
run = client.beta.threads.runs.submit_tool_outputs(
thread_id=thread_id,
run_id=run.id,
tool_outputs=tool_outputs
)
return {"status": run.status, "steps": steps_taken}
Tip: In the OpenAI Assistants API, always set max_prompt_tokens and max_completion_tokens explicitly. Without these, a long-running run can silently consume very large amounts of tokens. The truncation_strategy parameter is equally important: set it to last_messages to ensure that if truncation is needed, the most recent (and typically most relevant) context is preserved rather than the oldest.
Handling Partial Results on Early Termination
When an agent is stopped by a budget or iteration limit (not a success condition), you need a strategy for handling the partial result. Discarding all work done so far wastes the tokens already spent.
def extract_partial_result(messages: list, exit_reason: str) -> dict:
"""Extract whatever useful output exists when an agent is terminated early."""
# Find the most recent structured output (tool call results, JSON blocks)
structured_outputs = []
for msg in reversed(messages):
if hasattr(msg, "content") and msg.content:
json_blocks = extract_json_blocks(msg.content)
if json_blocks:
structured_outputs.extend(json_blocks)
break # Take only the most recent
# Find what tasks were completed vs. pending
completed_tasks = [
msg for msg in messages
if is_task_completion_marker(msg)
]
return {
"exit_reason": exit_reason,
"partial_outputs": structured_outputs,
"tasks_completed": len(completed_tasks),
"resumption_hint": generate_resumption_context(messages)
}
def generate_resumption_context(messages: list) -> str:
"""Produce a compact summary for resuming the task in a new session."""
# This is a new LLM call to summarize what was done
# Worth ~500 tokens to save re-running completed work
summary_response = llm.invoke([
SystemMessage(content="Summarize what has been accomplished and what remains."),
*messages[-6:] # Last 6 messages capture recent state
])
return summary_response.content
The resumption_hint is particularly valuable: it is a compact context artifact (usually 200–500 tokens) that allows a new agent session to pick up where the terminated session left off, without re-doing work or re-reading already-processed data.
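The extract_json_blocks helper referenced above is assumed; one minimal, best-effort sketch that only catches flat (non-nested) JSON objects in the message text:
import json
import re

def extract_json_blocks(text: str) -> list[dict]:
    """Best-effort extraction of JSON objects from agent output.

    Minimal sketch: finds non-nested {...} spans and keeps the ones that parse.
    """
    blocks = []
    for candidate in re.findall(r"\{[^{}]*\}", text):
        try:
            blocks.append(json.loads(candidate))
        except json.JSONDecodeError:
            continue
    return blocks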
Tip: For QA engineers managing long-running test automation agents: always extract and store partial results before terminating an agent, even on failure. A test agent that was terminated after completing 80% of a test suite still produced 80% of valuable output. Structure your test agent to emit incremental structured results (individual test outcomes to a results store) rather than accumulating all results for a single final output. This way, early termination is never a total loss.
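One way to implement the incremental-results pattern from this tip is to have each test-execution tool call persist its outcome immediately. In the sketch below, the JSONL results file and the execute_single_test runner are assumptions standing in for your own storage and test harness:
import json
from pathlib import Path

RESULTS_FILE = Path("test_results.jsonl")  # illustrative append-only results store

def append_result(test_name: str, outcome: str, details: str = "") -> None:
    """Persist each test outcome immediately so early termination loses nothing."""
    with RESULTS_FILE.open("a") as f:
        f.write(json.dumps({"test": test_name, "outcome": outcome, "details": details}) + "\n")

def run_test_tool(test_name: str) -> str:
    """Tool handler: run one test, record the outcome before returning to the agent."""
    outcome, details = execute_single_test(test_name)  # assumed test runner
    append_result(test_name, outcome, details)
    return f"{test_name}: {outcome}"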
Proactive vs. Reactive Exit Condition Design
Reactive exit conditions detect when the agent has already gone off track: it has exceeded its budget, it is oscillating, it has stalled. These are safety nets.
Proactive exit conditions prevent the agent from going off track in the first place. They are built into the task specification:
Reactive (safety net):
"If iterations > 20, terminate"
Proactive (built into task):
"Your task is to analyze exactly 3 files: auth.py, session.py, and models.py.
You have all required information. Analyze each file in order, then call
task_complete. Do not read any additional files or request additional context."
Proactive design eliminates the most common cause of runaway agents: the agent discovers it "needs" more context, reads something new, which suggests more new things to read, and the loop expands. By pre-specifying the exact scope, you close this expansion path.
Tip: Write your task specifications with explicit scope boundaries: "You have been given exactly the information you need," "Do not search for additional resources," "If you encounter something outside your provided context, note it as a gap in your output rather than attempting to fill it." This language shifts the agent from an exploratory to an execution mindset, dramatically reducing unnecessary tool calls and their associated token costs.
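If task prompts are generated programmatically, this scope-boundary language can be templated so every task spec carries it by default; a minimal sketch:
SCOPE_TEMPLATE = """Your task: {objective}

Scope: analyze exactly these files, in order: {files}.
You have been given all required information. Do not read additional files
or request additional context. If something outside this scope seems relevant,
note it as a gap in your output instead of investigating it.
When finished, call task_complete."""

def build_scoped_prompt(objective: str, files: list[str]) -> str:
    return SCOPE_TEMPLATE.format(objective=objective, files=", ".join(files))

prompt = build_scoped_prompt(
    "Find security issues in the authentication flow",
    ["auth.py", "session.py", "models.py"],
)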
Summary
Early termination is a first-class optimization concern, not an afterthought. The four categories of exit conditions — success, convergence, budget, and failure — cover all cases where an agent should stop running. Tool-based exit signals (like a task_complete tool) are more reliable than text-based signals. Framework-specific implementations exist for LangGraph (conditional edges), CrewAI (max_iter), and OpenAI Assistants (run-level token limits).
Partial result extraction ensures that tokens spent before an early termination are not wasted. Proactive scope specification in task prompts is the highest-leverage intervention — it prevents the most common cause of runaway iterations before they start.
Combined with task decomposition and context scoping, early termination gives you control over agentic loop depth and cost across all phases of the plan-execute-verify cycle.