
The Token Cost of Running Too Long

One of the most preventable sources of token waste in agentic systems is continuation past the point of value. An agent that has effectively completed its task but keeps running — verifying, re-verifying, hedging, exploring tangential possibilities — burns tokens without producing better outcomes.

This problem has two faces: under-specification (the agent has no clear exit criteria, so it continues indefinitely) and over-caution (the agent has exit criteria but keeps running "just to make sure"). Both are architectural failures, and both have systematic solutions.

Early termination is not about cutting agents off prematurely. It is about defining, upfront, the conditions under which a task is done — and enforcing those conditions programmatically so the agent does not have to decide for itself when to stop.

This topic addresses three things: how to define exit conditions, how to implement them in code across major frameworks, and how to handle the edge cases that make early termination tricky in practice.


Taxonomy of Exit Conditions

Exit conditions fall into four categories, each serving a different purpose:

1. Success Conditions

The agent has achieved the specified goal. This should be the most common exit path.

Examples:
- All files in the target list have been processed
- The test suite passes with >= 90% coverage
- A valid JSON artifact matching the output schema has been produced
- The generated code compiles and all existing tests still pass

2. Convergence Conditions

The agent is hitting diminishing returns: each new iteration produces less new information or smaller changes than the previous one.

Examples:
- Consecutive iterations produce outputs that differ by less than N%
- The agent has proposed and rejected the same approach twice
- The last 3 tool calls returned no new information

3. Budget Conditions

Hard limits on token consumption, iteration count, or time. These are safety nets that prevent runaway agents.

Examples:
- Maximum N iterations reached
- Context window at 80% capacity
- Total tokens consumed > X
- Wall clock time > Y minutes

4. Failure Conditions

The agent has encountered an unrecoverable state. Continuing would not help; escalation is needed.

Examples:
- Tool has returned the same error 3 consecutive times
- Required input resource is unavailable
- Agent output cannot be parsed for N consecutive iterations
- Agent is generating content that contradicts its own prior output (diverging)
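These four categories can be wired into a single checker that the orchestrator calls after every iteration. The following is a minimal sketch, with state keys and threshold values chosen purely for illustration:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ExitCategory(Enum):
    SUCCESS = "success"
    CONVERGENCE = "convergence"
    BUDGET = "budget"
    FAILURE = "failure"

@dataclass
class ExitDecision:
    should_exit: bool
    category: Optional[ExitCategory] = None
    reason: str = ""

def check_exit(state: dict, max_iterations: int = 20,
               token_budget: int = 100_000) -> ExitDecision:
    """Evaluate exit conditions in priority order: failure first
    (no point continuing), then success, budget, and convergence."""
    if state.get("consecutive_errors", 0) >= 3:
        return ExitDecision(True, ExitCategory.FAILURE, "repeated tool errors")
    if state.get("goal_achieved"):
        return ExitDecision(True, ExitCategory.SUCCESS, "goal achieved")
    if state.get("tokens_used", 0) > token_budget:
        return ExitDecision(True, ExitCategory.BUDGET, "token budget exceeded")
    if state.get("iterations", 0) >= max_iterations:
        return ExitDecision(True, ExitCategory.BUDGET, "max iterations reached")
    if state.get("output_delta", 1.0) < 0.05:
        return ExitDecision(True, ExitCategory.CONVERGENCE, "diminishing returns")
    return ExitDecision(False)
```

Evaluating failure before success avoids accepting a "completed" result produced in the middle of an error loop; the ordering is a design choice worth making explicit.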

Tip: For every agentic task you design, write out exit conditions in all four categories before writing the system prompt or tool definitions. This is the equivalent of defining acceptance criteria before writing code — it clarifies what "done" means, prevents gold-plating, and gives you the vocabulary to implement programmatic stop signals.


Implementing Exit Conditions: From Concept to Code

Method 1: Stop Signal in Agent Output

The simplest approach is instructing the agent to include a structured stop signal in its output when it believes the task is complete. The orchestrator checks for this signal after each iteration.

from langchain_core.messages import SystemMessage, HumanMessage

SYSTEM_PROMPT = """
You are a code review agent. When you have reviewed all files and produced
your complete findings report, end your response with exactly:
REVIEW_COMPLETE: {"status": "success", "files_reviewed": N, "issues_found": M}

Do not continue after producing REVIEW_COMPLETE. Do not add caveats or 
offer to do additional work after this signal.
"""

def run_agent_loop(initial_message: str, max_iterations: int = 20) -> dict:
    messages = [SystemMessage(content=SYSTEM_PROMPT)]
    messages.append(HumanMessage(content=initial_message))

    for iteration in range(max_iterations):
        response = llm.invoke(messages, tools=TOOLS)
        messages.append(response)

        # Check for success exit condition
        if "REVIEW_COMPLETE:" in response.content:
            signal = parse_stop_signal(response.content, "REVIEW_COMPLETE")
            return {"status": "success", "signal": signal, "iterations": iteration + 1}

        # Check for budget exit condition
        total_tokens = count_tokens_in_messages(messages)
        if total_tokens > TOKEN_BUDGET * 0.85:
            return {"status": "budget_exceeded", "iterations": iteration + 1,
                    "partial_result": extract_partial_result(messages, "budget_exceeded")}

        # Process tool calls if present
        if response.tool_calls:
            tool_results = execute_tool_calls(response.tool_calls)
            messages.extend(tool_results)
        else:
            # No tool calls and no stop signal — agent appears stuck
            return {"status": "stalled", "iterations": iteration + 1}

    return {"status": "max_iterations_reached", "iterations": max_iterations}

Method 2: Convergence Detection

Track the "delta" between consecutive iterations to detect when an agent is no longer making progress.

import difflib
from collections import deque

class ConvergenceDetector:
    def __init__(self, window_size: int = 3, threshold: float = 0.05):
        self.window = deque(maxlen=window_size)
        self.threshold = threshold  # 5% change threshold

    def record(self, iteration_output: str) -> None:
        self.window.append(iteration_output)

    def is_converged(self) -> bool:
        if len(self.window) < 2:
            return False

        # Compare last two outputs
        a, b = self.window[-2], self.window[-1]
        matcher = difflib.SequenceMatcher(None, a, b)
        similarity = matcher.ratio()
        change_ratio = 1.0 - similarity

        return change_ratio < self.threshold

    def is_oscillating(self) -> bool:
        """Detect if the agent is cycling between similar outputs."""
        if len(self.window) < 3:
            return False
        # If the oldest and newest outputs in the window are near-identical,
        # the agent is likely cycling back to a previous state rather than
        # progressing.
        oldest, newest = self.window[0], self.window[-1]
        matcher = difflib.SequenceMatcher(None, oldest, newest)
        return matcher.ratio() > 0.85

detector = ConvergenceDetector(window_size=3, threshold=0.05)

for iteration in range(max_iterations):
    response = llm.invoke(messages, tools=TOOLS)

    detector.record(response.content)

    if detector.is_converged():
        print(f"Agent converged at iteration {iteration + 1} — exiting")
        break

    if detector.is_oscillating():
        print(f"Agent oscillating — breaking cycle at iteration {iteration + 1}")
        break

Method 3: Tool-Call-Based Exit Triggers

For agents that primarily operate through tool calls, define a dedicated "finish" tool that the agent must call to signal completion. This is cleaner than parsing text output for stop signals.

FINISH_TOOL = {
    "name": "task_complete",
    "description": "Call this tool when you have fully completed the assigned task. "
                   "You MUST call this tool to signal completion — do not just stop responding.",
    "parameters": {
        "type": "object",
        "properties": {
            "summary": {
                "type": "string",
                "description": "One-paragraph summary of what was accomplished"
            },
            "artifacts": {
                "type": "array",
                "items": {"type": "string"},
                "description": "List of files created or modified"
            },
            "confidence": {
                "type": "number",
                "description": "Confidence that task is fully complete (0.0-1.0)"
            }
        },
        "required": ["summary", "artifacts", "confidence"]
    }
}

def handle_tool_call(tool_name: str, tool_args: dict) -> tuple[str, bool]:
    """Returns (result, should_terminate)."""
    if tool_name == "task_complete":
        confidence = tool_args.get("confidence", 0)
        if confidence >= 0.8:
            return "Task marked complete.", True  # EXIT SIGNAL
        else:
            # Agent isn't confident — provide guidance and continue
            return (
                f"You reported low confidence ({confidence}). "
                "Identify what specific uncertainty remains and address it.",
                False
            )

    # Handle other tools normally
    result = execute_tool(tool_name, tool_args)
    return result, False

from langchain_core.messages import ToolMessage

for iteration in range(max_iterations):
    response = llm.invoke(messages, tools=[FINISH_TOOL, *DOMAIN_TOOLS])
    messages.append(response)

    should_terminate = False
    for tool_call in response.tool_calls:
        result, should_terminate = handle_tool_call(
            tool_call.name, tool_call.args
        )
        if should_terminate:
            break
        # Feed each tool result back so the next iteration can build on it
        messages.append(ToolMessage(content=result, tool_call_id=tool_call.id))

    if should_terminate:
        break

Tip: Prefer tool-based exit signals over text-based stop signals for agents that use function calling. Text-based signals can fail if the model decides to include the stop phrase mid-response or paraphrase it. A structured tool call is unambiguous and parseable. The task_complete tool pattern also works well because it forces the model to explicitly commit to a confidence level — low confidence results are a useful signal for human review queues.
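Acting on that confidence signal can be as simple as the following sketch, where route_completion is a hypothetical helper that sends low-confidence completions to a human review queue:

```python
def route_completion(summary: str, confidence: float,
                     review_queue: list, done_queue: list,
                     threshold: float = 0.8) -> str:
    """Route a task_complete payload: high-confidence results are accepted,
    low-confidence results go to a human review queue."""
    record = {"summary": summary, "confidence": confidence}
    if confidence >= threshold:
        done_queue.append(record)
        return "accepted"
    review_queue.append(record)
    return "needs_review"
```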


Framework-Specific Early Termination

LangGraph: Conditional Edges and State-Based Exits

LangGraph's conditional edges are the natural home for exit logic.

from langgraph.graph import StateGraph, END
from typing import TypedDict, Literal

class AgentState(TypedDict):
    messages: list
    iterations: int
    token_count: int
    exit_reason: str | None

def should_continue(state: AgentState) -> Literal["continue", "end"]:
    """Route to END or continue based on exit conditions.

    Note: LangGraph routing functions should be pure. The exit_reason
    assignments below mutate state in place for illustration; in production,
    record the exit reason via a node's returned state update instead.
    """

    # Budget condition: 80% of context window
    if state["token_count"] > MAX_TOKENS * 0.8:
        state["exit_reason"] = "budget_limit"
        return "end"

    # Iteration budget
    if state["iterations"] >= MAX_ITERATIONS:
        state["exit_reason"] = "max_iterations"
        return "end"

    # Success condition: check last message for finish tool call
    last_message = state["messages"][-1]
    if hasattr(last_message, "tool_calls"):
        for tc in last_message.tool_calls:
            if tc["name"] == "task_complete":
                state["exit_reason"] = "success"
                return "end"

    # Check for stall: last N messages contain no tool calls
    recent_messages = state["messages"][-4:]
    has_tool_activity = any(
        hasattr(m, "tool_calls") and m.tool_calls 
        for m in recent_messages
    )
    if not has_tool_activity and state["iterations"] > 3:
        state["exit_reason"] = "stalled"
        return "end"

    return "continue"

builder = StateGraph(AgentState)
builder.add_node("agent", agent_node)
builder.add_node("tools", tool_node)
builder.set_entry_point("agent")

builder.add_conditional_edges(
    "agent",
    should_continue,
    {"continue": "tools", "end": END}
)
builder.add_edge("tools", "agent")

CrewAI: Max Iterations and Task Callbacks

from crewai import Agent, Task, Crew

focused_agent = Agent(
    role="Code Analyzer",
    goal="Analyze the provided code and return findings",
    backstory="Expert code reviewer",
    max_iter=8,        # Hard iteration cap
    max_rpm=10,        # Rate limit; slows a runaway loop but does not stop it
    verbose=False
)

def task_callback(output):
    if "INSUFFICIENT_DATA" in output.raw_output:
        raise ValueError("Agent cannot complete task with provided context")
    # Log token usage for monitoring
    log_token_usage(output.token_usage)

analysis_task = Task(
    description="Analyze the code for issues. Call task_complete when done.",
    agent=focused_agent,
    expected_output="JSON findings array",
    callback=task_callback
)

OpenAI Assistants API: Run Control

from openai import OpenAI
import time

client = OpenAI()

def run_with_exit_control(
    assistant_id: str, 
    thread_id: str,
    max_steps: int = 15
) -> dict:
    run = client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id,
        max_prompt_tokens=50000,    # Hard token budget
        max_completion_tokens=4000,
        truncation_strategy={
            "type": "last_messages",  # Keep recent context if truncation needed
            "last_messages": 10
        }
    )

    steps_taken = 0
    while run.status in ("queued", "in_progress", "requires_action"):
        time.sleep(0.5)
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)

        steps_taken += 1
        if steps_taken > max_steps:
            # Cancel the run — don't let it continue burning tokens
            client.beta.threads.runs.cancel(thread_id=thread_id, run_id=run.id)
            return {"status": "cancelled", "reason": "max_steps"}

        if run.status == "requires_action":
            # Handle tool calls; check for task_complete tool
            tool_outputs = []
            for tc in run.required_action.submit_tool_outputs.tool_calls:
                if tc.function.name == "task_complete":
                    # Cancel so the run doesn't sit waiting for tool outputs
                    client.beta.threads.runs.cancel(thread_id=thread_id, run_id=run.id)
                    return {"status": "success", "steps": steps_taken}
                # tc.function.arguments is a JSON string; execute_tool is
                # assumed to parse it before dispatching
                output = execute_tool(tc.function.name, tc.function.arguments)
                tool_outputs.append({"tool_call_id": tc.id, "output": output})

            run = client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread_id,
                run_id=run.id,
                tool_outputs=tool_outputs
            )

    return {"status": run.status, "steps": steps_taken}

Tip: In the OpenAI Assistants API, always set max_prompt_tokens and max_completion_tokens explicitly. Without these, a long-running run can silently consume very large amounts of tokens. The truncation_strategy parameter is equally important: set it to last_messages to ensure that if truncation is needed, the most recent (and typically most relevant) context is preserved rather than the oldest.


Handling Partial Results on Early Termination

When an agent is stopped by a budget or iteration limit (not a success condition), you need a strategy for handling the partial result. Discarding all work done so far wastes the tokens already spent.

def extract_partial_result(messages: list, exit_reason: str) -> dict:
    """Extract whatever useful output exists when an agent is terminated early."""

    # Find the most recent structured output (tool call results, JSON blocks)
    structured_outputs = []
    for msg in reversed(messages):
        if hasattr(msg, "content") and msg.content:
            json_blocks = extract_json_blocks(msg.content)
            if json_blocks:
                structured_outputs.extend(json_blocks)
                break  # Take only the most recent

    # Find what tasks were completed vs. pending
    completed_tasks = [
        msg for msg in messages 
        if is_task_completion_marker(msg)
    ]

    return {
        "exit_reason": exit_reason,
        "partial_outputs": structured_outputs,
        "tasks_completed": len(completed_tasks),
        "resumption_hint": generate_resumption_context(messages)
    }
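The extract_json_blocks helper referenced above is not defined in this section. A minimal version, using the simple heuristic that any standalone {...} line is a candidate JSON object:

```python
import json

def extract_json_blocks(text: str) -> list[dict]:
    """Parse any standalone lines that are complete JSON objects.

    A deliberately simple heuristic: multi-line or inline JSON is missed,
    but it is robust against prose and never raises on malformed input.
    """
    blocks = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("{") and line.endswith("}"):
            try:
                blocks.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # Skip malformed candidates rather than failing
    return blocks
```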

def generate_resumption_context(messages: list) -> str:
    """Produce a compact summary for resuming the task in a new session."""
    # This is a new LLM call to summarize what was done
    # Worth ~500 tokens to save re-running completed work
    summary_response = llm.invoke([
        SystemMessage(content="Summarize what has been accomplished and what remains."),
        *messages[-6:]  # Last 6 messages capture recent state
    ])
    return summary_response.content

The resumption_hint is particularly valuable: it is a compact context artifact (usually 200–500 tokens) that allows a new agent session to pick up where the terminated session left off, without re-doing work or re-reading already-processed data.
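A follow-up session can then be seeded from the stored partial result. A sketch of a hypothetical build_resumption_message helper that turns it into the opening message for a fresh agent loop:

```python
def build_resumption_message(partial: dict) -> str:
    """Construct the opening message for a fresh session from a stored
    partial result (shaped like the output of extract_partial_result)."""
    return (
        "You are resuming a previously terminated task.\n"
        f"Progress so far: {partial['resumption_hint']}\n"
        f"Tasks already completed: {partial['tasks_completed']} — do not redo them.\n"
        "Continue from where the previous session stopped and work only on "
        "what remains."
    )
```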

Tip: For QA engineers managing long-running test automation agents: always extract and store partial results before terminating an agent, even on failure. A test agent that was terminated after completing 80% of a test suite still produced 80% of valuable output. Structure your test agent to emit incremental structured results (individual test outcomes to a results store) rather than accumulating all results for a single final output. This way, early termination is never a total loss.
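The incremental-emission pattern can be sketched with an in-memory store standing in for a real results database or file:

```python
class IncrementalResultStore:
    """Collects per-test outcomes as they are produced, so early
    termination never discards completed work."""

    def __init__(self):
        self.results = []

    def emit(self, test_name: str, outcome: str, detail: str = "") -> None:
        # Called by the agent loop after each individual test, not at the end
        self.results.append(
            {"test": test_name, "outcome": outcome, "detail": detail}
        )

    def summary(self) -> dict:
        passed = sum(1 for r in self.results if r["outcome"] == "pass")
        return {"completed": len(self.results), "passed": passed,
                "failed": len(self.results) - passed}
```

If the agent is cancelled mid-suite, summary() still reflects every test that finished before the cutoff.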


Proactive vs. Reactive Exit Condition Design

Reactive exit conditions detect when the agent has already gone off track: it has exceeded its budget, it is oscillating, it has stalled. These are safety nets.

Proactive exit conditions prevent the agent from going off track in the first place. They are built into the task specification:

Reactive (safety net):
"If iterations > 20, terminate"

Proactive (built into task):
"Your task is to analyze exactly 3 files: auth.py, session.py, and models.py.
You have all required information. Analyze each file in order, then call 
task_complete. Do not read any additional files or request additional context."

Proactive design eliminates the most common cause of runaway agents: the agent discovers it "needs" more context, reads something new, which suggests more new things to read, and the loop expands. By pre-specifying the exact scope, you close this expansion path.
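A hypothetical helper that renders such a scoped task specification from an explicit file list:

```python
def build_scoped_task(files: list[str], objective: str) -> str:
    """Render a task spec with explicit scope boundaries, closing the
    context-expansion path before the agent starts."""
    file_list = ", ".join(files)
    return (
        f"Your task: {objective}\n"
        f"Analyze exactly {len(files)} files, in order: {file_list}.\n"
        "You have all required information. Do not read any additional files "
        "or request additional context. When every file is analyzed, call "
        "task_complete."
    )
```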

Tip: Write your task specifications with explicit scope boundaries: "You have been given exactly the information you need," "Do not search for additional resources," "If you encounter something outside your provided context, note it as a gap in your output rather than attempting to fill it." This language shifts the agent from an exploratory to an execution mindset, dramatically reducing unnecessary tool calls and their associated token costs.


Summary

Early termination is a first-class optimization concern, not an afterthought. The four categories of exit conditions — success, convergence, budget, and failure — cover all cases where an agent should stop running. Tool-based exit signals (like a task_complete tool) are more reliable than text-based signals. Framework-specific implementations exist for LangGraph (conditional edges), CrewAI (max_iter), and OpenAI Assistants (run-level token limits).

Partial result extraction ensures that tokens spent before an early termination are not wasted. Proactive scope specification in task prompts is the highest-leverage intervention — it prevents the most common cause of runaway iterations before they start.

Combined with task decomposition and context scoping, early termination gives you control over agentic loop depth and cost across all phases of the plan-execute-verify cycle.