The most expensive decision in an agentic session is not making a wrong call — it is refusing to make any call and continuing to patch code that was broken from the start.
The Sunk-Cost Trap in Agentic Sessions
You have been working with an AI agent for 45 minutes. There is a function that almost works. The last three prompts have each gotten it closer, or so it seems. The logic is 80% right. One more targeted fix and you will be done.
This is the sunk-cost trap, and it is more powerful in agentic sessions than in almost any other engineering context. The trap works because the AI produces plausible output at every iteration — it always looks like progress. Unlike a compile error that tells you unambiguously that something is wrong, a partially correct AI output gives you enough signal to believe you are close. You might be. Or you might have been ten minutes away from close for the last half hour.
The trap is reinforced by the effort you have already invested: the carefully constructed prompts, the context you have built up in the session, the incremental gains that feel real even when the fundamental structure is wrong. Abandoning that feels wasteful. But the calculation is not "what did I spend?" — it is "what will it cost to finish from here versus starting clean?"
Experienced engineers who work with AI tooling develop an instinct for when a session has gone stale. This topic makes that instinct explicit and teachable.
Learning tip: Set a "patch limit" before you start any AI-assisted task. If you have made more than three corrective prompts on the same piece of logic and it is still wrong, treat that as a trigger to evaluate whether you are iterating toward a solution or around a fundamental problem.
Signals That Indicate Iteration Is Futile
Some problems cannot be fixed by asking the AI to try again. Recognizing them early is the skill.
Wrong architecture. The AI generated a solution that technically works for the provided example but is structurally incompatible with the actual requirements. A common pattern: you asked for a stateless function and the AI generated a class with internal state. Or you asked for a streaming response and the AI buffered everything in memory. These structural decisions ripple through the entire implementation — fixing them means rewriting, not patching.
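To see how little a patch can do here, consider the streaming-versus-buffering case above as a minimal sketch (the file path and handler names are illustrative, not from any real codebase):

```typescript
// The structural gap between a buffered response and a streaming one (illustrative file and handlers).
import { createReadStream } from "node:fs";
import { readFile } from "node:fs/promises";
import type { Request, Response } from "express";

// What the session produced: the whole file is read into memory before anything is sent.
async function sendReportBuffered(_req: Request, res: Response): Promise<void> {
  const data = await readFile("./report.csv"); // entire file held in memory
  res.type("text/csv").send(data);
}

// What the spec asked for: stream the file as it is read, with constant memory use.
function sendReportStreaming(_req: Request, res: Response): void {
  res.type("text/csv");
  createReadStream("./report.csv").pipe(res); // backpressure-aware, never buffers the whole file
}
```

Nothing in the buffered version is reusable for the streaming one except the intent; the data flow, memory profile, and error handling all change.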
Misunderstood spec. The AI interpreted a key requirement differently from how you intended it, and every line of the generated code reflects that interpretation. You can tell this has happened when fixing one behavior breaks another — the code is internally consistent, but it is consistently implementing the wrong thing. Targeted patches in this situation are like trying to navigate using a map of the wrong city.
Context poisoning. Long agentic sessions accumulate context: previous failed attempts, corrective instructions that contradict each other, code snippets from approaches that were abandoned. The model is now reasoning about a tangled history rather than a clear specification. Outputs become inconsistent — the model forgets constraints that were established earlier or reintroduces code that was explicitly removed. This is often the hardest signal to recognize because the symptoms look like normal bugs rather than a session-level problem.
Fundamental logic error. The AI has implemented an algorithm that is wrong for the problem class — using a greedy approach where dynamic programming is required, or polling with a repeated linear scan where the spec implies event-driven, real-time update notifications. Patching the implementation cannot fix an algorithm that is the wrong tool for the job.
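A compact illustration of why algorithm class matters, assuming nothing beyond standard TypeScript: greedy coin change looks plausible and is simply the wrong tool when the denominations are {1, 3, 4}.

```typescript
// Greedy coin change vs. the dynamic-programming answer for denominations {1, 3, 4}.
function greedyCoins(target: number, denoms: number[]): number {
  let remaining = target;
  let count = 0;
  for (const d of [...denoms].sort((a, b) => b - a)) {
    while (remaining >= d) {
      remaining -= d;
      count++;
    }
  }
  return count;
}

function dpCoins(target: number, denoms: number[]): number {
  // best[t] = minimum number of coins needed to make t
  const best = new Array<number>(target + 1).fill(Infinity);
  best[0] = 0;
  for (let t = 1; t <= target; t++) {
    for (const d of denoms) {
      if (d <= t && best[t - d] + 1 < best[t]) best[t] = best[t - d] + 1;
    }
  }
  return best[target];
}

console.log(greedyCoins(6, [1, 3, 4])); // 3 coins (4 + 1 + 1): locally optimal, globally wrong
console.log(dpCoins(6, [1, 3, 4]));     // 2 coins (3 + 3): a different algorithm, not a patched greedy
```

No sequence of small corrections turns `greedyCoins` into `dpCoins`; the fix is a different algorithm.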
Learning tip: When you suspect a structural issue, ask the AI to describe its overall approach before you ask it to fix any specific behavior. If the description reveals a fundamental mismatch with the spec, you have confirmed that iteration is futile. This question takes 30 seconds and can save 30 minutes.
Signals That Indicate a Targeted Fix Will Work
Not every problem warrants a restart. Recognizing fixable problems is just as important as recognizing unfixable ones.
Isolated bug in correct structure. The overall architecture is right, the spec is correctly understood, and a single function or code path has a bug. You can point to exactly one place where the behavior diverges from expectations. This is a clear iterate signal — the session context is valuable, and a targeted fix is likely to succeed.
Wrong detail, correct intent. The AI used the wrong field name from an API, or referenced a method with a slightly different signature, or returned a status code that is close but not correct. The underlying logic is right; the surface details need adjustment. These fixes are mechanical and low-risk.
Missing case. The implementation handles the cases that were in the prompt examples but is missing a case you did not explicitly specify. Adding the missing case does not require restructuring anything. Iteration adds the case; it does not replace the existing work.
Test failure with a clear cause. A test is failing, and the failure message points unambiguously to a specific line or condition. The fix is knowable without a full trace of the codebase. This is the ideal iteration scenario — the problem is bounded and the solution is targeted.
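A minimal sketch of that ideal case, assuming a Vitest-style test runner and a hypothetical `statusForRejectedType` helper:

```typescript
// A failing test whose message points at exactly one condition (module and helper are hypothetical).
import { describe, expect, it } from "vitest";
import { statusForRejectedType } from "./upload"; // hypothetical module under test

describe("upload validation", () => {
  it("rejects unsupported media types with 415", () => {
    // If this fails with "Expected: 415, Received: 400", the corrective prompt can quote
    // that output and ask for exactly one change to the type-check branch.
    expect(statusForRejectedType("image/gif")).toBe(415);
  });
});
```

The failure names the exact expectation, so a single corrective prompt can target exactly one change.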
Learning tip: Before deciding to iterate, write down in one sentence what specifically is wrong and where it is. If you can write that sentence, iteration is likely appropriate. If you cannot — if the wrongness is diffuse or hard to pin down — that is a signal to consider a restart.
The Restart Budget Mental Model
The restart budget is a simple mental model for making the iterate-vs-restart decision explicit. The idea is to estimate two costs before committing to either path.
Cost of continuing to iterate: How many more prompts do you estimate it will take to fix the current implementation? Multiply by the average time per prompt cycle (including your review time). Add a factor for the likelihood that fixing the current problems reveals new ones. If the architecture is wrong or the spec was misunderstood, that factor is high — fixing one thing tends to uncover two more.
Cost of a clean restart: How long will it take to write a better initial prompt? Add the time to generate and review a fresh implementation. That total is the restart cost; the debugging time you would otherwise sink into the current implementation is already counted on the iterate side of the ledger. In most cases, a clean restart from a well-specified prompt takes 15–30 minutes for a bounded task. If the current session has already consumed 45 minutes of debugging with no end in sight, the restart budget is almost always favorable.
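As a rough sketch of the model, with every number being your own estimate rather than a measurement:

```typescript
// Illustrative only: the restart budget as two estimates, all inputs supplied by you.
interface IterateEstimate {
  remainingPrompts: number; // corrective prompts you expect to still need
  minutesPerCycle: number;  // prompt + generation + your review time
  cascadeFactor: number;    // > 1 when fixes tend to reveal new problems (wrong architecture, misread spec)
}

interface RestartEstimate {
  minutesToWriteBetterPrompt: number;
  minutesToGenerateAndReview: number;
}

const iterateCost = (e: IterateEstimate): number =>
  e.remainingPrompts * e.minutesPerCycle * e.cascadeFactor;

const restartCost = (e: RestartEstimate): number =>
  e.minutesToWriteBetterPrompt + e.minutesToGenerateAndReview;

// Example numbers: four more prompts at roughly eight minutes each, with fixes likely to cascade,
// against ten minutes to write a distilled prompt and twenty to generate and review fresh output.
const iterate = iterateCost({ remainingPrompts: 4, minutesPerCycle: 8, cascadeFactor: 1.5 }); // 48 min
const restart = restartCost({ minutesToWriteBetterPrompt: 10, minutesToGenerateAndReview: 20 }); // 30 min
console.log(restart < iterate ? "restart" : "iterate"); // "restart"
```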
The restart budget also has a psychological value: it makes the decision feel rational rather than like giving up. You are not abandoning work — you are applying a cost model and choosing the faster path to a working result.
Learning tip: Track your iterate-vs-restart decisions for two weeks. Note the estimated cost, the actual cost, and which decision you made. Most engineers find that they underestimated restart cost early in their AI tooling practice and overestimated it after they got comfortable writing good initial prompts.
How to Do a Clean Restart Without Losing Good Context
A clean restart does not mean throwing everything away. It means discarding the output while preserving the understanding.
Save the specification, discard the code. Before closing the session, write down (or copy into a new document) what you have learned about the requirements. Include: the constraints that matter, the edge cases you discovered during debugging, the approach that failed and why, and what a correct approach would need to do differently. This is your distilled spec — it is more valuable than anything the AI generated.
Extract any parts that are genuinely correct. If the session produced a utility function or a data structure definition that is solid, copy it out before restarting. Do not carry over anything you are uncertain about — the goal of the restart is a clean foundation.
Write a better initial prompt. Use everything you learned in the failed session to write a prompt that front-loads the constraints and edge cases that tripped up the previous attempt. Explicitly tell the AI what approach to avoid if you have a strong reason to avoid it.
Start a new conversation or context window. Do not try to repair a poisoned context by adding more instructions. Start fresh. The new session will reason about your prompt without the accumulated noise of the old one.
Learning tip: A good restart prompt is measurably longer and more specific than the prompt that started the failed session. If your restart prompt looks about the same length as the original, you probably have not captured the learning from the session.
Team Communication Around AI Task Restarts
When AI-assisted work is happening on a team, the restart decision has a social dimension. Engineers may be reluctant to report that a session was discarded, particularly if they have been working on it for a while and it looks like a productivity failure.
Establish shared vocabulary. "I restarted the session" should be a neutral technical statement, not a confession. Teams that treat restarts as normal practice — rather than as failures — make better decisions about when to restart because individuals are not managing personal risk when they make the call.
Normalize the restart log. When a restart happens, the engineer documents it briefly: what the failed approach was, why it failed, and what the restart prompt looked like. This is not a post-mortem — it is a one-paragraph note in the task thread. Over time, these notes become a team knowledge base about what kinds of AI prompting patterns fail for your specific codebase and problem domain.
Distinguish between "the AI failed" and "we gave the AI a bad spec." Both happen. The second one is more common and more actionable. If the session failed because the spec was ambiguous, the fix is clarifying the spec — not just restarting with the same ambiguous prompt.
Learning tip: If multiple engineers on your team have restarted sessions on similar types of tasks, that is a signal worth discussing in a retro. It usually points to a pattern: a category of task that needs a different prompting strategy, a missing shared context (like a style guide or a data model document), or a task that genuinely needs human design work before AI can implement it effectively.
Hands-On: Making the Iterate-vs-Restart Decision
Step 1: Create a scenario to evaluate
Start with a deliberately ambiguous prompt and observe where the session goes:
Build a Node.js function that processes user uploads. It should validate the file, store it, and notify the user. Use whatever structure makes sense.
Let the AI generate an implementation. Do not give feedback yet.
Step 2: Diagnose the output against a revealed spec
Now reveal more specific requirements and evaluate whether the current output can be salvaged:
Here are the actual requirements I had in mind:
- Files must be under 10MB; reject larger files with a 413 status
- Only allow JPEG, PNG, and PDF; reject others with a 415 status
- Store files in an S3 bucket with the key format: `uploads/{userId}/{timestamp}-{originalFilename}`
- Send an email notification (not a push notification) via SendGrid after successful upload
- The entire flow should be wrapped in a transaction log: record start, success or failure, and duration
Looking at what you generated, identify which requirements you correctly anticipated and which ones will require structural changes to accommodate.
Expected result: The AI will either identify significant structural mismatches (indicating a restart) or confirm that the existing code needs targeted additions (indicating iteration).
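If it helps to see the distinction concretely, here is one hypothetical shape the Step 1 prompt could produce, annotated against the revealed requirements; every name in it is illustrative, and your own session will differ:

```typescript
// Hypothetical first draft from the Step 1 prompt, annotated against the revealed requirements.
import { writeFile } from "node:fs/promises";

// Assumed to exist purely for illustration.
declare function sendPushNotification(userId: string, message: string): Promise<void>;

interface DraftFile { originalname: string; mimetype: string; size: number; buffer: Buffer; }

async function processUserUpload(file: DraftFile, userId: string): Promise<{ ok: boolean }> {
  // Targeted additions: the 10MB size check and the JPEG/PNG/PDF type check could be
  // inserted here without touching anything else.

  // Arguably targeted: swapping local disk for S3 changes one call plus the key format.
  await writeFile(`./uploads/${file.originalname}`, file.buffer);

  // Arguably targeted: swapping push for SendGrid email changes one call plus an email lookup.
  await sendPushNotification(userId, "Upload complete");

  // Structural: the transaction log (start, outcome, duration, error code) and the error-code
  // contract change the return type and every exit path, so this is a rewrite, not a patch.
  return { ok: true };
}
```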
Step 3: Apply the restart budget
Given the gap between what you generated and the actual requirements, estimate:
1. How many significant changes are needed to the current implementation?
2. Are any of these changes architectural (would require restructuring the whole function)?
3. If you were to start fresh with the full specification, how would your initial structure differ from what you generated?
Based on this, should I iterate on the current code or start with a fresh implementation?
Step 4: If restarting — write the distilled spec
Based on the analysis, write a clean restart prompt that incorporates everything learned:
Implement a Node.js async function `processUserUpload(file: Express.Multer.File, userId: string): Promise<UploadResult>` with these requirements:
Validation (run first, before any storage):
- Reject files over 10MB with error code FILE_TOO_LARGE
- Reject files that are not JPEG, PNG, or PDF with error code INVALID_FILE_TYPE
Storage:
- Store to S3 with key: `uploads/{userId}/{Date.now()}-{file.originalname}`
- Use the AWS SDK v3 PutObjectCommand
- Return the S3 URL on success
Notification:
- After successful storage, send an email via SendGrid to the user's email (fetched from a `getUserEmail(userId)` function you can assume exists)
- If the email fails, log the error but do NOT fail the upload
Transaction log:
- Record start, outcome (success/failure), duration in ms, and error code if applicable
- Use a `logUploadEvent(event: UploadEvent)` function you can assume exists
Return type UploadResult: `{ success: boolean; url?: string; errorCode?: string }`
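For comparison in Step 5, here is a sketch of what a reasonable response to this restart prompt might look like. It assumes the AWS SDK v3 S3 client and the SendGrid mail client, treats the bucket name and sender address as placeholders, and declares the two helper functions the prompt says you may assume:

```typescript
// Sketch only: one plausible implementation of the restart prompt above.
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import sgMail from "@sendgrid/mail";

// Stand-in for Express.Multer.File so the sketch is self-contained.
interface MulterFile { originalname: string; mimetype: string; size: number; buffer: Buffer; }
interface UploadResult { success: boolean; url?: string; errorCode?: string; }
interface UploadEvent { userId: string; startedAt: number; outcome: "success" | "failure"; durationMs: number; errorCode?: string; }

// The prompt says these can be assumed to exist.
declare function getUserEmail(userId: string): Promise<string>;
declare function logUploadEvent(event: UploadEvent): Promise<void>;

const s3 = new S3Client({});
sgMail.setApiKey(process.env.SENDGRID_API_KEY ?? "");
const BUCKET = process.env.UPLOAD_BUCKET ?? "example-upload-bucket"; // placeholder
const ALLOWED = ["image/jpeg", "image/png", "application/pdf"];
const MAX_BYTES = 10 * 1024 * 1024;

export async function processUserUpload(file: MulterFile, userId: string): Promise<UploadResult> {
  const startedAt = Date.now();

  const fail = async (errorCode: string): Promise<UploadResult> => {
    await logUploadEvent({ userId, startedAt, outcome: "failure", durationMs: Date.now() - startedAt, errorCode });
    return { success: false, errorCode };
  };

  // Validation runs first, before any storage.
  if (file.size > MAX_BYTES) return fail("FILE_TOO_LARGE");
  if (!ALLOWED.includes(file.mimetype)) return fail("INVALID_FILE_TYPE");

  // Storage: key format uploads/{userId}/{timestamp}-{originalFilename}.
  const key = `uploads/${userId}/${Date.now()}-${file.originalname}`;
  try {
    await s3.send(new PutObjectCommand({ Bucket: BUCKET, Key: key, Body: file.buffer, ContentType: file.mimetype }));
  } catch {
    return fail("STORAGE_FAILED"); // code not named in the spec; chosen here for illustration
  }

  // Notification: an email failure is logged but does not fail the upload.
  try {
    const to = await getUserEmail(userId);
    await sgMail.send({ to, from: "uploads@example.com", subject: "Your upload succeeded", text: `Stored as ${key}` });
  } catch (err) {
    console.error("Notification failed", err);
  }

  await logUploadEvent({ userId, startedAt, outcome: "success", durationMs: Date.now() - startedAt });
  return { success: true, url: `https://${BUCKET}.s3.amazonaws.com/${key}` };
}
```

The point is not that the AI will produce exactly this, but that a prompt carrying the distilled spec makes the validation-first ordering, the logging of every exit path, and the non-fatal notification failure natural to get right on the first pass.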
Step 5: Compare the two implementations
Review both implementations side by side and note:
- Which structural choices changed between the two?
- Did the restart prompt produce cleaner code on the first attempt?
- What did you learn about how to spec this kind of function?
Step 6: Document the restart
Write a one-paragraph note that could go into a team's shared knowledge base:
Write a brief team note (3-5 sentences) summarizing: what the initial prompt produced, why a restart was necessary, and what the key lesson is for prompting file upload processing functions in the future.
Key Takeaways
- The sunk-cost trap in agentic sessions is reinforced by the plausibility of AI output at every iteration — it always looks like progress even when the fundamental direction is wrong.
- Wrong architecture, misunderstood spec, and context poisoning are restart signals; isolated bugs, wrong details, and missing cases are iteration signals.
- The restart budget model makes the decision rational: estimate the cost of fixing the current implementation versus the cost of a clean restart with a better spec, and choose the lower cost.
- A clean restart means discarding the generated code while preserving the specification knowledge — the most valuable output of a debugging session is the distilled understanding of the requirements.
- Teams that normalize restarts and document the reasons make better prompting decisions over time because they build a collective knowledge base about what prompt patterns fail for their specific problem domain.