The agentic QA loop

What does the full agentic QA loop look like from spec to shipped?

The agentic QA loop is not a linear checklist — it is a continuous, feedback-driven cycle that runs from the moment a feature is defined to the moment it is deployed and monitored in production. Understanding the full shape of this loop before diving into individual steps is critical because it reframes how you think about your role as a QA engineer. You are no longer executing a sequence of manual steps; you are orchestrating a system of AI-assisted processes that self-reinforce.

Here is the loop in its complete form:

┌────────────────────────────────────────────────────────────────────┐
│                        THE AGENTIC QA LOOP                         │
│                                                                    │
│  ┌───────────┐    ┌───────────┐    ┌───────────┐    ┌───────────┐  │
│  │  SPEC /   │───▶│   TEST    │───▶│ GENERATE  │───▶│  EXECUTE  │  │
│  │  CHANGE   │    │ PLANNING  │    │   TESTS   │    │   TESTS   │  │
│  └───────────┘    └───────────┘    └───────────┘    └───────────┘  │
│        ▲                                                  │        │
│        │                                                  ▼        │
│  ┌───────────┐    ┌───────────┐    ┌───────────┐    ┌───────────┐  │
│  │  MERGE /  │◀───│ COVERAGE  │◀───│   BUG /   │◀───│EXPLORATORY│  │
│  │   SHIP    │    │  REPORT   │    │  TRIAGE   │    │  TESTING  │  │
│  └───────────┘    └───────────┘    └───────────┘    └───────────┘  │
│                                                                    │
│  Feedback arrows: every output feeds back into context for the     │
│  next iteration                                                    │
└────────────────────────────────────────────────────────────────────┘

Stage 1 — Spec/Change Input: The loop starts when context arrives. This is either a product spec (user stories, acceptance criteria, design docs), a code change (a PR diff, a feature branch, a commit range), or both. The agent ingests whatever is available and holds it as its working context for the entire loop.

Stage 2 — Agentic Test Planning: Given the spec and change context, the agent generates a structured test plan: risk areas, coverage targets, test levels, and scope boundaries. This is not a placeholder — it is a full, prioritized test plan that gates what happens next.
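
As a rough sketch of what this stage can look like on the command line (the spec path and output file names here are illustrative, not prescribed):

# Sketch: capture context, then ask for the Stage 2 plan.
git diff origin/main...HEAD > feature-diff.txt

claude --print "
You are a senior QA engineer. From the spec and diff below, produce a
risk-prioritized test plan: risk areas, coverage targets, test levels,
and explicit scope boundaries.
---SPEC---
$(cat spec/feature.md)
---DIFF---
$(cat feature-diff.txt)
---
" > qa-artifacts/test-plan-draft.md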

Stage 3 — Test Generation: The approved test plan drives two parallel generation tracks. Manual test cases for scenarios that require human judgment, accessibility, or exploratory intent. Automated test scripts (E2E and API) for scenarios with clear acceptance criteria and repeatable paths. Both generation tracks use the same spec + code context.
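
A minimal sketch of the chaining for the automated track, assuming the Stage 2 plan was saved as qa-artifacts/test-plan-draft.md (the manual track follows the same pattern with a test-case template instead):

# Sketch: feed the approved plan into E2E script generation.
# Review the output before committing; generated code is a draft.
claude --print "
You are a QA automation engineer. Generate Playwright E2E specs for every
scenario marked must-automate in this test plan. Output TypeScript only.
---
$(cat qa-artifacts/test-plan-draft.md)
---
" > tests/e2e/feature.spec.ts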

Stage 4 — Test Execution: Generated automated tests run immediately against the feature branch. A CI job picks them up, executes them, and returns a results payload: pass/fail counts, coverage delta, and flakiness signals. Manual tests go into the test management tool for execution by the QA team.
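
Run locally, the execution step might look like the following sketch (assuming a Playwright suite and jq installed; in CI the same commands run inside the job):

# Run the generated suite and persist the results payload.
npx playwright test tests/e2e --reporter=json > qa-artifacts/ci-results-local.json

# Quick pass/fail/flaky summary from the reporter's stats block.
jq '.stats' qa-artifacts/ci-results-local.json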

Stage 5 — Exploratory Testing: After scripted tests execute, exploratory testing begins. The agent generates risk-based charters using the code diff as its primary signal — where has the code changed most heavily? What paths are adjacent to the change but untested? A QA engineer runs these charters and feeds findings back into the loop.
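
A hedged sketch of the charter generation step, using the diff as the primary signal:

# Sketch: propose risk-based charters from the code change alone.
claude --print "
You are an exploratory testing lead. From this diff, propose three to five
risk-based charters. For each: target area, risk hypothesis, and time box.
---
$(git diff origin/main...HEAD)
---
" > qa-artifacts/charters.md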

Stage 6 — Bug Triage: Every failure from automated execution and every finding from exploratory testing flows into an AI-assisted triage queue. The agent classifies each issue: new vs. pre-existing, regression vs. new defect, critical vs. low-risk. Structured bug reports are generated automatically from raw findings.
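
One way to sketch this triage step, assuming the Stage 4 results file from the execution sketch above:

# Sketch: classify failures and draft bug reports from the results payload.
claude --print "
You are triaging test failures. For each failure below, classify it:
new vs. pre-existing, regression vs. new defect, severity. Then draft
a structured bug report per failure.
---
$(jq '.suites' qa-artifacts/ci-results-local.json)
---
" > qa-artifacts/triage-summary-draft.md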

Stage 7 — Coverage Report: The agent consolidates all outputs into a coverage snapshot: what was planned, what was executed, what passed, what failed, what was found in exploration, and what remains untested. This report is the go/no-go signal for shipping.
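
A minimal consolidation sketch, reusing the artifacts produced by the earlier stage sketches (paths illustrative):

# Sketch: roll the cycle's artifacts up into a coverage report.
claude --print "
Consolidate these artifacts into a coverage report with a Go/No-Go
recommendation and an explicit list of what remains untested.
---PLAN---
$(cat qa-artifacts/test-plan-draft.md)
---RESULTS---
$(jq '.stats' qa-artifacts/ci-results-local.json)
---TRIAGE---
$(cat qa-artifacts/triage-summary-draft.md)
---
" > qa-artifacts/coverage-report-draft.md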

Stage 8 — Merge/Ship: The team reviews the coverage report and ships or holds. The coverage report and all generated artifacts are committed back to the repository, closing the loop and enriching the context for the next iteration.
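
Closing the loop can be as simple as committing the artifacts (the commit message below is illustrative):

# Persist everything so the next iteration starts with richer context.
git add qa-artifacts/ tests/
git commit -m "qa: artifacts for this feature's test cycle"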

The critical insight is the feedback arrows. Every output from every stage feeds back into the context pool that drives the next iteration. A bug found in exploratory testing updates the test plan. A new failure in CI updates the risk assessment. A coverage gap identified in the report creates a new test generation task. This feedback structure is what distinguishes the agentic loop from a traditional QA pipeline.

Learning Tip: Before you run the agentic QA loop for the first time on a real feature, draw this diagram on a whiteboard with your team and map each stage to who owns the review gate. The loop is autonomous in execution but human-gated in decision-making. Knowing which stages need a QA engineer's sign-off before proceeding prevents the loop from running on autopilot in a way that erodes trust.


How does each prior module feed into the unified agentic workflow?

Each module you have studied up to this point is not a standalone topic — it is a component that plugs into a specific stage of the agentic QA loop. Understanding these connections is what transforms you from a practitioner of individual AI-assisted techniques into an orchestrator of the full workflow.

Module 1 (Foundations) → Every Stage (Foundation): The setup work from Module 1 — Claude Code installed, Gemini configured, CI integration wired — is the foundation that makes autonomous execution possible. Without a properly configured environment, the loop cannot run unattended.

Module 2 (Context Engineering) → Every Stage: Context engineering is the cross-cutting discipline that determines output quality at every stage. The CLAUDE.md configuration, the prompt architecture patterns, and the dual-context model all directly control how accurately the agent performs at each step. This is the module you will return to most often when the loop produces poor outputs.

Module 3 (Test Strategy and Planning) → Stage 2: The risk-based planning techniques, coverage goal definitions, and test level mapping from Module 3 are precisely what the agent executes at the planning stage. The prompts you built in Module 3 become the first-stage prompts in the agentic loop.

Module 4 (Manual Test Case Generation) → Stage 3, Manual Track: The chaining technique of feeding test plan output into manual test case generation is the Module 4 workflow running inside the larger agentic loop. The context templates and review checklists from Module 4 apply directly.

Module 5 (E2E and API Test Generation) → Stage 3, Automated Track: Every technique in Module 5 — Page Object Model context, Playwright script generation, API test generation from OpenAPI specs — runs inside Stage 3 of the loop. The CLAUDE.md configurations that make generated tests runnable are the same ones used in autonomous CI execution.

Module 6 (Bug Analysis and Reporting) → Stage 6: The structured bug analysis prompts, log context trimming techniques, and bug report templates from Module 6 are the engine of the triage stage. When the agentic loop surfaces failures, it applies Module 6 techniques to produce developer-ready bug reports automatically.

Module 7 (Exploratory Testing) → Stage 5: The charter generation prompts, risk-based exploration techniques, and session synthesis workflows from Module 7 run as Stage 5 of the agentic loop. The code-change-first charter generation is the default approach when the loop drives exploration.

Here is the full mapping as a reference table:

Module                      Core Contribution                                    Agentic Loop Stage
1 — Foundations             Environment setup, tool configuration               All stages (foundation)
2 — Context Engineering     Prompt architecture, dual-context model             All stages (quality control)
3 — Test Strategy           Risk assessment, scope definition, coverage goals   Stage 2: Test Planning
4 — Manual Generation       Test case templates, review checklists              Stage 3: Manual Track
5 — E2E/API Generation      Script generation, Page Object context              Stage 3: Automated Track
6 — Bug Analysis            Triage prompts, report templates, log analysis      Stage 6: Bug Triage
7 — Exploratory Testing     Charter generation, session synthesis               Stage 5: Exploration

The practical implication of this mapping is that your prior investment in building good prompts, context files, and CLAUDE.md configurations is not wasted when you move to the agentic loop. You are assembling existing components into an orchestrated workflow — not starting over.

Learning Tip: Create a reference document in your qa-context/ folder called agentic-loop-map.md that maps your team's specific prompt files, slash commands, and CLAUDE.md configurations to each stage of the loop. When a stage produces poor output, you can immediately identify which context component is the culprit and fix it without having to diagnose the full loop.
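
A possible skeleton for that file — the prompt file names and slash commands below are placeholders for whatever your team actually uses:

cat > qa-context/agentic-loop-map.md <<'EOF'
# Agentic Loop Map
Stage 2 (Planning):    prompts/test-plan.md, /plan command, CLAUDE.md (strategy section)
Stage 3 (Generation):  prompts/manual-cases.md, prompts/e2e-gen.md, CLAUDE.md (Playwright section)
Stage 5 (Exploration): prompts/charters.md
Stage 6 (Triage):      prompts/bug-report.md, prompts/log-trim.md
EOF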


What are the entry points — spec-first, code-change-first, or dual-context?

Real-world QA work does not always start from the same place. Sometimes you have a beautifully written spec with complete acceptance criteria but no code yet. Sometimes you have a PR diff with no spec. Sometimes you have both. The agentic loop needs to handle all three entry points gracefully, and your prompts need to be designed to work under each condition.

Entry Point 1: Spec-First

When to use: New features being designed before development starts. Pre-sprint test planning sessions. Acceptance criteria review as part of story grooming.

What you have: User stories, acceptance criteria, design specs, wireframes, API contract drafts.

What you don't have: Working code, a PR diff, or any existing test coverage for this feature.

Prompt pattern:

You are a senior QA engineer performing early test planning.

SPEC INPUT:
---
[paste full spec here]
---

Your task:
1. Extract all testable behaviors from this spec
2. Identify ambiguous acceptance criteria that need clarification before testing
3. Define the test scope: what is in scope vs. explicitly out of scope
4. Generate a risk-prioritized list of test scenarios (HIGH / MEDIUM / LOW)
5. Recommend test levels for each scenario: E2E, API, unit, manual, exploratory

Output as a structured test plan. Flag any spec gaps that would block testing.
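
To run this pattern from the CLI, a sketch along these lines works (the spec path and output file are illustrative):

claude --print "
You are a senior QA engineer performing early test planning.

SPEC INPUT:
---
$(cat spec/feature-checkout-redesign.md)
---

Extract testable behaviors, flag ambiguous acceptance criteria, define
scope, and output a risk-prioritized (HIGH / MEDIUM / LOW) test plan.
" > qa-artifacts/test-plan-spec-first.md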

Limitation: Without code context, the agent cannot reason about implementation risks, framework constraints, or existing coverage. The test plan will be spec-accurate but may miss implementation-level edge cases that only become visible once code exists.

Entry Point 2: Code-Change-First

When to use: Hotfixes with no time for formal spec. Refactors where behavior should stay the same. PRs from external contributors. Bug fix validation.

What you have: A PR diff or git commit range. Possibly a PR description.

What you don't have: A formal spec or user stories.

Prompt pattern:

git diff origin/main...HEAD > feature-diff.txt

claude --print "
You are a QA engineer reviewing a code change for test coverage.

CODE CHANGE:
---
$(cat feature-diff.txt)
---

Your task:
1. Summarize what this code change does functionally
2. Identify the most likely failure modes introduced by this change
3. List test scenarios that must pass before this change is safe to merge
4. Identify existing tests that cover these scenarios (look at the test files in this repo)
5. Flag any scenarios with no current test coverage

Focus on behavioral regression risk — what existing behavior could this change break?
"

Limitation: Without a spec, the agent infers intent from code. It can miss intentional behavior changes that the spec would have described explicitly. It is also blind to business rules that live outside the code (pricing logic, compliance requirements, feature flags).

Entry Point 3: Dual-Context (Recommended)

When to use: Standard sprint workflow for new features or significant enhancements. Any situation where both spec and code change are available.

What you have: Everything. Spec, acceptance criteria, PR diff, and existing codebase context.

Why it is the best entry point: The spec tells the agent what the code is supposed to do. The diff tells it what actually changed. Cross-referencing both allows the agent to detect alignment gaps — where the implementation deviates from the spec — before testing even begins.

Prompt pattern:

cat spec/feature-checkout-redesign.md > combined-context.txt
echo "\n\n---CODE CHANGE---\n" >> combined-context.txt
git diff origin/main...HEAD >> combined-context.txt

claude --print "
You are a senior QA engineer with full feature context.

COMBINED CONTEXT (spec + code change):
---
$(cat combined-context.txt)
---

Your task:
1. Identify any misalignments between spec intent and implementation
2. Generate a complete test plan: risk areas, coverage goals, test levels
3. Separate scenarios into: must-automate, should-manual-test, and explore
4. Flag spec sections that are not reflected in the code change
5. Flag code changes that have no corresponding spec coverage

Output a structured test plan with an alignment gap section at the top.
"

The alignment gap section is the unique value of dual-context. It surfaces issues that neither spec-first nor code-first can find alone — moments where the developer implemented something different from what was specified, or where spec requirements have no corresponding code change.

Learning Tip: Train your team to always capture the diff before starting a QA session on any PR, even if you plan to work spec-first. The marginal cost of capturing a diff is seconds, and having it available means you can upgrade from spec-first to dual-context at any point during the session. Running a quick git diff origin/main...HEAD > diff.txt as the first step of every QA session should become muscle memory.


What outputs does the agentic QA loop produce at each stage?

The agentic QA loop produces concrete, persisted artifacts at each stage — not just ephemeral AI responses. Understanding exactly what each stage outputs is critical for two reasons: it tells you what to review at each gate, and it defines the context that feeds into the next stage.

Stage 2 Output: The AI-Generated Test Plan

File: qa-artifacts/test-plan-{feature-slug}-{date}.md

Contents:
- Feature summary in QA terms
- Risk assessment with HIGH / MEDIUM / LOW classification
- Test scope: in-scope scenarios, out-of-scope scenarios
- Coverage goals: which paths must achieve 100% coverage, which are best-effort
- Test level mapping: scenario → E2E / API / manual / exploratory
- Spec alignment gaps (dual-context only)
- Estimated effort breakdown

Used by: Stage 3 (drives both manual and automated generation), Stage 5 (scopes exploratory charters), Stage 7 (defines coverage baseline for regression report)

Stage 3 Output: Test Cases and Automated Scripts

Manual test cases: qa-artifacts/manual-tests-{feature-slug}.md — Structured test cases with steps, expected results, and acceptance criteria references.

E2E scripts: tests/e2e/{feature-slug}.spec.ts — Runnable Playwright (or equivalent) scripts committed to the test suite.

API tests: tests/api/{feature-slug}.test.ts — Runnable API test scripts covering positive, negative, and boundary scenarios.

Used by: Stage 4 (CI executes the automated scripts), Stage 6 (failures from these scripts feed into triage)

Stage 4 Output: CI Execution Results

Files: qa-artifacts/ci-results-{run-id}.json and a human-readable summary, qa-artifacts/ci-summary-{run-id}.md

Contents:
- Total tests run, passed, failed, skipped
- Per-test failure details with stack traces
- Coverage delta vs. baseline
- Flakiness signals (any test that passed on retry)
- Execution time per suite

Used by: Stage 5 (coverage gaps in results scope exploratory charters), Stage 6 (failures feed into triage), Stage 7 (baseline for regression analysis)
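
As a hedged example of consuming this payload (the run id is illustrative, and the field names assume Playwright's JSON reporter; adjust for your runner):

jq '{passed: .stats.expected, failed: .stats.unexpected, flaky: .stats.flaky}' \
  qa-artifacts/ci-results-1234.json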

Stage 5 Output: Exploratory Testing Findings

File: qa-artifacts/exploratory-findings-{session-id}.md

Contents:
- Charter(s) executed
- Areas explored, areas skipped
- Bugs found (raw descriptions, with reproduction notes)
- Observations not yet classified as bugs
- Suggested new test cases based on findings

Used by: Stage 6 (raw bugs flow into triage), Stage 7 (coverage report includes exploratory findings)

Stage 6 Output: Triage and Bug Reports

Files: qa-artifacts/bugs-{feature-slug}-{date}.md (one file per bug) and qa-artifacts/triage-summary-{date}.md

Contents per bug: Summary, steps to reproduce, expected vs. actual, environment, root cause hypothesis, suggested fix direction, severity classification

Contents of triage summary: Total bugs found, severity distribution, new vs. pre-existing classification, which bugs are blocking merge

Used by: Stage 7 (feeds into coverage report as open issues count), development team (receives individual bug reports)

Stage 7 Output: The Coverage Report

File: qa-artifacts/coverage-report-{feature-slug}-{date}.md

Contents:
- Test plan summary (what was planned)
- Execution summary (what was run, what passed/failed)
- Manual test execution status
- Exploratory testing summary
- Open bugs (blocking vs. non-blocking)
- Coverage delta vs. previous baseline
- Go / No-Go recommendation with reasoning
- Risks accepted if shipping now (open issues that are not blocking)

This is the primary artifact that the team uses to make the ship/hold decision. It is the agentic loop's final deliverable for each feature cycle.

Learning Tip: Create a qa-artifacts/ directory at the root of your project and commit all agentic loop outputs there. This builds a historical record of every test cycle that future AI sessions can use as context. When the agent runs the agentic loop on a related feature three months later, it can read the previous coverage reports and test plans as context — dramatically improving the relevance of its output and avoiding re-learning domain knowledge you already captured.
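
A sketch of that reuse, assuming prior reports were committed under qa-artifacts/:

# Give a later session the previous cycles' reports as context.
claude --print "
Before planning tests for this follow-up feature, read the prior coverage
reports below and summarize the domain knowledge and known risk areas.
---
$(cat qa-artifacts/coverage-report-*.md)
---
"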