
AI-assisted manual test case generation in the agentic loop

How to chain test plan output into manual test case generation automatically?

In traditional QA workflows, test planning and test case writing are separate activities with a context switch between them. A QA engineer finishes a test plan, sets it aside, and starts writing test cases from memory and notes. In the agentic loop, the test plan is the direct input to manual test case generation — there is no context switch, no information loss, and no re-interpretation of risk priorities that were already established.

Chaining test plan output into manual test case generation means using the approved test plan file as the primary input to the generation prompt. The agent reads the plan, identifies which scenarios are designated for manual testing, and generates structured test cases that inherit the plan's risk classifications, AC references, and coverage goals.

Setting Up the Chain

The chain relies on the test plan being a structured, machine-readable file. This is why the planning prompt from the previous topic outputs clearly labeled sections with scenario IDs, types, and risk levels — the generation prompt instructs the agent to parse exactly those labels.


claude --print "
You are a QA engineer generating manual test cases from an approved test plan.

APPROVED TEST PLAN:
---
$(cat qa-artifacts/test-plan-checkout-validation-20240113.md)
---

SPEC CONTEXT:
---
$(cat /tmp/feature-spec.md)
---

CODEBASE CONTEXT:
Read CLAUDE.md and the relevant source files to understand the domain model,
test data conventions, and any existing test case formats in the test management tool.

YOUR TASK:
1. Extract all scenarios from the test plan with Type: Manual
2. For each manual scenario, generate a complete test case

Each test case must follow this format:

---
### TC-{PLAN_SCENARIO_ID}: {Title}
**Risk Level**: HIGH / MEDIUM / LOW
**AC Reference**: AC{N}, AC{N}
**Preconditions**: [list what must be true before this test starts]
**Test Data Required**: [specific data, not generic placeholders]
**Steps**:
1. [action]
   - Expected: [observable result]
2. [action]
   - Expected: [observable result]
[continue for all steps]
**Pass Criteria**: [the single most important thing that must be true for this test to pass]
**Fail Criteria**: [what you observe if the test fails]
**Notes**: [any edge cases, environment flags, or known issues to watch for]
---

Generate all manual test cases. Preserve risk ordering from the test plan (HIGH first).
" > qa-artifacts/manual-tests-checkout-validation-$(date +%Y%m%d).md

What Good Chained Output Looks Like

A well-chained generation produces test cases like this (partial example):

### TC-TS-003: Checkout submit blocked when zip code is invalid
**Risk Level**: HIGH
**AC Reference**: AC2, AC6
**Preconditions**:
- User is logged in with an active account
- User has at least one item in the cart
- User is on the checkout page, payment method already selected

**Test Data Required**:
- Valid product: SKU-1001 (available in test env)
- Invalid zip code: 00000 (known invalid in validation service)
- Valid remainder of address: 123 Test St, Boston, MA

**Steps**:
1. Navigate to /checkout
   - Expected: Checkout form displays with address section visible
2. Fill in address fields: street=123 Test St, city=Boston, state=MA
   - Expected: Fields accept input without error
3. Enter zip code: 00000 and tab out of the field (blur)
   - Expected: Red inline error appears below zip field: "Enter a valid ZIP code"
   - Expected: Error appears within 500ms of tab-out (time with stopwatch on first run)
4. Click "Place Order" button
   - Expected: Button does NOT submit the form
   - Expected: Zip code error remains visible
   - Expected: Focus moves to the zip code field
5. Correct zip code to 02101 and tab out
   - Expected: Error clears, green checkmark appears next to zip field
6. Click "Place Order" button
   - Expected: Form submits successfully, navigates to confirmation page

**Pass Criteria**: Form cannot be submitted with invalid zip code, and validation error is visible and specific.
**Fail Criteria**: Form submits with invalid zip code, OR error message text differs from spec, OR submission blocked for any reason other than validation error.
**Notes**: Run this test with network throttling enabled (Slow 3G) to validate the 500ms requirement under realistic conditions.

Notice that the test case inherits the AC references from the test plan, uses specific test data (not "enter a valid zip code"), and includes timing validation derived from AC7; during review, consider adding AC7 to the case's AC Reference line so the traceability matrix captures that coverage.

Running the Full Chaining Command in One Pass

For teams that want to minimize manual steps, the whole chain can be wrapped in a single script:

#!/bin/bash
# generate-manual-tests.sh: chain an approved test plan into manual test case generation
FEATURE_SLUG=$1
DATE=$(date +%Y%m%d)
PLAN_FILE="qa-artifacts/test-plan-${FEATURE_SLUG}-latest.md"
OUTPUT_FILE="qa-artifacts/manual-tests-${FEATURE_SLUG}-${DATE}.md"

if [ ! -f "$PLAN_FILE" ]; then
  echo "Error: No test plan found at $PLAN_FILE"
  exit 1
fi

echo "Generating manual test cases from: $PLAN_FILE"

claude --print "
$(cat .claude/commands/generate-manual-tests.md)

TEST PLAN:
$(cat "$PLAN_FILE")

SPEC:
$(cat qa-context/active-spec.md 2>/dev/null || echo 'No spec file found')
" > "$OUTPUT_FILE"

echo "Manual tests written to: $OUTPUT_FILE"
echo "Test cases generated: $(grep -c '^### TC-' $OUTPUT_FILE)"
chmod +x generate-manual-tests.sh
./generate-manual-tests.sh checkout-validation

Learning Tip: The most common failure in chaining is when the test plan uses inconsistent scenario IDs or type labels. If your test plan says "Type: Manual testing" in some places and "Type: Manual" in others, the generation prompt may miss cases. Standardize the label format in your planning prompt template and the chaining will be reliable. Add "Use exactly the labels: E2E, API, Manual, Exploratory — no variations" to your planning prompt's output instructions.
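That standardization can also be enforced mechanically. A minimal lint sketch, assuming the plan writes labels as plain `Type: <value>` lines (adjust the pattern if your template bolds the label):

```bash
# Fail fast if the plan contains any Type: label outside the standard set.
PLAN_FILE=qa-artifacts/test-plan-checkout-validation-latest.md
if grep -n 'Type:' "$PLAN_FILE" | grep -vE 'Type: (E2E|API|Manual|Exploratory)[[:space:]]*$'; then
  echo "Nonstandard type labels found above; fix the plan before chaining"
  exit 1
fi
```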


How to review, approve, and store manual cases within the agentic workflow?

Generating manual test cases is fast. Reviewing them rigorously is where your QA expertise is irreplaceable. The agentic loop does not remove the review step — it makes the review step faster by providing structured output that is easier to evaluate than writing cases from scratch.

The Manual Test Case Review Framework

Review generated test cases against four criteria:

Criterion 1 — Step Precision: Each step should describe exactly one observable action with exactly one expected result. Vague steps like "verify the address is validated" or "check for errors" fail this criterion. The step should describe what button to click, what field to fill, and what specific text or visual change should appear.

Criterion 2 — Test Data Specificity: Generic placeholders ("valid username", "test product") are not acceptable in production-ready test cases. Every test case should reference specific test data values that work in your test environment. If the agent uses placeholders, replace them with real values from your test data factory.

Criterion 3 — AC Traceability: Every test case must link back to at least one acceptance criterion. If a test case exists with no AC reference, either it covers an implicit requirement (which should be documented) or it is redundant and should be removed.

Criterion 4 — Environment Completeness: Preconditions must be specific enough that any QA engineer can set up the test state independently. "User is logged in" is insufficient. "User is logged in as a Member-role user with one item in cart (SKU-1001)" is sufficient.

Review Checklist (Copy-Paste Ready)

## Manual Test Case Review Checklist

For each test case:
- [ ] Steps are atomic: one action, one expected result per step
- [ ] Test data is specific: no generic placeholders, uses real test values
- [ ] AC reference is present and correct
- [ ] Preconditions can be set up by any QA engineer independently
- [ ] Pass criteria is unambiguous: pass/fail has no gray area
- [ ] Edge case handling is explicit (what happens on slow network, etc.)
- [ ] Mobile-specific steps are flagged where relevant
- [ ] Risk level matches the test plan classification

Review summary:
- Total cases generated: ___
- Cases approved as-is: ___
- Cases requiring minor edits: ___
- Cases requiring major revision: ___
- Cases rejected (reason): ___
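Two of these checks are mechanical enough to script before the human pass. A sketch; the placeholder patterns are illustrative and should be tuned to your own test data conventions:

```bash
# Flag test data that looks like a generic placeholder rather than a real value.
FILE=qa-artifacts/manual-tests-checkout-validation-20240113.md
grep -nE 'valid (username|user|product|address|zip)|test product|<[A-Za-z_]+>' "$FILE" \
  && echo "Review the placeholder-looking test data above"

# Every case should carry an AC reference; these two counts should match.
echo "Cases: $(grep -c '^### TC-' "$FILE"), AC reference lines: $(grep -c '\*\*AC Reference\*\*' "$FILE")"
```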

Using AI to Assist the Review

The review itself can be AI-assisted. After generating the test cases, run a validation prompt:

claude --print "
You are a QA lead reviewing a set of generated manual test cases.

ACCEPTANCE CRITERIA (source of truth):
---
$(cat /tmp/feature-spec.md)
---

GENERATED TEST CASES:
---
$(cat qa-artifacts/manual-tests-checkout-validation-20240113.md)
---

Review each test case against these criteria:
1. Is every AC covered by at least one test case? List uncovered ACs.
2. Are there any test cases with generic placeholder test data? List them.
3. Are there any test cases where steps are ambiguous or multi-action? List them.
4. Are there any duplicate test cases (same scenario tested twice)? List them.
5. Are there any HIGH-risk ACs with only one test case (suggesting under-coverage)?

Output: a review report with specific case IDs flagged for each issue.
"

This produces a machine-generated review report that serves as your first pass; you then do a final human review of only the flagged items. This approach typically cuts review time by 40–60% on a typical feature.
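To focus that human pass, the flagged case IDs can be pulled out of the report mechanically. A sketch, assuming you redirected the review prompt's output to a file named review-report.md (a hypothetical name; the prompt above prints to stdout):

```bash
# Collect the unique case IDs flagged in the review report for human follow-up.
grep -oE 'TC-TS-[0-9]+[a-z]?' review-report.md | sort -u > flagged-cases.txt
wc -l flagged-cases.txt
```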

Storing Approved Cases in Your Test Management Tool

The final step is importing approved test cases into your test management platform. The format differs per tool:

For Jira + Xray:


claude --print "
Convert these manual test cases to Xray CSV import format.
Columns required: Summary, Description, Steps (JSON), Labels, Priority

INPUT:
$(cat qa-artifacts/manual-tests-checkout-validation-20240113.md)

Output valid CSV that can be directly imported to Xray.
" > qa-artifacts/manual-tests-xray-import-$(date +%Y%m%d).csv

For TestRail:

claude --print "
Convert these manual test cases to TestRail XML import format.
Include: title, steps, expected results, priority, references.

INPUT:
$(cat qa-artifacts/manual-tests-checkout-validation-20240113.md)

Output valid TestRail XML.
" > qa-artifacts/manual-tests-testrail-$(date +%Y%m%d).xml

For plain Markdown (GitHub/Notion):
The generated file is already in a clean format. Commit it directly and link it from the PR description.
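For that route, storage is ordinary version control (commands shown for illustration; the commit message convention is an assumption):

```bash
git add qa-artifacts/manual-tests-checkout-validation-*.md
git commit -m "test: add approved manual test cases for checkout validation"
```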

Learning Tip: When reviewing AI-generated test cases, start with the HIGH-risk cases and spend 80% of your review time on them. The LOW-risk cases can be approved with a quick scan as long as they have the right structure. This mirrors risk-based testing principles — allocate review effort proportional to impact. If a LOW-risk test case is slightly imprecise, the consequences are minimal; if a HIGH-risk test case has a wrong expected result, you may approve a broken feature.


How to maintain traceability from requirements to test cases and execution results?

Traceability is what separates a well-maintained test suite from a pile of test cases. In the agentic loop, traceability is not an afterthought — it is built into the generation prompt from the start through AC references. But true traceability requires explicit, navigable links across the whole chain: requirements → test plan scenarios → test cases → test execution results.

The Traceability Matrix

Build a traceability matrix as part of the coverage report at Stage 7, but set it up as a living document from the start of the loop:

Last updated: 2024-01-15

| AC ID | Test Plan Scenario | Manual Test Case(s) | Automated Test | Execution Status |
|-------|--------------------|---------------------|----------------|------------------|
| AC1   | TS-001, TS-002     | TC-TS-001, TC-TS-002| checkout.spec.ts#L12 | Passed |
| AC2   | TS-003             | TC-TS-003           | checkout.spec.ts#L34 | Passed |
| AC3   | TS-004             | TC-TS-004           | checkout.spec.ts#L56 | Failed — TS-004-BUG-001 |
| AC4   | TS-007, TS-008     | TC-TS-007, TC-TS-008| —               | Pending |
| AC5   | TS-005             | TC-TS-005           | checkout.spec.ts#L78 | Passed |
| AC6   | TS-003, TS-004     | (covered by above)  | checkout.spec.ts#L34 | Partial |
| AC7   | TS-009             | TC-TS-009           | —               | Pending |
| AC8   | TS-010, TS-011     | TC-TS-010, TC-TS-011| —               | Not Started |

This matrix is the artifact that the product owner, scrum master, and release manager look at when deciding whether the feature is ready to ship.
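Since the ship/no-ship question usually reduces to the Execution Status column, a quick tally can be pulled straight from the matrix file. A sketch assuming the five-column layout above, where status is the fifth data column:

```bash
# Tally the Execution Status column across all AC rows of the matrix.
awk -F'|' '/^\| AC[0-9]/ { gsub(/^ +| +$/, "", $6); print $6 }' \
  qa-artifacts/traceability-matrix-checkout-validation.md | sort | uniq -c
```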

Generating the Traceability Matrix with AI

claude --print "
Generate a traceability matrix for this feature.

ACCEPTANCE CRITERIA:
---
$(cat /tmp/feature-spec.md)
---

TEST PLAN:
---
$(cat qa-artifacts/test-plan-checkout-validation-20240113.md)
---

MANUAL TEST CASES:
---
$(cat qa-artifacts/manual-tests-checkout-validation-20240113.md)
---

AUTOMATED TESTS (list of test file names and relevant line numbers):
---
tests/e2e/checkout-validation.spec.ts
tests/api/address-validation.test.ts
---

Generate a traceability matrix in markdown table format.
For each AC: list the test plan scenario IDs, manual test case IDs, and automated test file references.
Flag any ACs with no test coverage (no manual case AND no automated test).
Flag any ACs covered only by automated tests (no manual validation).
" > qa-artifacts/traceability-matrix-checkout-validation.md

Maintaining Traceability as Tests Change

The traceability matrix goes stale when:
- New ACs are added to the spec
- Test cases are deleted or merged
- Automated tests are refactored and file references change

Use this update prompt whenever the test suite changes:

claude --print "
Update this traceability matrix to reflect recent changes.

CURRENT MATRIX:
---
$(cat qa-artifacts/traceability-matrix-checkout-validation.md)
---

CHANGES SINCE LAST UPDATE:
- AC9 has been added: [describe AC9]
- TC-TS-003 was split into TC-TS-003a and TC-TS-003b
- checkout.spec.ts was refactored — line numbers have changed

Updated automated test locations:
$(grep -nE '(describe|test|it)\(' tests/e2e/checkout-validation.spec.ts | head -50)

Output the updated traceability matrix with all changes reflected.
"

Embedding Traceability in Test Case Format

The most maintainable approach to traceability is to embed it directly in the test case files using a frontmatter block:

---
id: TC-TS-003
feature: checkout-address-validation
ac_refs: [AC2, AC6]
test_plan_scenario: TS-003
risk: HIGH
automated_equivalent: tests/e2e/checkout-validation.spec.ts#L34
last_updated: 2024-01-15
status: approved
---

### TC-TS-003: Checkout submit blocked when zip code is invalid
...

When every test case carries this metadata, generating the traceability matrix becomes a simple data extraction task rather than a manual cross-reference effort.
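As a sketch of that extraction, assuming one test case per file, a directory layout of qa-artifacts/test-cases/ (hypothetical), and the exact field names shown in the frontmatter above:

```bash
# Emit one traceability row per test case by reading its frontmatter fields.
for f in qa-artifacts/test-cases/*.md; do
  id=$(sed -n 's/^id: //p' "$f")
  acs=$(sed -n 's/^ac_refs: //p' "$f" | tr -d '[]')
  scenario=$(sed -n 's/^test_plan_scenario: //p' "$f")
  auto=$(sed -n 's/^automated_equivalent: //p' "$f")
  echo "| ${acs} | ${scenario} | ${id} | ${auto:-—} |"
done
```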

Learning Tip: Present the traceability matrix at your sprint review — not the list of test cases. Stakeholders cannot evaluate "we wrote 23 test cases." They can evaluate "we have test coverage for 8 of 9 ACs; AC8 (mobile viewport) is covered by planned exploratory testing." The matrix converts testing work into language that non-technical stakeholders can understand and act on, and it surfaces gaps that might otherwise be invisible until after the release.


How to trigger automatic test case updates when spec or code changes?

Stale test cases are a silent quality risk. When the spec or code changes, test cases that are not updated continue to execute against behavior that no longer exists — producing false passes that give the team false confidence. In the agentic loop, test case staleness is detected and remediated automatically.

The Staleness Detection Trigger

Set up a GitHub Action that detects spec and code changes and flags potentially stale test cases:

name: Agentic Test Case Staleness Check

on:
  pull_request:
    paths:
      - 'specs/**'          # Spec files
      - 'docs/acceptance-criteria/**'
      - 'src/**'            # Source code changes

jobs:
  staleness-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install Claude Code
        run: npm install -g @anthropic-ai/claude-code

      - name: Get changed spec/source files
        id: changes
        run: |
          git diff origin/${{ github.base_ref }}...HEAD --name-only \
            | grep -E '(specs/|docs/acceptance-criteria/|src/)' > changed-files.txt || true
          echo "Changed files:"
          cat changed-files.txt

      - name: Check for stale test cases
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          claude --print "
          You are a QA engineer checking whether recent changes have made any test cases stale.

          CHANGED FILES:
          $(cat changed-files.txt)

          CHANGED CONTENT:
          $(git diff origin/${{ github.base_ref }}...HEAD -- $(cat changed-files.txt | tr '\n' ' '))

          EXISTING TEST CASES:
          $(cat qa-artifacts/manual-tests-*.md 2>/dev/null || echo 'No manual test cases found')

          Your task:
          1. Identify test cases that reference behavior changed in the diff
          2. For each stale test case: describe what changed and how the test case needs updating
          3. Generate the updated test case content for each stale case
          4. Flag any new AC or behavior in the diff that has no existing test case

          Output: A staleness report followed by updated test case content.
          " > staleness-report.md

      - name: Post staleness report as PR comment
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            if (fs.existsSync('staleness-report.md')) {
              const report = fs.readFileSync('staleness-report.md', 'utf8');
              if (/stale|update/i.test(report)) {
                await github.rest.issues.createComment({
                  issue_number: context.issue.number,
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  body: `## Test Case Staleness Report\n\n${report}`
                });
              }
            }

The Update Prompt for Identified Stale Cases

When the staleness check identifies cases that need updating, the QA engineer runs the targeted update prompt:

STALE_CASE_ID="TC-TS-004"

claude --print "
Update the following test case based on the recent spec change.

STALE TEST CASE:
---
$(grep -A 40 "^### ${STALE_CASE_ID}:" qa-artifacts/manual-tests-checkout-validation-20240113.md)
---

SPEC CHANGE THAT MADE IT STALE:
---
AC4 has been changed from 'warning' to 'blocking error' for PO Box addresses.
The form must not submit when address is a PO Box.
PO Box detection should match: PO Box, P.O. Box, POB, post office box (case insensitive).
---

CODE CHANGE:
$(git diff HEAD~1 HEAD -- src/components/checkout/AddressValidator.ts)

Generate the updated test case with:
1. Updated steps reflecting the new blocking behavior (not a warning)
2. Updated pass/fail criteria
3. Additional edge case step: test each PO Box pattern variant
4. The same test case ID and AC reference, with descriptions updated to match the new blocking behavior
"

Proactive Spec Change Monitoring with Webhooks

For teams using Confluence or Notion for specs, set up a webhook that triggers the staleness check whenever a spec page is updated:

// webhook-handler.js (runs as a serverless function)

exports.handler = async (event) => {
  const payload = JSON.parse(event.body);

  // Only trigger on spec/AC page updates
  if (payload.page?.space?.key !== 'SPECS') return { statusCode: 200 };

  const pageTitle = payload.page.title;
  const pageContent = payload.page.body?.storage?.value || '';

  // Trigger GitHub Actions workflow with spec content
  const response = await fetch(
    `https://api.github.com/repos/${process.env.REPO}/actions/workflows/test-case-staleness-check.yml/dispatches`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.GITHUB_TOKEN}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        ref: 'main',
        inputs: {
          spec_page: pageTitle,
          spec_content: pageContent.substring(0, 10000) // Trim for API limits
        }
      })
    }
  );

  if (!response.ok) {
    return { statusCode: 502, body: `Workflow dispatch failed: ${response.status}` };
  }
  return { statusCode: 200, body: 'Staleness check triggered' };
};

Learning Tip: Do not try to automate the test case update itself — only automate the detection of staleness and the generation of updated content as a draft. The final decision to accept, reject, or further modify a generated update should always be a human decision. The value of automation is eliminating the silent staleness problem where outdated test cases execute for weeks without anyone noticing — not replacing the judgment required to approve the updated content.
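One way to enforce that boundary is to land generated updates as a draft pull request rather than a direct commit, so nothing merges without sign-off. A sketch using the GitHub CLI; branch and file names are illustrative:

```bash
# Land the generated updates as a draft PR so a human approves before merge.
git checkout -b qa/stale-case-updates
git add qa-artifacts/
git commit -m "qa: draft updates for stale test cases (needs human review)"
gh pr create --draft --title "Draft: stale test case updates" --body-file staleness-report.md
```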