Hands-on: Generate and maintain a full manual test suite

How to Generate a Complete Manual Test Suite from a User Story and Design Spec?

This topic is entirely practical. We will work through generating and maintaining a full manual test suite end-to-end, using a realistic feature as the subject. The feature is a user profile edit form — a common, moderately complex feature that demonstrates the full range of techniques from the previous topics.

The Feature: User Profile Edit

We will use this as our working example throughout this topic.

User Story:

As a registered user, I want to edit my profile information (display name, bio, profile photo, and notification preferences) so that my profile stays up to date and reflects how I want to be represented on the platform.

Acceptance Criteria:
- AC-01: User can update their display name (1–50 characters, alphanumeric and spaces only)
- AC-02: User can update their bio (0–500 characters, plain text, HTML stripped on save)
- AC-03: User can upload a new profile photo (JPEG/PNG only, max 5MB, min 100x100px resolution)
- AC-04: User can toggle email notifications on/off (per category: promotions, product updates, security alerts)
- AC-05: Security alerts notifications cannot be disabled (toggle is disabled)
- AC-06: Changes are saved only when the user clicks "Save Changes" — no auto-save
- AC-07: If the user navigates away with unsaved changes, a confirmation dialog is shown
- AC-08: Save is disabled (button grayed out) if no changes have been made since last save
- AC-09: A success notification appears for 5 seconds after successful save
- AC-10: Validation errors appear inline next to the affected field; the form does not submit

User Roles: Authenticated users only. Admin users see an additional "Account Status" section (not covered by these AC).

Field Constraints:
- Display name: required, 1–50 chars, regex: ^[A-Za-z0-9 ]+$
- Bio: optional, 0–500 chars, plain text
- Profile photo: JPEG/PNG only, max 5MB, min 100x100px, max 4000x4000px
- Notification toggles: boolean, security alerts locked to "on"
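
These constraints map directly onto executable checks. Below is a minimal sketch in Python (the function names are hypothetical, and this is not the product's actual validation code) that is handy for reasoning about boundaries and generating test data:

```python
import re

# Anchored regex from the field constraints: alphanumeric and spaces only.
DISPLAY_NAME_RE = re.compile(r"^[A-Za-z0-9 ]+$")

def display_name_valid(name: str) -> bool:
    # Required: 1-50 chars, alphanumeric and spaces only.
    return 1 <= len(name) <= 50 and bool(DISPLAY_NAME_RE.match(name))

def bio_valid(bio: str) -> bool:
    # Optional: 0-500 chars of plain text (HTML is stripped on save).
    return len(bio) <= 500

def photo_valid(fmt: str, size_bytes: int, width: int, height: int) -> bool:
    # JPEG/PNG only, max 5MB (read here as 5 MiB; the MB-vs-MiB question
    # comes up again later in this topic), dimensions 100x100 to 4000x4000.
    return (
        fmt in {"jpeg", "png"}
        and size_bytes <= 5 * 1024 * 1024
        and 100 <= width <= 4000
        and 100 <= height <= 4000
    )
```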

Step 1: Run the Full Context Extraction Prompt

Using the ordered prompt structure from Topic 2, assemble the full prompt:

Prompt:

## ROLE
You are a senior QA engineer with 10 years of experience testing SaaS web applications. You write thorough, executable manual test cases that other QA engineers can run without interpretation.

## TASK
Generate a complete set of manual test cases for the User Profile Edit feature described at the end of this prompt. Cover positive paths, negative paths, and edge cases with the coverage minimums defined below.

## COVERAGE REQUIREMENTS
- Positive paths: at least 5 test cases (primary save flow + role variations + partial edits)
- Negative paths: at least 1 test case per validation rule and per AC-defined constraint
- Edge cases: boundary values for all constrained fields, interrupted flows, concurrent edits, network errors

## OUTPUT FORMAT
For each test case:
**ID**: TC-[3-digit number starting at 001]
**Title**: [User Profile Edit] - [Scenario] - [Expected Outcome/Condition]
**Category**: [Positive | Negative | Edge Case]
**Priority**: [Critical | High | Medium | Low]
**Preconditions**: [Numbered list]
**Steps**: [Numbered list of single actions]
**Expected Result**: [Observable UI outcome — no internal state]
**AC Reference**: [AC-XX]

## ACCEPTANCE CRITERIA
AC-01 through AC-10: [as listed above]

## USER ROLES
- Authenticated user: can edit display name, bio, photo, notification preferences
- Admin user: same as above, plus sees Account Status section (out of scope for this story)
- Guest/unauthenticated: no access to profile edit page

## DATA CONSTRAINTS
- Display name: required, 1–50 chars, alphanumeric + spaces only
- Bio: optional, 0–500 chars, plain text
- Profile photo: JPEG/PNG, max 5MB, min 100x100px, max 4000x4000px
- Security alerts notification toggle: permanently ON, disabled UI

## DESIGN CONTEXT
- Form has four sections: Display Name (text input), Bio (textarea), Profile Photo (file upload with preview), Notification Preferences (3 toggle switches)
- "Save Changes" button: disabled when no changes; enabled when any field is changed
- Unsaved changes indicator: browser leave-page confirmation dialog
- Inline validation errors appear below each affected field in red text
- Success notification: green banner at top of page, auto-dismisses after 5 seconds

## FEATURE DESCRIPTION
[Full user story and AC as above]

Reviewing the Initial Output

After generating, apply the review checklist from Topic 5. Key things to verify for this specific feature:

  1. AC-05 (security alerts locked): Does the AI generate a test that verifies the Security Alerts toggle cannot be turned off? This is a UI-state test that AI sometimes misses because it verifies the state of a disabled control rather than the outcome of an action.

  2. AC-07 (unsaved changes dialog): Does the AI generate multiple scenarios for this? (Navigate via link, close tab, hit browser back button, click another section in the nav.)

  3. AC-08 (Save button disabled when no changes): Does the AI test the full state cycle — Save is disabled when the page loads, enabled after a change, and disabled again if the change is reverted?

  4. Boundary values for display name: Does the AI generate tests for 0 chars (empty), 1 char, 50 chars, 51 chars, and 50 chars with a special character?
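
The boundary matrix from item 4 can be written down explicitly before you start checking the output. A sketch of the display name cases and the outcome each should produce:

```python
# Display name boundary cases (AC-01) and expected validation outcomes.
display_name_boundaries = [
    ("",             False),  # 0 chars: required field, reject
    ("A",            True),   # 1 char: lower boundary, accept
    ("A" * 50,       True),   # 50 chars: upper boundary, accept
    ("A" * 51,       False),  # 51 chars: just over, reject
    ("A" * 49 + "!", False),  # 50 chars with a special character, reject
]
```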

Learning Tip: For any hands-on generation session, run the prompt, then spend the first 5 minutes specifically checking for the 3–4 AC items you know are most likely to be missed or misrepresented. Don't read the output start-to-finish first — do a targeted spot-check of the highest-risk items. If those look good, the rest will likely be fine. If those have issues, your context needs adjustment before you trust the rest of the output.


How to Use the Dual-Context Model (Spec + Code Change) for Test Generation?

The dual-context model is an advanced technique: instead of generating tests from the spec alone, you provide both the spec and the current code (or a diff). The AI can then identify discrepancies between the two and generate tests that verify both the behavior the spec requires and the details the implementation actually enforces.

When to Use the Dual-Context Model

Use this approach when:
- A feature is being modified or enhanced (not net-new)
- You have access to the code or diff (in a team with good dev-QA collaboration)
- You want to find spec-implementation gaps in addition to coverage gaps
- You're doing a targeted regression analysis after a PR

The Dual-Context Prompt Structure

Prompt:

You are a senior QA engineer. I'm providing you with both a feature specification and the corresponding code implementation. Use both to:
1. Identify any discrepancies between the spec and the implementation
2. Generate test cases that verify the specified behavior (spec-first tests)
3. Generate additional test cases that probe implementation details visible in the code that are not explicitly in the spec

## FEATURE SPECIFICATION
[PASTE FEATURE SPEC AND AC]

## CODE / DIFF
[PASTE RELEVANT CODE SECTIONS OR PR DIFF]

Analysis Output:

### Spec-Implementation Discrepancies
List any behavior the spec requires that appears to not be implemented, or behavior in the code that differs from the spec.

### Test Cases: Spec Validation
Test cases for each AC item — verify the specified behavior.

### Test Cases: Implementation-Aware Tests
Additional test cases based on code inspection — test edge cases visible in the implementation that aren't obvious from the spec alone.

Use this output format: [YOUR FORMAT]

Applying Dual-Context to the Profile Edit Feature

For our profile edit feature, suppose the dev team just submitted a PR that modifies the photo upload handler. You receive a diff showing:
- A new file-size check before upload
- A new image dimension validation function
- A change to the error message format for file type validation

The dual-context prompt for this PR would be:

Prompt:

[Above dual-context structure]

## FEATURE SPECIFICATION (relevant sections)
AC-03: User can upload a new profile photo (JPEG/PNG only, max 5MB, min 100x100px resolution)
AC-10: Validation errors appear inline next to the affected field; the form does not submit

## PR DIFF (summarized)
Changed:
- Added file size validation: rejects files > 5,242,880 bytes (5MB interpreted in binary units, i.e. 5 × 1,048,576 bytes)
- Added dimension check: rejects images with width < 100px OR height < 100px
- Changed error message for file type: "Invalid file type. Please upload a JPEG or PNG image." (previously "File type not supported")
- Added: files exactly equal to 5,242,880 bytes are accepted (boundary inclusive)

Output:
1. Discrepancies between spec and code (if any)
2. Updated test cases for photo upload section
3. New test cases specific to the implementation details in this PR

From this, AI would generate tests like:
- Upload a file exactly 5,242,880 bytes (boundary — should accept per code)
- Upload a file of 5,242,881 bytes (just over — should reject)
- Upload a 100x100px image (boundary — should accept)
- Upload a 99x100px image (just below width boundary — should reject)
- Upload a 100x99px image (just below height boundary — should reject)
- Verify new error message copy for file type validation ("Invalid file type. Please upload a JPEG or PNG image.")

These specific boundary tests would be very hard to derive from the spec alone: the spec says only "max 5MB," while the exact byte threshold and its inclusive boundary come from the implementation.
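
To make that boundary logic concrete, here is a sketch of the validation the diff implies. It is reconstructed from the summarized diff, so treat the function name and the size/dimension message strings as assumptions; only the file-type message copy is quoted from the PR:

```python
MAX_PHOTO_BYTES = 5 * 1024 * 1024  # 5,242,880 bytes, boundary inclusive per the diff
MIN_DIMENSION_PX = 100
ALLOWED_TYPES = {"image/jpeg", "image/png"}

def validate_photo_upload(content_type: str, size_bytes: int,
                          width: int, height: int) -> list[str]:
    """Return inline error messages; an empty list means the upload is accepted."""
    errors = []
    if content_type not in ALLOWED_TYPES:
        # New message copy per the PR (previously "File type not supported").
        errors.append("Invalid file type. Please upload a JPEG or PNG image.")
    if size_bytes > MAX_PHOTO_BYTES:
        # Exactly 5,242,880 bytes passes: the inclusive boundary the tests probe.
        errors.append("File exceeds the maximum size.")  # hypothetical copy
    if width < MIN_DIMENSION_PX or height < MIN_DIMENSION_PX:
        errors.append("Image is below the minimum resolution.")  # hypothetical copy
    return errors
```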

Identifying Spec-Implementation Discrepancies

The most valuable output of dual-context prompting is the discrepancy list. Common discrepancies AI finds:

  • Spec says max 5MB, code implements it as 5,000,000 bytes (not 5,242,880) — a common MB vs. MiB mix-up (see the arithmetic after this list)
  • Spec says "alphanumeric and spaces," code regex allows dashes — spec needs updating
  • Spec says "5 seconds" for notification, code implements 3 seconds — needs clarification
  • Spec says "HTML stripped," code only strips <script> tags — incomplete implementation
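
The first discrepancy is worth making concrete, because it creates a silent test gap: files between 5,000,000 and 5,242,880 bytes pass one check and fail the other.

```python
print(5 * 1000 * 1000)  # 5,000,000 bytes: 5 MB in decimal (SI) units
print(5 * 1024 * 1024)  # 5,242,880 bytes: 5 MiB in binary units
# A 5,100,000-byte file is rejected by a 5,000,000-byte check but accepted
# by a 5,242,880-byte check. A boundary test must pin down which
# interpretation the spec intends.
```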

Each discrepancy is a finding to bring to the dev team before testing begins — not a bug to file, but an alignment conversation to have.

Learning Tip: The dual-context model is not about reading code as a QA engineer — it's about giving the AI both sources and letting it do the comparison. You don't need to understand the code deeply to benefit. Paste the diff, let the AI identify the discrepancies and generate the implementation-aware tests, and focus your expert judgment on whether the discrepancies are spec gaps or implementation bugs. This is a collaborative technique, not a code-review replacement.


How to Review, Edit, and Finalize AI-Generated Test Suite Output?

The finalization process transforms raw AI output into a production-ready test suite entry. This is the stage where your expert judgment is irreplaceable — no amount of better prompting eliminates the need for a human QA engineer to make final calls on what goes into the suite.

The Finalization Workflow

AI generates raw output
    ↓
Quick scan pass (5 min): flag obvious structural issues
    ↓
Coverage check pass (10 min): verify all AC items are covered
    ↓
Quality review pass (15–20 min): apply review checklist to flagged cases
    ↓
Edit pass: fix all identified issues directly
    ↓
Second opinion: run AI self-review on edited output
    ↓
Import to test management tool
    ↓
Tag as ai-generated + ai-pending-review
    ↓
Schedule peer review session

The Edit Pass: What to Fix and How

Fix 1: Unobservable expected results

Original: "Expected: The display name is updated in the database."
Fixed: "Expected: The profile page refreshes to show the new display name in the header area. The toast notification 'Changes saved successfully' appears at the top of the page and auto-dismisses after 5 seconds."

Fix 2: Vague preconditions

Original: "Preconditions: User is logged in and on the profile page."
Fixed: "Preconditions: 1. A registered user account exists with display name 'John Doe' and no bio. 2. User is authenticated. 3. User has navigated to /profile/edit. 4. The 'Save Changes' button is currently disabled (no changes made)."

Fix 3: Multi-action steps

Original: "Step 1: Fill in display name, bio, and click Save Changes."
Fixed:
"Step 1: Click on the Display Name input field.
Step 2: Clear the existing value.
Step 3: Type 'Jane Smith' in the Display Name field.
Step 4: Click on the Bio textarea.
Step 5: Type 'Software engineer and coffee enthusiast.' in the Bio field.
Step 6: Click the 'Save Changes' button."

Fix 4: Priority calibration

AI tends to mark too many tests as "High" or "Critical." Apply this calibration:
- Critical: tests that, if failed, would block release (login, core create/update/delete, payment)
- High: tests for primary feature flows and AC-defined behaviors
- Medium: alternate paths, secondary features, non-critical validations
- Low: cosmetic/UI state tests, edge cases with very low probability

Using AI for the Second Opinion Pass

After your edit pass, run the edited output back through AI for validation:

Prompt:

Review the following edited test cases for residual issues. I've already done a manual review pass. Look specifically for:
1. Any expected results that still describe internal state rather than observable UI outcomes
2. Any preconditions that are vague or incomplete
3. Any steps that could be interpreted ambiguously by a tester executing this without additional context
4. Any test case where the expected result would pass even on a broken implementation
5. Any AC items from the list below that appear to have no test case coverage

AC items: [LIST AC IDs]

Edited test cases:
[PASTE EDITED TEST CASES]

Output: only issues found (if none, say "No issues found"). For each issue: test case ID, issue type, specific problem, recommended fix.

Import and Organization

Once finalized, structure your test cases into the test management tool hierarchy:

User Profile Edit
├── Happy Paths
│   ├── TC-001: Save all fields successfully
│   ├── TC-002: Save display name only
│   ├── TC-003: Save bio only
│   ├── TC-004: Save photo only
│   └── TC-005: Save notification preferences
├── Negative Paths
│   ├── Display Name Validation
│   ├── Bio Validation
│   ├── Photo Upload Validation
│   └── Navigation/State Validation
└── Edge Cases
    ├── Boundary Values
    ├── Interrupted Flows
    └── Concurrent Edits
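
Most test management tools accept bulk import as CSV or JSON, but the exact schema varies by tool. As a hypothetical example, one finalized and tagged test case might look like this as structured data (map the field names to your tool's import schema):

```python
# One finalized test case, ready for bulk import. Field names are illustrative.
test_case = {
    "id": "TC-001",
    "title": "[User Profile Edit] - Save all fields successfully - Changes persist",
    "folder": "User Profile Edit/Happy Paths",
    "category": "Positive",
    "priority": "High",
    "ac_reference": "AC-06",
    "tags": ["ai-generated", "ai-pending-review"],
}
```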

Learning Tip: The finalization workflow feels longer than just "reviewing" output, but each step takes less time than it sounds. A 30-test suite can typically be finalized in 45–60 minutes end-to-end. Compare this to writing 30 test cases from scratch, which for most QA engineers takes 3–4 hours. Even with a thorough finalization process, AI-assisted test generation is 3–4x faster than pure manual authoring — and the quality ceiling is just as high, because your review and editing expertise sets the floor.


How to Update an Existing AI-Generated Test Suite After a Code Change?

Maintaining AI-generated test cases is fundamentally different from maintaining hand-written ones because you have a record of what generated them (the prompt and context) and you can use AI to do the maintenance. The key is to treat maintenance as a re-generation from updated context, not as manual line-by-line editing.

The Maintenance-as-Re-generation Principle

When a code change affects your test suite, the most effective approach is:
1. Update your source context (AC, design spec, field constraints)
2. Run the change impact analysis to identify which tests are affected
3. For each affected test, re-generate from updated context rather than manually patching

This keeps your test cases consistent with each other and with the current spec, rather than accumulating ad-hoc patches that diverge from the original generation style.
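
One way to make re-generation the default path is to keep the source context under version control and rebuild the generation prompt from it, so an AC update automatically flows into the next run. A minimal sketch, with hypothetical file names:

```python
from pathlib import Path

def build_regeneration_prompt(feature_dir: Path, affected_cases: str) -> str:
    """Rebuild the re-generation prompt from version-controlled source context."""
    ac = (feature_dir / "acceptance_criteria.md").read_text()
    constraints = (feature_dir / "field_constraints.md").read_text()
    return (
        "Re-generate test cases based on the updated AC below. Use the same "
        "format and naming convention as the existing test cases provided.\n\n"
        f"## ACCEPTANCE CRITERIA\n{ac}\n"
        f"## DATA CONSTRAINTS\n{constraints}\n"
        f"## EXISTING TEST CASES (format reference)\n{affected_cases}\n"
    )
```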

Scenario: AC-03 Changes to Allow GIF Uploads

Suppose after sprint review, the product team decides to also allow animated GIFs for profile photos, and increases the max file size to 10MB. AC-03 is updated to:

AC-03 (updated): User can upload a new profile photo (JPEG/PNG/GIF only, max 10MB, min 100x100px resolution). Animated GIFs are supported; only the first frame is used as the static preview.

Step 1: Run change impact analysis:

Prompt:

AC-03 has been updated. Identify all test cases that are affected by this change.

Updated AC-03: User can upload a new profile photo (JPEG/PNG/GIF only, max 10MB, min 100x100px resolution). Animated GIFs are supported; only the first frame is used as the static preview.

Previous AC-03: User can upload a new profile photo (JPEG/PNG only, max 5MB, min 100x100px resolution)

Test cases related to profile photo upload:
[PASTE PHOTO UPLOAD TEST CASES]

For each affected test case:
- Change type: Update Required | Deprecated (removed behavior) | Keep as-is
- What specifically needs to change

Step 2: Re-generate affected test cases:

Prompt:

Re-generate test cases for the profile photo upload feature based on the updated AC. Use the same format and naming convention as the existing test cases provided.

Updated AC-03: [PASTE UPDATED AC]
Additional context: GIF files' first frame is used as static preview; animated preview is not shown.

Existing test cases (for format reference):
[PASTE 2-3 EXISTING PHOTO UPLOAD TEST CASES AS FORMAT EXAMPLES]

Generate:
1. Updated versions of existing test cases that are no longer valid (mark: REPLACES TC-XXX)
2. New test cases for behaviors added by this change (GIF support, new file size limits)
3. Any test cases that were not affected (confirm they remain valid)

Step 3: Produce a changelog entry:

Prompt:

Produce a changelog entry for the following test suite update:

Sprint: [Sprint number]
Change trigger: AC-03 updated (profile photo: added GIF support, increased max file size 5MB → 10MB)

Updated test cases: [LIST]
Added test cases: [LIST]
Deprecated test cases: [LIST]

Format: dated changelog entry with summary of changes and rationale.

Batch Update Workflow for Sprint-End Maintenance

At the end of each sprint, run a systematic maintenance cycle across all affected test areas:

Prompt:

Sprint [N] test suite maintenance cycle.

Stories completed this sprint that affected existing features:
[LIST STORIES WITH BRIEF CHANGE DESCRIPTIONS]

For each story, I need to:
1. Identify which test cases need updating
2. Identify which test cases are now deprecated
3. Identify new behaviors that need new test cases

Test suite sections to analyze (by feature area):
[PASTE RELEVANT TEST SUITE SECTIONS]

Output a sprint maintenance report:
- Per story: affected test cases + recommended actions
- New test cases needed (brief description of each, I'll generate in detail separately)
- Test cases to deprecate
- Estimated maintenance effort (low/medium/high per story)

Preventing Maintenance Debt

Maintenance debt — test cases that no longer match the current implementation — is the most common quality problem in manual test suites. The tactics that prevent it:

  1. Run impact analysis before every regression cycle, not after bugs are found
  2. Update source documents (AC, constraints) before updating test cases — if your AC is stale, your updated tests will be wrong
  3. Never manually patch test cases piecemeal — re-generate from updated context for consistency
  4. Quarterly deduplication and archive runs — keep the suite lean and current
  5. Traceability as a forcing function — a test case linked to a deprecated AC will surface in your traceability check and force a decision (a sketch of this check follows the list)
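
Point 5 is easy to automate when each test case carries its AC reference as structured data. A sketch of the check, assuming a simple list-of-dicts export of the suite:

```python
def find_orphaned_tests(test_cases: list[dict], current_ac_ids: set[str]) -> list[str]:
    """Return IDs of test cases whose AC reference no longer exists in the spec."""
    return [tc["id"] for tc in test_cases
            if tc.get("ac_reference") not in current_ac_ids]

suite = [
    {"id": "TC-001", "ac_reference": "AC-01"},
    {"id": "TC-014", "ac_reference": "AC-11"},  # AC-11 was removed last sprint
]
print(find_orphaned_tests(suite, {f"AC-{n:02d}" for n in range(1, 11)}))
# -> ['TC-014']: each hit forces an update-or-deprecate decision
```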

Learning Tip: The biggest mistake in test suite maintenance is treating it as a reactive task — you update tests after bugs are found or after someone reports the tests are wrong. Maintenance should be proactive: a regular sprint ceremony item, not a fire to fight. Teams that schedule 30 minutes of suite maintenance at the end of every sprint never accumulate the kind of stale test debt that requires a multi-sprint "test suite cleanup" project. Invest in the process, not the cleanup.