This capstone brings together every technique from the course into a single end-to-end workflow. The feature used in this exercise is deliberately realistic: it has ambiguous acceptance criteria, backend and frontend components, a non-trivial data model, and a security-sensitive flow. Working through it exercises every workflow pattern in one continuous session.
The capstone is structured as a complete walkthrough you can follow step by step, adapting each prompt to your own feature's content. The goal isn't to memorize the workflow; it's to internalize the pattern so you can reproduce it on any feature, in any module, in any product.
How Do You Run the Full Agentic QA Lifecycle for a Realistic Feature from Start to Finish?
The Feature: Subscription Plan Upgrade Flow
The feature we'll QA throughout this capstone is a subscription plan upgrade flow in a B2B SaaS product. This is a realistic choice because:
- It has both frontend (plan selection UI) and backend (billing API, state machine) components
- It has business rule complexity (proration, enterprise exceptions, trial handling)
- It has security sensitivity (payment data, account access control)
- It has mobile and web surfaces
- It connects to an external payment provider (Stripe)
Feature summary:
Users on Free and Starter plans can upgrade to Pro or Enterprise. The upgrade involves a proration calculation, immediate billing, and a state transition in the account system. Enterprise upgrades require approval from a sales team member before billing. Free-to-Pro upgrades bypass the sales approval step.
User story:
As a Starter plan user, I want to upgrade to Pro so that I can access advanced reporting features.
Acceptance Criteria:
1. The upgrade flow is accessible from Settings > Billing.
2. The user sees the monthly and annual pricing for Pro.
3. Selecting a plan shows a proration calculation for the remaining billing period.
4. Confirming the upgrade immediately bills the stored payment method.
5. On successful payment, the account state transitions to Pro within 5 seconds.
6. On payment failure, the user sees an error message and the account remains on Starter.
7. Enterprise plan upgrades route to a "Contact Sales" flow instead of direct billing.
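One way to sanity-check your reading of these rules before planning tests is to sketch them as data. The following TypeScript model is a hypothetical illustration of the upgrade paths in the feature summary, not the product's actual implementation:

```typescript
// Hypothetical model of the upgrade rules described above; names and shapes
// are illustrative, not the product's actual code.
type Plan = 'free' | 'starter' | 'pro' | 'enterprise';

interface UpgradePath {
  requiresSalesApproval: boolean; // Enterprise gate: approval before billing
  billsImmediately: boolean;      // direct-billing path (AC4)
}

// Only the upgrade paths this story defines.
const UPGRADE_PATHS: Partial<Record<Plan, Partial<Record<Plan, UpgradePath>>>> = {
  free: {
    pro: { requiresSalesApproval: false, billsImmediately: true },
    enterprise: { requiresSalesApproval: true, billsImmediately: false },
  },
  starter: {
    pro: { requiresSalesApproval: false, billsImmediately: true },
    enterprise: { requiresSalesApproval: true, billsImmediately: false },
  },
};
```

Writing the rules out this way makes gaps visible immediately; for example, the story says nothing about Pro-to-Enterprise upgrades, which is exactly the kind of question the testability assessment below should surface.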
The Five-Stage Lifecycle Overview
The full agentic QA lifecycle runs in five stages:
- Plan — testability assessment, risk analysis, test strategy, scope definition
- Generate — manual test cases, E2E automation, API tests, exploratory charters
- Execute — run tests, track results, investigate failures
- Analyze — defect triage, coverage analysis, failure root cause
- Report — test execution report, go/no-go recommendation, coverage summary
Each stage has a primary AI prompt and a verification step. The workflow is linear for first-time use; experienced practitioners will run stages in parallel or iterate between them.
Stage 1: Pre-Sprint Testability Assessment
Before the sprint starts, run a testability assessment on the story:
You are a senior QA engineer. Assess the following user story for testability before sprint planning.
USER STORY AND AC:
[paste the subscription upgrade story above]
SYSTEM CONTEXT:
- B2B SaaS product, React frontend, Node.js/Express backend, PostgreSQL
- Payment processing via Stripe API (external)
- Account state machine: Free → Starter → Pro → Enterprise
- Test environment: staging with Stripe test mode enabled
- Mobile apps: iOS and Android (React Native)
ASSESS:
1. OBSERVABILITY: Are outcomes measurable? Flag any subjective or ambiguous AC.
2. ERROR STATE COVERAGE: Are all failure modes defined? List missing error states.
3. DATA PRECONDITIONS: What test accounts, states, and configurations are needed?
4. SCOPE BOUNDARIES: Is the scope clearly bounded from related features?
5. DEPENDENCY RISKS: External services, feature flags, cross-team dependencies?
6. SECURITY/COMPLIANCE FLAGS: Any AC involving payment data, PII, or access control?
Rate each dimension: READY / NEEDS CLARIFICATION / BLOCKED
Output a prioritized list of questions to raise in grooming.
What you'll find: The testability assessment will surface several issues in this story:
- AC #3 ("shows a proration calculation") doesn't specify the formula or where it's sourced — is this the Stripe proration or a custom calculation?
- AC #5 ("within 5 seconds") is a performance criterion with no test methodology defined
- Enterprise upgrade flow (AC #7) is mentioned but not detailed — what does "Contact Sales" flow look like?
- No AC covers what happens when a user has an expired payment method
These are exactly the gaps you want to surface before the sprint starts, not after development is complete.
Learning Tip: Save your testability assessment outputs as a record. When a production bug is found later that traces back to one of the gaps flagged here, you have documented evidence that the risk was identified and either accepted or missed in planning. This is the QA paper trail that protects the team and enables retrospective learning.
How Do You Apply the Dual-Context Model Across Planning, Generation, and Execution?
The Dual-Context Model in Practice
The dual-context model — spec context + code-change context — shapes how you structure every prompt in the lifecycle. The two contexts serve different purposes:
- Spec context defines what the feature is supposed to do — user stories, AC, design specs, business rules
- Code-change context defines what actually changed in the code — the PR diff, changed files, new endpoints, modified state transitions
Neither context alone is sufficient:
- Spec context without code context produces tests for the intended behavior but misses implementation-specific risks (a new code path that wasn't in the original design)
- Code context without spec context produces tests for what was built but misses whether what was built matches what was specified
Building the Spec Context Block
For the subscription upgrade feature, your spec context block looks like this:
## SPEC CONTEXT: Subscription Plan Upgrade Flow
**Feature summary:** Users upgrade subscription plans from Settings > Billing. Free/Starter users upgrade to Pro via direct billing. Enterprise requires sales approval.
**Business rules (non-obvious):**
- Proration uses Stripe's `proration_behavior: 'create_prorations'` — test against Stripe's calculation, not a custom formula
- Account state transition is asynchronous: the backend fires a webhook after Stripe confirms payment; the 5-second SLA is measured from payment confirmation to webhook processing
- Enterprise upgrades create a `sales_inquiry` record and route to CRM, not to billing
- Free accounts that never added a payment method cannot upgrade until a card is added
**Excluded from this story:**
- Downgrade flow (separate story)
- Coupon/promo code application (separate story)
- Payment method management (separate story)
**AC traceability:**
AC1: Settings > Billing page navigation
AC2: Pricing display
AC3: Proration calculation display
AC4: Payment execution
AC5: Account state transition
AC6: Payment failure handling
AC7: Enterprise route
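To ground the first two business rules, here is a minimal sketch of the upgrade call on the backend, assuming the Stripe Node SDK. The `PLAN_PRICE_IDS` map and `upgradeSubscription` function are hypothetical names for illustration; `proration_behavior: 'create_prorations'` is the real Stripe parameter the spec references:

```typescript
// Hypothetical sketch of the upgrade path described in the business rules above.
// PLAN_PRICE_IDS and upgradeSubscription are illustrative names, not the
// product's actual code.
import Stripe from 'stripe';

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

// Hypothetical mapping from internal plan keys to Stripe price IDs.
const PLAN_PRICE_IDS: Record<string, string> = {
  pro_monthly: 'price_pro_monthly', // placeholder price IDs
  pro_annual: 'price_pro_annual',
};

async function upgradeSubscription(subscriptionId: string, planKey: string) {
  const subscription = await stripe.subscriptions.retrieve(subscriptionId);

  // Let Stripe compute the proration, per the business rule: tests compare
  // against Stripe's calculation, not a custom formula.
  return stripe.subscriptions.update(subscriptionId, {
    proration_behavior: 'create_prorations',
    items: [
      {
        id: subscription.items.data[0].id, // swap the existing plan item
        price: PLAN_PRICE_IDS[planKey],
      },
    ],
  });
}
```

The point for QA: because Stripe owns the proration math, your tests should compare the displayed amount against Stripe test mode's calculation rather than re-deriving it.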
Building the Code-Change Context Block
When the PR is ready, generate the code-change context:
You are a QA engineer analyzing a code change for test impact.
PR TITLE: Add subscription upgrade flow
PR DESCRIPTION: [paste PR description]
CHANGED FILES:
[paste the list of changed files from the PR diff]
DIFF SUMMARY:
[paste the key changes — new endpoints, modified functions, new database fields, changed constants]
GENERATE:
1. A code-change context block summarizing what was changed in QA-relevant terms (what behavior changed, what new code paths were introduced, what edge cases the code handles explicitly vs. implicitly)
2. A list of code-level risk areas (places in the implementation that look like they could behave unexpectedly in testing)
3. Any code-level details that contradict or extend the spec context above
Combining Contexts in Generation Prompts
In every generation prompt during this lifecycle, include both context blocks:
[SPEC CONTEXT BLOCK — paste above]
[CODE-CHANGE CONTEXT BLOCK — paste above]
Now generate [test cases / E2E script / API tests / exploratory charter] with full awareness of both the intended behavior and the actual implementation.
This combined approach produces the highest-quality AI test output because the model can cross-reference what the feature should do against what the code actually does — exactly what a thorough human reviewer would do.
Learning Tip: The code-change context block is most valuable in the two to three days after a PR is merged, when the implementation details are stable but you haven't yet started writing tests. Running a dual-context generation at this point — before any manual testing — is consistently the highest-leverage use of the model. It surfaces cases the implementation handles differently from the spec while you still have time to raise them before acceptance testing.
How Do You Generate Manual Tests, Automated E2E/API Tests, and Exploratory Charters in One Workflow?
The Generation Sequence
Generate in this order for maximum efficiency:
- Manual test cases first — these define what needs to be verified and become the source of truth for automation scoping
- API tests second — fastest to generate and validate; API behavior is usually the most clearly defined in the code context
- E2E tests third — built on top of the verified API behavior and manual test case structure
- Exploratory charters last — risk-based charters targeted at the areas your scripted tests don't reach
Step 1: Generate Manual Test Cases
You are a senior QA engineer generating a complete manual test suite for the subscription upgrade flow.
[SPEC CONTEXT BLOCK]
[CODE-CHANGE CONTEXT BLOCK]
Generate a complete manual test suite organized by category:
CATEGORY 1: Happy Path
- Free to Pro upgrade (user with saved payment method)
- Starter to Pro upgrade (user with saved payment method)
- Annual plan selection vs. monthly
- Proration display accuracy (spot-check against Stripe test mode)
CATEGORY 2: Payment Failure Paths
- Declined card during upgrade
- Insufficient funds response
- Expired card on file
- 3DS challenge required (if applicable in your Stripe config)
CATEGORY 3: Enterprise Route
- Enterprise plan selection routes to Contact Sales
- Sales inquiry record created correctly
- No billing occurs for Enterprise upgrade selection
CATEGORY 4: Edge Cases
- User with no saved payment method
- User who is already on Pro (no upgrade needed)
- User mid-billing-cycle (proration accuracy)
- Concurrent upgrade attempt from two browser tabs
CATEGORY 5: State Consistency
- Account state after successful upgrade
- Account state after failed upgrade
- State visible in Settings before webhook processes (intermediate state handling)
For each test case: Title | Preconditions | Steps | Expected Result | Test Data Notes
Note which cases are candidates for automation (stable, deterministic, repeatable) vs. manual-only.
Step 2: Generate API Tests
You are a QA automation engineer generating API test cases for the subscription upgrade backend.
[SPEC CONTEXT BLOCK]
[CODE-CHANGE CONTEXT BLOCK]
Available API endpoints (from the PR diff):
POST /api/v1/subscriptions/upgrade
- Body: { plan_id, billing_period, promo_code (optional) }
- Returns: { upgrade_id, status, proration_amount, scheduled_at }
GET /api/v1/subscriptions/proration
- Query: plan_id, billing_period
- Returns: { proration_amount, next_billing_date, current_plan_end }
GET /api/v1/subscriptions/status
- Returns: { plan, status, features, next_billing_date }
Generate API tests using our Supertest/Jest framework structure:
For POST /api/v1/subscriptions/upgrade:
- 200: Successful upgrade with valid payload
- 200: Upgrade with annual billing period
- 402: Payment failure (use Stripe test card 4000000000000002)
- 400: Missing required field (plan_id)
- 400: Invalid plan_id
- 403: User is not authorized to upgrade (e.g., already on Pro)
- 422: Enterprise plan selected (should return enterprise_inquiry_created status)
- 401: Unauthenticated request
- 409: Concurrent upgrade conflict
For GET /api/v1/subscriptions/proration:
- Correct proration for mid-cycle upgrade
- Annual vs. monthly calculation
- Response for Free plan upgrade (no existing paid period)
FRAMEWORK CONTEXT:
[paste your test file structure, helper functions, Stripe test mode tokens]
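For reference, one test generated against this structure might look like the following sketch. The `app` export, auth helper, and response field names are assumptions to verify against a real staging run, not confirmed API behavior:

```typescript
// Hypothetical sketch of one generated API test. The app export, auth helper,
// and response fields are assumptions; verify every assertion against a real
// run before trusting the test.
import request from 'supertest';
import { app } from '../src/app';               // hypothetical Express app export
import { authHeaderFor } from './helpers/auth'; // hypothetical auth helper

describe('POST /api/v1/subscriptions/upgrade', () => {
  it('returns 402 when the stored card is declined', async () => {
    const res = await request(app)
      .post('/api/v1/subscriptions/upgrade')
      .set(await authHeaderFor('starter-user-with-decline-card')) // Stripe test card 4000000000000002
      .send({ plan_id: 'pro', billing_period: 'monthly' })
      .expect(402);

    // Expected shape per the spec context; confirm field names against a real response.
    expect(res.body.status).toBe('payment_failed');
  });

  it('returns 400 when plan_id is missing', async () => {
    await request(app)
      .post('/api/v1/subscriptions/upgrade')
      .set(await authHeaderFor('starter-user'))
      .send({ billing_period: 'monthly' })
      .expect(400);
  });
});
```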
Step 3: Generate E2E Tests
You are a QA automation engineer generating Playwright E2E tests for the subscription upgrade flow.
[SPEC CONTEXT BLOCK]
[CODE-CHANGE CONTEXT BLOCK]
CODEBASE CONTEXT:
[paste your Page Object Model structure, existing selectors for Settings page, test data factories]
Generate Playwright tests for:
TEST FILE: subscription-upgrade.spec.ts
Flows to cover:
1. Full Starter-to-Pro upgrade: navigate → select Pro → view proration → confirm → verify state transition
2. Payment failure: same flow with Stripe test decline card → verify error display → verify state unchanged
3. Enterprise route: select Enterprise → verify redirect to Contact Sales → no billing step shown
For each test:
- Use Page Object Model pattern (provide my existing POM structure as context)
- Use test data factories for account setup
- Include Stripe test card constants (success: 4242..., decline: 4000...0002)
- Add explicit wait for the account state transition (async webhook) with a 5-second poll
- Include assertions for both UI state and API state (call GET /api/v1/subscriptions/status after each flow)
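A generated test for the first flow might look like this sketch. The page object, data factory, and selectors are hypothetical stand-ins for your own codebase context; `expect.poll` is the real Playwright API for the 5-second state poll:

```typescript
// Hypothetical sketch of the Starter-to-Pro flow with the async state poll.
// BillingPage and createStarterAccount are stand-ins for your POM and data
// factories; field names and labels are assumptions to verify against your app.
import { test, expect } from '@playwright/test';
import { BillingPage } from './pages/billing-page';          // hypothetical POM
import { createStarterAccount } from './factories/accounts'; // hypothetical factory

test('Starter-to-Pro upgrade transitions account state', async ({ page }) => {
  const account = await createStarterAccount({ paymentMethod: 'valid' });
  const billing = new BillingPage(page);

  await billing.goto();
  await billing.selectPlan('pro', 'monthly');
  await expect(billing.prorationAmount).toBeVisible(); // AC3: proration shown
  await billing.confirmUpgrade();

  // AC5: the webhook-driven transition is async, so poll the status API
  // for up to 5 seconds instead of asserting immediately. Assumes baseURL
  // is configured in playwright.config.
  await expect
    .poll(async () => {
      const res = await page.request.get('/api/v1/subscriptions/status');
      return (await res.json()).plan;
    }, { timeout: 5_000 })
    .toBe('pro');

  await expect(billing.currentPlanLabel).toHaveText('Pro'); // UI matches API state
});
```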
Step 4: Generate Exploratory Charters
You are a QA lead generating risk-based exploratory testing charters for the subscription upgrade flow.
[SPEC CONTEXT BLOCK]
[CODE-CHANGE CONTEXT BLOCK]
SCRIPTED TEST COVERAGE ALREADY PLANNED:
[paste a brief summary of what the manual and automated tests cover]
Generate 4 exploratory testing charters targeting areas NOT well-covered by scripted tests:
CHARTER FORMAT:
- Target: What area or question to explore
- Risk hypothesis: What could go wrong here
- Exploration approach: Specific techniques, tools, or paths to follow
- Time box: Recommended duration
- Output: What to produce from the session
Focus charter areas on:
1. The async state transition behavior and edge timing scenarios
2. Session and tab management during an in-progress upgrade
3. Mobile-specific behavior (React Native upgrade flow)
4. Error recovery and retry scenarios after a failed upgrade attempt
Learning Tip: Run the four generation steps as a single AI session with all context loaded once at the start. Don't open a new chat for each step — the continuity of context means later generations (E2E tests, exploratory charters) build accurately on the decisions made in earlier ones (manual test scope, API test coverage). A session with all four generation tasks in sequence typically runs 45–60 minutes and produces a test suite that would take two to three days to write manually. The editing and verification time is the new constraint — not the generation time.
What Does the Agentic QA Workflow Catch Automatically vs. Where Does Human Judgment Remain Essential?
What the Agentic Workflow Catches Automatically
After completing this capstone, you've seen what the agentic workflow handles well:
1. Coverage completeness across the AC surface
AI systematically maps every acceptance criterion to test cases. Human engineers doing this manually often miss 15–20% of AC items — not because they're careless, but because holding all AC in mental context while writing test cases is cognitively expensive. AI doesn't tire.
2. Negative path and edge case generation
AI generates unhappy path scenarios at a volume and consistency that's hard to match manually. Given a clear input domain, it will produce boundary cases, error state variations, and null/empty inputs that human engineers routinely deprioritize in favor of happy path coverage.
3. Test code structuring and boilerplate
All the scaffolding work — file structure, test setup/teardown, assertion syntax, data factory calls — is generated accurately when codebase context is provided. This is purely mechanical work that AI handles faster than humans.
4. Documentation synthesis
Generating first drafts of test plans, execution reports, and knowledge base entries from raw data. AI handles the structuring and formatting; humans provide the quality judgment.
5. Cross-referencing and gap detection
AI is excellent at taking a set of test cases and a set of requirements and identifying which requirements have no test coverage. This is a pattern-matching task humans find tedious and error-prone at scale.
Where Human Judgment Remains Essential
1. Verifying assertions against real behavior
AI generates assertions based on the spec and the code it can see. It cannot run the application. Every assertion must be verified against a real test run before the test can be trusted. This verification step is human work.
In the subscription upgrade capstone: AI will generate `expect(response.body.status).toBe('active')`, but the actual Stripe webhook might set `state: 'pro'` rather than `status: 'active'`. Only running the test reveals this.
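A hypothetical before-and-after shows what this correction step looks like in practice (both field names are invented for illustration):

```typescript
// Assertion as generated from the spec (hypothetical field name):
expect(response.body.status).toBe('active');

// Assertion after running against staging and inspecting the real payload
// (hypothetical corrected field name):
expect(response.body.state).toBe('pro');
```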
2. Assessing context-specific risk
AI risk scoring is based on patterns from training data and the context you provide. It doesn't know that your Stripe integration was rewritten by a contractor last quarter and has undocumented edge cases. It doesn't know that the billing team has an open incident with their webhook delivery reliability. Human judgment about team-specific risk factors is irreplaceable.
3. Distinguishing product intent from implementation bugs
When a test fails, someone must decide: is this a bug in the product or a bug in the test? AI can surface hypotheses, but the judgment call requires understanding of product intent that only the QA engineer and product team share.
In the capstone: if the concurrent upgrade test fails with a 500 error instead of a 409, is that a product bug (missing conflict detection) or a wrong expectation in the test? That judgment is human.
4. Exploratory intuition and creative hypothesis generation
AI-generated exploratory charters are starting points. The actual session depends on what you notice as you explore — the slightly off animation, the slightly wrong number in the proration calculation, the suspiciously fast response for a complex operation. These observations are driven by human intuition built from experience. AI can frame the charter; it cannot replicate the curiosity and pattern-recognition of an experienced tester in a live session.
5. Business context and priority decisions
Which bugs to escalate, which to defer, which test failures constitute a release blocker — all of these require business context, stakeholder relationships, and delivery judgment that no AI model currently has access to. The go/no-go decision is always human.
6. Ethical and regulatory judgment
In the subscription upgrade flow: is the payment error messaging compliant with our region's consumer protection regulations? Does the proration disclosure meet GDPR requirements? Is the data we're capturing and retaining from failed upgrade attempts handled appropriately? AI can help flag known compliance patterns, but responsibility for these judgments is human.
The Integrated Human-AI Partnership Model
The agentic QA lifecycle is not a handoff from human to AI — it's a continuous partnership where each brings what they're best at:
| Stage | AI Contribution | Human Contribution |
|---|---|---|
| Planning | Testability gaps, risk identification, scope draft | Product context, priority judgment, risk calibration |
| Generation | Coverage breadth, boilerplate, variant generation | Domain accuracy review, assertion verification, scope decisions |
| Execution | Running tests, logging results | Investigating failures, exploratory testing, judgment calls |
| Analysis | Pattern recognition, report drafting, trend identification | Root cause judgment, business impact assessment, fix prioritization |
| Reporting | Document structuring, narrative generation, formatting | Go/no-go decision, stakeholder communication, accountability |
Running Your Own Capstone
To apply this workflow to a feature in your own product:
- Select a feature that touches multiple components (frontend + backend), has business rule complexity, and has at least 5 AC items
- Build your context blocks — spec context from the story and design docs, code context from the PR diff
- Run the testability assessment before the sprint starts
- Execute the generation sequence in one session: manual → API → E2E → exploratory charters
- Execute and verify — run the AI-generated tests, verify every assertion against real behavior, document what needed correction
- Analyze and report — generate the test execution report, produce a go/no-go recommendation
- Retrospect — record which AI outputs were used directly, which needed significant revision, and what you'll prompt differently next time
The retrospective step is the most important one for long-term improvement. After five features, your prompt refinements will be calibrated to your exact product, stack, and workflow — and your AI output quality will be substantially better than when you started.
Learning Tip: The best measure of your agentic QA maturity isn't how much AI output you use — it's how precisely you can predict where AI will fail before you run the prompt. When you know exactly which assertions to verify, which edge cases the model will miss, and which business rules you need to inject explicitly, you've internalized the human-AI boundary for your context. That predictive accuracy is the compound interest on everything you've learned in this course — and it's the foundation of being a QA engineer who doesn't just use AI tools, but shapes how a team uses them.