Hands-On: Build a Complete AI-Generated Test Suite

A test suite is only complete when it can catch the failure you did not anticipate — this capstone builds that suite from strategy to mutation score for a real billing service.

The Scenario: Subscription Billing Service

This capstone exercise walks through building a complete test suite for a subscription billing service. You will apply everything from this module: risk-based test strategy, unit tests with meaningful assertions, TDD for one key function, integration tests using real infrastructure, contract tests for external APIs, property-based tests for mathematical invariants, and mutation testing to verify the suite's strength.

The billing service has the following responsibilities: managing subscription plans and customer subscriptions, running billing cycles (charging customers at the end of each billing period), handling upgrades and downgrades with prorated adjustments, integrating with a payment provider (Stripe-like API), and emitting events for billing outcomes (success, failure, retry).

This is a realistic scope. Billing logic is one of the highest-risk areas in any commercial software system. Bugs in this service cause real financial harm — customers are overcharged, undercharged, or incorrectly suspended. The test suite for this service must be rigorous.

Learning tip: Read the full spec and the full capstone before writing a single test. Understanding where the exercise is going helps you make better decisions at each step.

Step 1: Risk Analysis and Test Strategy

Before writing any tests, build the test strategy. This grounds every subsequent decision.

I am building a test suite for a subscription billing service. Here is the service specification:

PLANS: Basic ($9.99/mo), Pro ($29.99/mo), Enterprise ($99.99/mo)

BILLING CYCLE: On the customer's billing anniversary date, charge their payment method the plan price. If payment fails, retry 3 times over 72 hours, then suspend the subscription.

UPGRADES: When a customer upgrades, charge the prorated amount for the remainder of the current billing period at the new plan price, minus the prorated value of the current plan for the same period.

DOWNGRADES: Take effect at the next billing period, no charge or refund in the current period.

PAYMENT PROVIDER API: POST /charges (create charge), GET /charges/:id (get charge status), POST /refunds (issue refund).

EVENTS EMITTED: subscription.charged, subscription.charge_failed, subscription.suspended, subscription.upgraded, subscription.downgraded.

Perform a risk analysis:
1. Identify the top 7 riskiest behaviors (where a bug would cause financial harm or user-facing failure)
2. For each risk: assign probability (low/medium/high), impact (low/medium/high), and detectability (would we notice immediately or only after complaints?)
3. Recommend a test layer (unit/integration/contract/e2e) for each risk
4. Output the result as a prioritized test plan document, with one entry per risk containing the fields from points 2 and 3

Expected output: A prioritized test plan with 7 risk items, each with probability/impact/detectability scores and recommended test layer. High-impact risks likely include proration calculation, payment retry logic, and billing cycle idempotency.
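Step 1 leaves the scoring scheme open. One way to make the ranking repeatable is to map each rating to an ordinal and sort by the product, as in the minimal sketch below. It assumes a 1-3 scale and scores "noticed only after complaints" as the riskier detectability value; the `RiskItem` shape, the weights, and the three sample items are hypothetical, not part of the spec.

```typescript
type Level = "low" | "medium" | "high";
type Detectability = "immediate" | "after_complaints";
type TestLayer = "unit" | "integration" | "contract" | "e2e";

interface RiskItem {
  name: string;
  probability: Level;
  impact: Level;
  detectability: Detectability;
  layer: TestLayer; // recommended test layer
}

// Assumed ordinal scales: a higher number means a riskier item.
const LEVEL_SCORE: Record<Level, number> = { low: 1, medium: 2, high: 3 };
const DETECT_SCORE: Record<Detectability, number> = { immediate: 1, after_complaints: 3 };

// Priority score in the range 1..27.
function riskScore(r: RiskItem): number {
  return LEVEL_SCORE[r.probability] * LEVEL_SCORE[r.impact] * DETECT_SCORE[r.detectability];
}

// Returns a new array ordered from highest to lowest score; the input is not mutated.
function prioritize(items: RiskItem[]): RiskItem[] {
  return [...items].sort((a, b) => riskScore(b) - riskScore(a));
}

// Hypothetical items, for illustration only.
const sample: RiskItem[] = [
  { name: "Missing billing events", probability: "medium", impact: "medium", detectability: "after_complaints", layer: "integration" },
  { name: "Proration miscalculation", probability: "high", impact: "high", detectability: "after_complaints", layer: "unit" },
  { name: "Double charge on job rerun", probability: "medium", impact: "high", detectability: "after_complaints", layer: "integration" },
];

for (const r of prioritize(sample)) {
  console.log(`${riskScore(r)}\t${r.name} -> ${r.layer}`);
}
```

A product is only one convention; a weighted sum or a lookup matrix works as well. The point is that the ordering is written down, so the final audit in Step 8 has something concrete to check against.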

Step 2: Unit Tests for Pricing and Proration Logic

The core billing calculations, proration and retry scheduling, belong in pure functions with no dependencies. Their tests are the highest-priority unit tests.

Write comprehensive unit tests for these two pure functions in TypeScript + Vitest:

1. calculateUpgradeCharge(currentPlan, newPlan, upgradeDate: Date, billingAnchorDay: number): number
   - Returns the net charge for upgrading: prorated new plan amount minus prorated current plan credit
   - billingAnchorDay is the day of the month (1-31) on which the subscription renews
   - upgradeDate is the date the upgrade is requested

2. calculateRetrySchedule(firstFailureDate: Date, maxRetries: number, retryIntervalHours: number): Date[]
   - Returns an array of retry timestamps
   - maxRetries = 3, retryIntervalHours = 24 by default

For each function, write tests for:
- Happy path with specific numeric assertions (not just "returns a number")
- Boundary conditions (upgrade on the first day of the billing period, upgrade on the last day)
- Invalid inputs that should throw typed errors
- Edge cases specific to billing (upgrade from a more expensive to a less expensive plan, February billing dates, leap years)

Above each test, add a comment: // Catches: [specific defect this test would detect]

Expected output: 15-25 unit tests across the two functions, each with specific value assertions and defect-catching annotations.
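Specific value assertions are easier to write when you can see one concrete implementation of the functions under test. The sketch below makes several assumptions the spec leaves open: a plan is `{ id, monthlyPrice }`, all dates are handled in UTC, an anchor day past the end of a short month clamps to its last day (an anchor of 31 falls on February 29 in 2024), the remaining days include the upgrade day, and the result is rounded to cents once at the end. The error class names are hypothetical.

```typescript
interface Plan { id: string; monthlyPrice: number }

class InvalidBillingInputError extends Error {
  constructor(message: string) { super(message); this.name = "InvalidBillingInputError"; }
}
class NotAnUpgradeError extends Error {
  constructor(message: string) { super(message); this.name = "NotAnUpgradeError"; }
}

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Midnight UTC of the anchor day in a given month, clamped to that month's length.
function anchorInMonth(year: number, month: number, anchorDay: number): number {
  const daysInMonth = new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
  return Date.UTC(year, month, Math.min(anchorDay, daysInMonth));
}

function calculateUpgradeCharge(currentPlan: Plan, newPlan: Plan, upgradeDate: Date, billingAnchorDay: number): number {
  if (!Number.isInteger(billingAnchorDay) || billingAnchorDay < 1 || billingAnchorDay > 31) {
    throw new InvalidBillingInputError(`billingAnchorDay out of range: ${billingAnchorDay}`);
  }
  if (Number.isNaN(upgradeDate.getTime())) throw new InvalidBillingInputError("upgradeDate is invalid");
  if (newPlan.monthlyPrice <= currentPlan.monthlyPrice) {
    throw new NotAnUpgradeError(`${currentPlan.id} -> ${newPlan.id} is not an upgrade`);
  }
  const y = upgradeDate.getUTCFullYear();
  const m = upgradeDate.getUTCMonth();
  const upgradeDay = Date.UTC(y, m, upgradeDate.getUTCDate()); // truncate to midnight UTC
  // The current period starts at the most recent anchor on or before the upgrade day.
  let periodStart = anchorInMonth(y, m, billingAnchorDay);
  if (periodStart > upgradeDay) periodStart = anchorInMonth(y, m - 1, billingAnchorDay);
  const start = new Date(periodStart);
  const nextStart = anchorInMonth(start.getUTCFullYear(), start.getUTCMonth() + 1, billingAnchorDay);
  const daysInPeriod = Math.round((nextStart - periodStart) / DAY_MS);
  const remainingDays = Math.round((nextStart - upgradeDay) / DAY_MS); // includes the upgrade day
  const newDaily = newPlan.monthlyPrice / daysInPeriod;
  const currentDaily = currentPlan.monthlyPrice / daysInPeriod;
  return Math.round((newDaily - currentDaily) * remainingDays * 100) / 100;
}

function calculateRetrySchedule(firstFailureDate: Date, maxRetries = 3, retryIntervalHours = 24): Date[] {
  if (!Number.isInteger(maxRetries) || maxRetries < 0) {
    throw new InvalidBillingInputError("maxRetries must be a non-negative integer");
  }
  if (!(retryIntervalHours > 0)) throw new InvalidBillingInputError("retryIntervalHours must be positive");
  const first = firstFailureDate.getTime();
  return Array.from({ length: maxRetries }, (_, i) => new Date(first + (i + 1) * retryIntervalHours * HOUR_MS));
}

const BASIC: Plan = { id: "basic", monthlyPrice: 9.99 };
const PRO: Plan = { id: "pro", monthlyPrice: 29.99 };

// Catches: dropping the upgrade day from remainingDays (day 1 must charge the full $20.00 difference)
console.log(calculateUpgradeCharge(BASIC, PRO, new Date("2024-03-01T00:00:00Z"), 1)); // prints 20
// Catches: hours converted as minutes (the third retry must land exactly 72 hours after the failure)
console.log(calculateRetrySchedule(new Date("2024-03-01T00:00:00Z"))[2].toISOString()); // prints 2024-03-04T00:00:00.000Z
```

Under these assumptions, a Basic-to-Pro upgrade on the last day of a 31-day period charges 0.65, and one on 15 February 2024 (anchor day 1, a 29-day period with 15 days remaining) charges 10.34. Exact values like these are what let the tests catch off-by-one and rounding defects.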

Step 3: TDD for the Billing Cycle Idempotency Check

The billing cycle must be idempotent — running it twice for the same subscription and period must not result in a double charge. This is a critical safety property. Use TDD to drive its implementation.

Write the first failing test:

I am doing TDD. Here is the first failing test for a billing cycle idempotency guard. The guard should prevent double-charging a customer if the billing job runs twice.

Do NOT implement the solution. Confirm you understand the interface and what "idempotency" means in this context:

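import { test, expect, vi, beforeEach } from 'vitest';
import { BillingService } from './billing-service';       // module path is an assumption
import { createTestSubscription } from './test-fixtures'; // hypothetical fixture helper

// Payment provider test double: every charge succeeds
const mockPaymentProvider = {
  charge: vi.fn().mockResolvedValue({ id: 'ch_test_1', status: 'succeeded' }),
};

beforeEach(() => {
  mockPaymentProvider.charge.mockClear(); // reset recorded calls between tests
});

test('does not charge a subscription that was already billed this period', async () => {
  const billingService = new BillingService({ paymentProvider: mockPaymentProvider });

  const subscription = createTestSubscription({
    planId: 'pro',
    billingAnchorDay: 1,
    lastBilledAt: new Date('2024-02-01'),
  });

  // First billing run for the March period: should charge
  await billingService.runBillingCycle(subscription, new Date('2024-03-01'));

  // Second billing run for the same period: should be a no-op
  await billingService.runBillingCycle(subscription, new Date('2024-03-01'));

  // Should only have charged once
  expect(mockPaymentProvider.charge).toHaveBeenCalledTimes(1);
});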

Expected output: AI confirms understanding of idempotency and the interface — runBillingCycle should check whether lastBilledAt already covers the current billing period before charging.

Drive the implementation:

Here are two tests for billing cycle idempotency. Implement the minimum BillingService class to make both pass:

[paste both tests]

Requirements:
- Use a real idempotency check based on lastBilledAt date, not a flag
- The check must compare billing periods, not exact timestamps (a run at 9am and another at 9pm on March 1 fall in the same March period, so only the first may charge)
- Store the new lastBilledAt on the subscription after a successful charge
- Do not hardcode values — the check must work for any billing anchor day and any month

Expected output: BillingService class with real period-comparison logic.
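A minimal sketch of what the requirements above could produce follows. It assumes a `Subscription` shape with `billingAnchorDay` and a mutable `lastBilledAt`, a `PaymentProvider` whose `charge()` resolves to `{ status }`, and UTC dates; those types, the `CycleOutcome` values, and the `periodStartFor` helper are hypothetical. The guard compares the start of the current period against `lastBilledAt`, so 9am and 9pm runs on March 1 land in the same period, and an anchor of 31 clamps to the last day of short months.

```typescript
interface Subscription {
  id: string;
  planId: string;
  billingAnchorDay: number;  // 1-31
  lastBilledAt: Date | null; // updated only after a successful charge
}

interface PaymentProvider {
  charge(request: { subscriptionId: string; planId: string; periodStart: Date }): Promise<{ status: "succeeded" | "failed" }>;
}

type CycleOutcome = "charged" | "already_billed" | "charge_failed";

// Midnight UTC of the anchor day in the given month, clamped to the month's length.
function anchorInMonth(year: number, month: number, anchorDay: number): number {
  const daysInMonth = new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
  return Date.UTC(year, month, Math.min(anchorDay, daysInMonth));
}

// Start of the billing period that contains `asOf`.
function periodStartFor(asOf: Date, anchorDay: number): number {
  const y = asOf.getUTCFullYear();
  const m = asOf.getUTCMonth();
  const thisMonth = anchorInMonth(y, m, anchorDay);
  return thisMonth <= asOf.getTime() ? thisMonth : anchorInMonth(y, m - 1, anchorDay);
}

class BillingService {
  private readonly paymentProvider: PaymentProvider;

  constructor(deps: { paymentProvider: PaymentProvider }) {
    this.paymentProvider = deps.paymentProvider;
  }

  async runBillingCycle(subscription: Subscription, asOf: Date): Promise<CycleOutcome> {
    const periodStart = periodStartFor(asOf, subscription.billingAnchorDay);
    // Idempotency guard: compare periods, not timestamps. A billing at or after
    // this period's start means the period is already covered.
    if (subscription.lastBilledAt !== null && subscription.lastBilledAt.getTime() >= periodStart) {
      return "already_billed";
    }
    const result = await this.paymentProvider.charge({
      subscriptionId: subscription.id,
      planId: subscription.planId,
      periodStart: new Date(periodStart),
    });
    if (result.status !== "succeeded") return "charge_failed";
    subscription.lastBilledAt = asOf; // record the billing time only after success
    return "charged";
  }
}
```

The `>=` comparison is the exact boundary that mutation 3 in Step 7 flips, so a test with `lastBilledAt` equal to the period start is worth having. Note also that this guard only protects sequential runs: two concurrent calls can both read the old `lastBilledAt` before either charge resolves.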

Step 4: Integration Tests for the Billing Cycle

Integration tests verify the billing service together with its real collaborators: the subscription repository backed by a real database, and the payment provider reached over HTTP through a realistic stub.

Write integration tests for the BillingService billing cycle in TypeScript + Vitest.

Setup:
- Use a real in-memory SQLite database (not mocked) for subscription storage
- Intercept the payment provider API at the HTTP level with msw or nock: the service's HTTP client sends real requests, which the interceptor answers; do not mock the HTTP client itself
- Use real date manipulation (no fake timers initially, add them only where specified)

Tests to write:
1. Successful billing cycle: subscription is charged, lastBilledAt is updated, subscription.charged event is emitted
2. Payment failure with retry: first charge fails (payment provider returns 402), second attempt (simulated via retrySchedule) succeeds
3. Suspension after max retries: all 3 retries fail, subscription status changes to "suspended", subscription.suspended event is emitted
4. Idempotency: running billing cycle twice in the same period only creates one charge record in the database
5. Concurrent billing cycle runs: two simultaneous invocations for the same subscription do not result in two charges (test with Promise.all)

For each test:
- Show the database fixture setup
- Show the HTTP mock setup for the payment provider
- Assert on: the HTTP calls made, the database state after, and the events emitted
- Do NOT mock the billing service internals

Expected output: 5 integration tests with real database and HTTP mocking, asserting on full system state.
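Test 5 is the one a check-then-charge guard like the one in Step 3 tends to fail, because both invocations read the billing state before either charge resolves. The dependency-free sketch below reproduces that interleaving with `Promise.all` and contrasts it with a claim-first guard that reserves the (subscription, period) key before the first `await`. The class names are hypothetical, and the in-memory `Set` only stands in for a database-level guarantee, such as a unique index on the charge record's subscription and period columns; an in-process set would not protect separate worker processes.

```typescript
// Stand-in for the network round trip to the payment provider.
const providerRoundTrip = (): Promise<void> => Promise.resolve();

// Check-then-charge: the check and the write are separated by an await,
// so two concurrent runs can both pass the check.
class NaiveBiller {
  charges = 0;
  lastBilledPeriod = new Map<string, string>();

  async run(subscriptionId: string, period: string): Promise<void> {
    if (this.lastBilledPeriod.get(subscriptionId) === period) return;
    await providerRoundTrip();
    this.charges++;
    this.lastBilledPeriod.set(subscriptionId, period);
  }
}

// Claim-first: reserve the key synchronously (no await between has() and add()),
// mirroring an INSERT that fails on a unique constraint.
class ClaimingBiller {
  charges = 0;
  claims = new Set<string>();

  async run(subscriptionId: string, period: string): Promise<void> {
    const key = `${subscriptionId}:${period}`;
    if (this.claims.has(key)) return;
    this.claims.add(key);
    try {
      await providerRoundTrip();
      this.charges++;
    } catch (err) {
      this.claims.delete(key); // release the claim so a later retry can charge
      throw err;
    }
  }
}

async function demo(): Promise<{ naive: number; claiming: number }> {
  const naive = new NaiveBiller();
  await Promise.all([naive.run("sub_1", "2024-03"), naive.run("sub_1", "2024-03")]);
  const claiming = new ClaimingBiller();
  await Promise.all([claiming.run("sub_1", "2024-03"), claiming.run("sub_1", "2024-03")]);
  return { naive: naive.charges, claiming: claiming.charges };
}

demo().then(r => console.log(r)); // prints { naive: 2, claiming: 1 }
```

In the SQLite-backed integration test, the equivalent assertions are that the database holds exactly one charge record for the period and that the HTTP interceptor saw exactly one POST /charges.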

Step 5: Contract Tests for the Payment Provider API

The payment provider API is a critical external dependency. Consumer contract tests record exactly how your service calls it; once the provider verifies those pacts, a breaking change on the provider side fails verification instead of surfacing in production.

Write Pact consumer contract tests for our payment provider API integration in TypeScript.

Our service makes these calls:
1. POST /charges - body: { customerId, amount, currency, idempotencyKey, metadata }
   - Success: 200, body: { id, status: "succeeded" | "pending", amount }
   - Insufficient funds: 402, body: { error: "insufficient_funds" }
   - Invalid customer: 404, body: { error: "customer_not_found" }

2. POST /refunds - body: { chargeId, amount, reason }
   - Success: 200, body: { id, chargeId, amount, status: "succeeded" }

Generate:
1. The Pact consumer test file with all interactions defined
2. The provider verification test scaffold
3. A pactflow.io publish configuration for CI

Use @pact-foundation/pact v12. Each interaction must specify exact request matchers and response body matchers (use Matchers.like() and Matchers.term() appropriately — not exact string matching for IDs).

Expected output: Consumer test file, provider verification scaffold, and CI configuration for publishing pacts.
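Two details of the POST /charges body are worth pinning down before writing the interactions, because mutation 6 in Step 7 survives precisely when nothing asserts on them: the `idempotencyKey` must be present, and it must be stable for a given charge attempt. The sketch below is one way to build the body. It assumes amounts in integer cents, a single `usd` currency, and a key made of subscription id, period start, and attempt number; `buildChargeRequest`, `PLAN_PRICE_CENTS`, and `IDEMPOTENCY_KEY_PATTERN` are hypothetical names. The attempt number is there because a Stripe-like provider typically replays the stored result for a reused key, so a payment retry on a later day needs a fresh key, while a network-level resend of the same attempt should reuse it.

```typescript
interface ChargeRequest {
  customerId: string;
  amount: number; // integer cents (assumption)
  currency: string;
  idempotencyKey: string;
  metadata: Record<string, string>;
}

// Spec prices expressed in cents to avoid floating-point money.
const PLAN_PRICE_CENTS: Record<string, number> = { basic: 999, pro: 2999, enterprise: 9999 };

// Shape a contract test could match the key against (for example with a regex matcher).
const IDEMPOTENCY_KEY_PATTERN = /^[A-Za-z0-9_]+:\d{4}-\d{2}-\d{2}:attempt-\d+$/;

function buildChargeRequest(input: {
  subscriptionId: string;
  customerId: string;
  planId: string;
  periodStart: Date;
  attempt: number; // 1 for the first charge, incremented on each payment retry
}): ChargeRequest {
  const amount = PLAN_PRICE_CENTS[input.planId];
  if (amount === undefined) throw new Error(`Unknown plan: ${input.planId}`);
  const period = input.periodStart.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return {
    customerId: input.customerId,
    amount,
    currency: "usd",
    idempotencyKey: `${input.subscriptionId}:${period}:attempt-${input.attempt}`,
    metadata: { subscriptionId: input.subscriptionId, planId: input.planId, period },
  };
}
```

In the Pact consumer test, the request body matcher can then require `idempotencyKey` to be present and to match this pattern rather than one literal value, which would fail if the key were dropped from the body as in mutation 6.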

Step 6: Property-Based Tests for Proration

The proration calculation needs property-based coverage because the number of edge cases (day-of-month combinations, plan price combinations, February, leap years) is too large for example-based tests to cover adequately.

Write fast-check property-based tests in Vitest for the proration calculation used in plan upgrades.

The calculation: 
- prorationAmount = (newPlanDailyRate - currentPlanDailyRate) * remainingDaysInPeriod
- dailyRate = monthlyPrice / daysInBillingMonth

Properties to test:
1. Antisymmetry: the proration for a change from plan A to plan B and for the reverse change on the same day should sum to approximately zero (within a floating-point tolerance of $0.01)
2. Upgrading on day 1 should charge approximately the full price difference
3. Upgrading on the last day of the period should charge approximately 1 day's worth of price difference
4. For upgrades, the proration amount is always less than or equal to the monthly price difference between the two plans
5. Proration amount is always greater than or equal to zero for upgrades (more expensive plan)
6. For downgrades (less expensive plan), the formula always yields an amount less than or equal to zero; because downgrades take effect at the next billing period with no charge or refund, the service must never apply this value as a charge

For each property, define:
- The arbitraries (plan prices, day of month, days in billing period)
- The property assertion
- A tolerance value for floating-point comparisons where applicable

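Use fc.integer(), fc.float(), and fc.record() from fast-check. Generating prices as integer cents with fc.integer() avoids most floating-point noise; if you use fc.float(), its min and max must be 32-bit floats (wrap them in Math.fround()), and noNaN: true keeps NaN out of the generated values.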

Expected output: 6 property tests with appropriate arbitraries and floating-point tolerances.
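Because the spec has only three plans and four possible period lengths (28-31 days), its own domain is small enough to enumerate outright. The sketch below writes each of the six properties as a plain predicate and checks it against every combination; this exhaustive loop is swapped in for fast-check only so the block has no dependencies. It reads property 1 as antisymmetry of the formula, property 2 as "the remaining days equal the whole period", and property 3 as "one remaining day". It also shows that the `+` mutant listed as mutation 2 in Step 7 violates antisymmetry.

```typescript
type ProrationFn = (currentPrice: number, newPrice: number, daysInPeriod: number, remainingDays: number) => number;

interface Case { cur: number; next: number; days: number; remaining: number }

const TOLERANCE = 0.01;
const PLAN_PRICES = [9.99, 29.99, 99.99];
const PERIOD_LENGTHS = [28, 29, 30, 31];

// The formula from Step 6: (newDailyRate - currentDailyRate) * remainingDays.
const proration: ProrationFn = (cur, next, days, remaining) => (next / days - cur / days) * remaining;

const near = (a: number, b: number): boolean => Math.abs(a - b) <= TOLERANCE;

// The six properties as predicates; conditional ones pass vacuously outside their scope.
const properties: Record<string, (c: Case, f: ProrationFn) => boolean> = {
  antisymmetry: (c, f) => near(f(c.cur, c.next, c.days, c.remaining) + f(c.next, c.cur, c.days, c.remaining), 0),
  fullPeriodChargesFullDifference: (c, f) => c.remaining !== c.days || near(f(c.cur, c.next, c.days, c.remaining), c.next - c.cur),
  lastDayChargesOneDay: (c, f) => c.remaining !== 1 || near(f(c.cur, c.next, c.days, c.remaining), (c.next - c.cur) / c.days),
  upgradeAtMostDifference: (c, f) => c.next <= c.cur || f(c.cur, c.next, c.days, c.remaining) <= c.next - c.cur + TOLERANCE,
  upgradeNonNegative: (c, f) => c.next <= c.cur || f(c.cur, c.next, c.days, c.remaining) >= 0,
  downgradeNonPositive: (c, f) => c.next >= c.cur || f(c.cur, c.next, c.days, c.remaining) <= 0,
};

// Every combination of plan pair, period length, and remaining days.
function allCases(): Case[] {
  const cases: Case[] = [];
  for (const cur of PLAN_PRICES)
    for (const next of PLAN_PRICES)
      for (const days of PERIOD_LENGTHS)
        for (let remaining = 1; remaining <= days; remaining++) cases.push({ cur, next, days, remaining });
  return cases;
}

// Describes every (property, case) pair that fails for `f`.
function checkAll(f: ProrationFn = proration): string[] {
  const failures: string[] = [];
  for (const c of allCases())
    for (const [name, holds] of Object.entries(properties))
      if (!holds(c, f)) failures.push(`${name}: ${JSON.stringify(c)}`);
  return failures;
}

console.log(checkAll().length); // prints 0
```

Moving to fast-check would mean replacing `allCases()` with arbitraries, for example prices as `fc.integer()` cents divided by 100 and period lengths from `fc.constantFrom(28, 29, 30, 31)`, with the remaining days derived from the period length via `.chain()`, and passing the same predicates to `fc.assert(fc.property(...))`.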

Step 7: Run Mutation Testing and Interpret Results

After the full test suite is in place, mutation testing evaluates its strength.

I ran Stryker mutation testing on the billing service and got a mutation score of 68%. Here are the 12 surviving mutations:

1. calculateUpgradeCharge: `remainingDays = endOfPeriod - upgradeDate + 1` → `remainingDays = endOfPeriod - upgradeDate`
2. calculateUpgradeCharge: `newDailyRate - currentDailyRate` → `newDailyRate + currentDailyRate`
3. BillingService.runBillingCycle: `lastBilledAt >= periodStart` → `lastBilledAt > periodStart`
4. BillingService.runBillingCycle: `retryCount < maxRetries` → `retryCount <= maxRetries` 
5. BillingService.runBillingCycle: `subscription.status = 'suspended'` [statement removed; the subscription is never suspended]
6. PaymentProvider.charge: `idempotencyKey` parameter [removed from request body]
7. calculateRetrySchedule: `retryIntervalHours * 60 * 60 * 1000` → `retryIntervalHours * 60 * 1000` (interval treated as minutes instead of hours)
8. Event emission: `this.emit('subscription.charged', ...)` [removed]
9. Event emission: `this.emit('subscription.charge_failed', ...)` [removed]
10. Proration: `Math.round(amount * 100) / 100` → `Math.floor(amount * 100) / 100`
11. Upgrade validation: `if (newPlan.price <= currentPlan.price)` → `if (newPlan.price < currentPlan.price)` (missing equal case)
12. Billing cycle: `const periodStart = new Date(billingAnchorDay...)` calculation [date off by one in month]

For each surviving mutation:
1. Categorize it: (a) real bug risk, (b) equivalent mutation (semantically the same), or (c) low-risk implementation detail
2. For real bug risks: write the specific test that would kill it
3. Estimate what mutation score improvement we would see from killing all real bug risks

Expected output: Categorization of 12 mutations (likely 8-9 are real risks, 2-3 are equivalent), targeted tests for each real risk, and estimated score improvement to ~88-92%.
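The expected ~88-92% can be checked with a little arithmetic. Assuming every mutant was either killed or survived (no timeouts or uncovered mutants), a 68% score with 12 survivors pins the total to 37 or 38 mutants, since only those totals round to 68%. Killing 8 or 9 survivors then lands between roughly 89% and 92%. The sketch keeps equivalent mutants in the denominator; excluding them from scoring instead (StrykerJS supports disable comments for this) would raise the score further. The function names are illustrative.

```typescript
// Mutation score in percent: killed / (killed + survived). This matches Stryker's
// formula only when every mutant is either killed or survived.
function mutationScore(killed: number, survived: number): number {
  return (100 * killed) / (killed + survived);
}

// Total mutant counts consistent with a reported (rounded) score and survivor count.
function candidateTotals(reportedScore: number, survived: number, maxTotal = 500): number[] {
  const totals: number[] = [];
  for (let total = survived + 1; total <= maxTotal; total++) {
    if (Math.round(mutationScore(total - survived, survived)) === reportedScore) totals.push(total);
  }
  return totals;
}

// Score after writing tests that kill `newlyKilled` of the current survivors.
function projectedScore(total: number, survived: number, newlyKilled: number): number {
  return mutationScore(total - survived + newlyKilled, survived - newlyKilled);
}

for (const total of candidateTotals(68, 12)) {
  for (const kills of [8, 9]) {
    console.log(`total=${total} kills=${kills} -> ${projectedScore(total, 12, kills).toFixed(1)}%`);
  }
}
```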

Step 8: Final Audit — Does the Suite Meet the Strategy?

Here is the test plan from Step 1 (risk analysis), a summary of the tests written in Steps 2-6, and the mutation score from Step 7:

[paste risk items from Step 1]

Test coverage summary:
- Unit tests: 22 tests across pricing, proration, and retry schedule
- Integration tests: 5 tests for billing cycle (success, retry, suspension, idempotency, concurrency)
- Contract tests: 4 Pact interactions for the payment provider (charge success, insufficient funds, customer not found, refund success)
- Property-based tests: 6 properties for proration calculation
- Mutation score: 88% after improvements from Step 7

Review this suite against the test plan:
1. For each high-risk item from the plan: is it covered? At the right level?
2. Are there any high-risk items with no test coverage?
3. Are there any tests that belong to a different layer than recommended?
4. What is the one remaining gap that represents the highest residual risk?

Expected output: A gap analysis showing coverage against the original risk plan, with one remaining gap identified (commonly: end-to-end billing cycle across a full month boundary, or edge case in concurrent charge prevention).
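Questions 1-3 become mechanical once each test is tagged with the risk items it covers, so it can help to run that cross-check before asking for judgment on question 4. The sketch below assumes hypothetical risk ids, severities, and test tags; it reports high-severity risks with no covering test, and risks that are covered but never at their recommended layer.

```typescript
type Layer = "unit" | "integration" | "contract" | "property" | "e2e";

interface PlannedRisk {
  id: string;
  severity: "high" | "medium" | "low";
  recommendedLayer: Layer;
}

interface WrittenTest {
  name: string;
  layer: Layer;
  covers: string[]; // ids of the risk items this test protects
}

interface GapReport {
  uncoveredHighRisks: string[];
  offLayerRisks: string[]; // covered, but never at the recommended layer
}

function auditCoverage(plan: PlannedRisk[], tests: WrittenTest[]): GapReport {
  const report: GapReport = { uncoveredHighRisks: [], offLayerRisks: [] };
  for (const risk of plan) {
    const covering = tests.filter(t => t.covers.includes(risk.id));
    if (covering.length === 0) {
      if (risk.severity === "high") report.uncoveredHighRisks.push(risk.id);
    } else if (!covering.some(t => t.layer === risk.recommendedLayer)) {
      report.offLayerRisks.push(risk.id);
    }
  }
  return report;
}

// Hypothetical plan and test inventory, for illustration only.
const plan: PlannedRisk[] = [
  { id: "proration", severity: "high", recommendedLayer: "unit" },
  { id: "double-charge", severity: "high", recommendedLayer: "integration" },
  { id: "provider-api-shape", severity: "high", recommendedLayer: "contract" },
  { id: "month-boundary-cycle", severity: "high", recommendedLayer: "e2e" },
];
const tests: WrittenTest[] = [
  { name: "upgrade on first day charges full difference", layer: "unit", covers: ["proration"] },
  { name: "proration antisymmetry", layer: "property", covers: ["proration"] },
  { name: "concurrent runs charge once", layer: "integration", covers: ["double-charge"] },
  { name: "charge request includes idempotency key", layer: "integration", covers: ["provider-api-shape"] },
];

console.log(auditCoverage(plan, tests));
```

The report only answers "is it covered, and where"; whether a covering test is strong enough is what the mutation score and question 4 are for.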

Bringing It Together: What a Complete Test Suite Looks Like

A complete test suite for the subscription billing service, at the end of this capstone, includes:

A documented test strategy with risk priorities
20-25 unit tests covering all financial calculations with specific value assertions
5 integration tests covering the full billing cycle stack with real database and real HTTP mocking
4 contract tests (Pact interactions) for the payment provider API boundary
6 property-based tests for proration invariants
A mutation score at or above 85%
A final gap analysis identifying any remaining residual risk

This is not 100% coverage of every possible behavior. It is strategic, risk-driven coverage that protects the most important behaviors with the right tools. That is the goal of an AI-assisted test strategy: not maximum coverage, but maximum protection per unit of effort.

Learning tip: Run this entire workflow (strategy, unit, integration, contract, property, mutation) on a module you own at work. Expect the first run to take the better part of a day and later runs to go noticeably faster; with repetition, the discipline becomes muscle memory.

Key Takeaways

A complete test suite is defined by risk coverage, not line coverage — know which behaviors must not fail and verify them first.
Unit tests protect financial calculations at the function level; integration tests protect the billing cycle at the system level; contract tests protect the external API boundary; property-based tests protect mathematical invariants.
TDD-driven implementation of the idempotency check is safer than adding it after the fact because the test defines the exact behavior contract before the implementation is written.
Mutation testing at the end of the suite is the final quality gate — it distinguishes a test suite that exercises code from one that actually protects it.
The workflow in this capstone (strategy → unit → integration → contract → property → mutation → audit) applies to any high-risk service and is repeatable with AI assistance in a fraction of the time it would take manually.