AI-Driven Test Strategy

A test strategy built with AI is only as good as the engineer steering it. AI can generate breadth fast, but without deliberate risk prioritization you end up with a large suite that verifies the wrong things.

Why Test Strategy Comes First

When engineers add AI to their workflow, the most common failure mode is not writing too few tests — it is writing too many of the wrong ones. AI models are exceptionally good at generating plausible-looking unit tests quickly. Given a function, an AI will produce a suite of happy-path tests, a few edge cases, and some error cases within seconds. This speed creates an illusion of coverage. Engineers ship with confidence, and bugs still reach production — not because the tests were wrong, but because they tested the wrong layer, the wrong assumption, or the wrong risk.

A test strategy answers a more fundamental question before any test is written: what are we actually trying to protect? In AI-assisted development, this question becomes even more important because the AI will happily fill any space you give it with plausible but potentially hollow tests. The strategy constrains where that energy goes.

Strategy means deciding: which behaviors must never break, which bugs would be most expensive to fix in production, and which parts of the system are changing fastest and therefore need the most test support. These are engineering judgments, not tasks you can delegate to an AI entirely — but AI can assist you in making them more systematically.

Learning tip: Write the test strategy document before asking AI to generate any tests. It takes 20 minutes and saves hours of deleting low-value tests later.

Risk-Based Prioritization: Where Bugs Are Most Costly

Risk-based testing is the practice of allocating testing effort proportional to the cost of a defect, not the ease of writing tests. A function that calculates sales tax deserves far more rigorous testing than a function that formats a date string, even though both are easy to test.

To do risk-based prioritization with AI, you need to give it enough business context to reason about consequences. An AI that only sees the code will prioritize based on structural complexity (cyclomatic complexity, number of branches). An AI that sees the code plus the business context will prioritize based on what actually hurts users and the business.

There are three factors that drive risk in a software system: probability of failure (how likely is a bug here?), impact of failure (what happens when it breaks?), and detectability (how quickly would we notice?). For each area of your system, score these roughly. High probability combined with high impact and low detectability is where you invest the most testing effort.
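To make the scoring concrete, here is a minimal sketch; the 1-to-3 scale, the multiplicative formula, and the example areas are illustrative assumptions, not a standard:

// Rough risk-scoring sketch. The 1-3 scale and the multiplicative
// formula are illustrative assumptions, not an industry standard.
type Score = 1 | 2 | 3; // 1 = low, 3 = high

interface RiskArea {
  name: string;
  probability: Score;     // how likely is a bug here?
  impact: Score;          // what happens when it breaks?
  undetectability: Score; // 3 = a failure would go unnoticed longest
}

// Higher score = invest more testing effort there.
const riskScore = (a: RiskArea): number =>
  a.probability * a.impact * a.undetectability;

const areas: RiskArea[] = [
  { name: "sales tax calculation", probability: 2, impact: 3, undetectability: 3 },
  { name: "date formatting", probability: 1, impact: 1, undetectability: 1 },
];

for (const a of [...areas].sort((x, y) => riskScore(y) - riskScore(x))) {
  console.log(`${a.name}: risk ${riskScore(a)}`);
}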

Common high-risk areas in most systems include: payment and billing logic, authentication and authorization, data migration scripts, external API integrations, and any calculation that produces a number a user will rely on. Common low-risk areas include purely cosmetic UI logic, simple CRUD that maps directly to a database schema, and logging or telemetry code.

Learning tip: Give AI your user story or feature spec and ask it to identify the three riskiest behaviors before generating any test code. The quality of the risk analysis often reveals gaps in the spec itself.

The Test Pyramid in an AI-Assisted Context

The test pyramid is a heuristic from Mike Cohn suggesting that teams should have many unit tests, fewer integration tests, and even fewer end-to-end tests. The logic is speed and isolation: unit tests are fast and precise, end-to-end tests are slow and flaky. This shape made sense when writing tests was expensive — you wrote fewer of the slow, expensive ones.

AI changes the cost equation. Writing unit tests is now nearly free. The risk is that teams end up with a grossly bottom-heavy pyramid: thousands of unit tests for individual functions, and almost no integration or end-to-end coverage. This is worse than having fewer tests overall, because it creates false confidence: the units all work in isolation, but the system fails when they interact.

In an AI-assisted context, the pyramid shifts in two ways. First, be aggressive about deciding when a unit test is not the right tool. If a function's only risk is in how it interacts with a database, a unit test with a mocked database tells you almost nothing. Second, treat integration tests as a first-class investment. AI can write integration tests too, and they are more valuable for many real-world risks than an equivalent number of unit tests.
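A minimal sketch of the first point, using Vitest; the repository interface and getUserAge are hypothetical names for illustration:

import { describe, it, expect, vi } from "vitest";

// Hypothetical code under test. Its only real risk is whether the
// underlying query behind findByEmail is correct.
interface UserRepo {
  findByEmail(email: string): Promise<{ age: number } | null>;
}

async function getUserAge(repo: UserRepo, email: string): Promise<number | null> {
  const user = await repo.findByEmail(email);
  return user ? user.age : null;
}

describe("getUserAge", () => {
  // This unit test mostly re-verifies its own mock. If the real SQL
  // behind findByEmail is broken, it still passes.
  it("returns the age from the repo (hollow)", async () => {
    const repo: UserRepo = {
      findByEmail: vi.fn().mockResolvedValue({ age: 42 }),
    };
    expect(await getUserAge(repo, "a@example.com")).toBe(42);
  });
  // The valuable test here is an integration test that runs the same
  // call against a real or containerized database instead of a mock.
});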

A useful reframe is the "test trophy" shape advocated by Kent C. Dodds: a small number of static analysis checks, a moderate number of unit tests, a large number of integration tests, and a small number of end-to-end tests. The largest layer is integration. This is a reasonable target for AI-assisted development teams.

Learning tip: Count your tests by layer monthly. If unit tests outnumber integration tests by more than 5:1, you probably have a hollow pyramid. Ask AI to help you identify which unit tests are actually testing integration concerns and should be promoted.
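A quick way to automate the count, assuming the (hypothetical) convention that tests live under tests/unit, tests/integration, and tests/e2e:

// count-tests.ts: rough audit of test files per layer. Counts files,
// not individual cases, which is usually close enough for the ratio.
import { existsSync, readdirSync } from "node:fs";
import { join } from "node:path";

function countTestFiles(dir: string): number {
  if (!existsSync(dir)) return 0;
  let count = 0;
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const path = join(dir, entry.name);
    if (entry.isDirectory()) count += countTestFiles(path);
    else if (/\.(test|spec)\.[jt]sx?$/.test(entry.name)) count++;
  }
  return count;
}

for (const layer of ["unit", "integration", "e2e"]) {
  console.log(`${layer}: ${countTestFiles(join("tests", layer))} test files`);
}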

Using AI to Find Coverage Blind Spots

AI is excellent at finding what you have not tested, especially when you provide it with both the code and the existing tests. This is a use case that pure coverage metrics miss: a line can be covered by a test that makes no meaningful assertion about its behavior.

The technique is to ask AI to review your test suite and identify behaviors that are exercised but not verified. This goes beyond line coverage and branch coverage into behavioral coverage — the question of whether your tests would actually catch a meaningful defect.
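As a sketch of the difference, with a hypothetical discount function:

import { it, expect } from "vitest";

// Hypothetical function under test.
function applyDiscount(total: number, percent: number): number {
  if (percent < 0 || percent > 100) throw new Error("invalid percent");
  return total - (total * percent) / 100;
}

// Exercises the happy path (full line coverage of it) but verifies
// nothing about the math: nearly any implementation would pass.
it("applies a discount (exercised, not verified)", () => {
  expect(typeof applyDiscount(100, 10)).toBe("number");
});

// Behavioral coverage: this fails if the formula regresses.
it("takes 10% off a total of 100", () => {
  expect(applyDiscount(100, 10)).toBe(90);
});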

A second technique is to provide AI with your feature specification or requirements document alongside your tests and ask it to identify requirements that have no corresponding test. This cross-referencing between requirements and tests is tedious for humans and fast for AI.

A third technique is asking AI to imagine adversarial inputs: what would a malicious or careless user do that your tests do not cover? This is particularly valuable for input validation logic and security-sensitive code paths.
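For instance, a sketch of adversarial cases for a hypothetical coupon-code parser:

import { describe, it, expect } from "vitest";

// Hypothetical validator under test.
function parseCouponCode(raw: string): string {
  const code = raw.trim().toUpperCase();
  if (!/^[A-Z0-9]{4,16}$/.test(code)) throw new Error("invalid code");
  return code;
}

// Inputs a careless or malicious user might send, which happy-path
// suites routinely skip.
describe("parseCouponCode rejects hostile input", () => {
  it.each([
    "",                          // empty
    "   ",                       // whitespace only
    "'; DROP TABLE coupons;--",  // injection-shaped string
    "💳💳💳",                    // non-ASCII
    "A".repeat(10_000),          // oversized
  ])("rejects %j", (input) => {
    expect(() => parseCouponCode(input)).toThrow();
  });
});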

Learning tip: Once a month, paste your test file and the corresponding source file into a conversation and ask the AI: "What behavior in this code is not verified by any assertion in these tests?" You will almost always find something.

Writing a Test Plan Document with AI

A test plan document is a lightweight artifact that captures what you are testing, why, and at what level before implementation begins. It serves as a communication tool (useful for code review and team alignment) and a constraint on AI test generation.

The most effective format includes: a one-paragraph description of the feature, a list of key behaviors that must be tested, a layer assignment for each behavior (unit, integration, or end-to-end), risk ratings for each behavior, and any testing constraints (e.g., "cannot spin up a real payment processor in CI").
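An abbreviated example, using the coupon feature from the hands-on exercise below (the layer and risk assignments are illustrative):

Feature: Apply promotional coupon at checkout
Constraint: cannot spin up a real payment processor in CI
Key behaviors:
- Discount calculation, percentage and fixed (unit, high risk): a valid code reduces the total by exactly the coupon amount
- Expired code rejected (integration, high risk): checkout returns an error and the total is unchanged
- Invalid-code error shown (e2e, medium risk): the user sees the error message at checkout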

AI can draft this document from a feature spec in under a minute, but the value comes from engineers reviewing and editing it — not from accepting the AI draft wholesale. The AI will often miss domain-specific risks and over-emphasize structural complexity.

Learning tip: Treat the test plan as the first deliverable of any feature, not the last. Write it before implementation. Review it in the same PR where the feature lands.

Pragmatic Coverage Goals for AI-Generated Code

Coverage targets are controversial, and for good reason: a poorly written test suite can achieve 100% line coverage while providing almost no protection against real defects. That said, coverage targets have value as a floor — a minimum bar that prevents drift into untested territory.

For AI-generated code, the risk profile is different from that of human-written code. AI-generated code tends to look more correct than it is: it is syntactically sound, follows common patterns, and passes linter checks, while its subtle logic errors are harder to spot visually. This means the cost of missing test coverage is higher for AI-generated code than for carefully hand-written code.

A pragmatic approach: set a line coverage target of 80% as a floor, not a goal. The goal is meaningful behavioral coverage. Use mutation testing (covered in a later topic) to check whether your tests actually catch defects rather than just touching lines. For high-risk modules (payment, auth, critical calculations), aim for explicit behavioral coverage of every documented requirement, not just line coverage.
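One way to enforce the floor mechanically, shown here with Jest's coverageThreshold option; the per-module path is an illustrative assumption:

// jest.config.ts: the build fails if coverage drops below the floor.
// Thresholds prevent drift; they say nothing about assertion quality.
import type { Config } from "jest";

const config: Config = {
  collectCoverage: true,
  coverageThreshold: {
    global: { lines: 80, branches: 80 },
    // Stricter floor for a high-risk module (path is hypothetical):
    "./src/payments/": { lines: 95, branches: 95 },
  },
};

export default config;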

Review AI-generated tests for false confidence before merging. The most common failure mode is a test that makes an assertion that can never fail — for example, asserting that a return value is not null when null is structurally impossible, or asserting that an array has length greater than zero when the function always returns at least one element.
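Both patterns in code form, with a hypothetical function whose return shape makes the first test unfailable:

import { it, expect } from "vitest";

// Hypothetical function: always returns at least the base item.
function buildLineItems(extras: string[]): string[] {
  return ["base", ...extras];
}

// False confidence: neither assertion can ever fail, because the
// return type rules out null and "base" guarantees length >= 1.
it("returns items (vacuous)", () => {
  const items = buildLineItems(["gift-wrap"]);
  expect(items).not.toBeNull();
  expect(items.length).toBeGreaterThan(0);
});

// This one fails if the contents or ordering regress.
it("keeps the base item first and appends extras", () => {
  expect(buildLineItems(["gift-wrap"])).toEqual(["base", "gift-wrap"]);
});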

Learning tip: When reviewing AI-generated tests, ask yourself: "What would have to change in the implementation for this assertion to fail?" If the answer is "almost nothing realistic," the test is providing false confidence.


Hands-On: Build a Test Strategy Document from a Feature Spec

This exercise walks through using AI to generate and refine a test strategy for a concrete feature. Use a feature from your current project, or use the example spec below.

Example spec: A user can apply a promotional coupon code at checkout. If the code is valid and not expired, the discount is applied to the order total. If the code is invalid or expired, an error is shown. Codes can be percentage-based or fixed-amount. Codes can have a maximum number of uses.

Step 1: Feed the spec to AI and ask for risk identification.

I am writing a test strategy for this feature. Before generating any tests, identify the top 5 riskiest behaviors — places where a bug would be most costly to users or the business.

Feature spec:
[paste spec here]

For each risk, explain: what could go wrong, what the impact would be, and whether it would be immediately obvious to users.

Expected output: A ranked list of risky behaviors — likely including: discount calculation errors (user over- or under-charged), expired code still accepted, code used more than its limit, race condition on max-uses enforcement, negative total possible with large fixed discount.

Step 2: Convert the risk list into a test plan.

Based on these risks, write a test plan document with the following structure:
- Feature name
- Key behaviors to test (from risk analysis)
- For each behavior: test layer (unit/integration/e2e), risk level (high/medium/low), and a one-line description of what a good test would verify

Focus on behaviors that could cause financial harm or user-facing errors first.

Expected output: A structured table or list mapping behaviors to test layers with risk annotations.

Step 3: Identify blind spots by cross-referencing spec and plan.

Review this test plan against the original spec. Identify any requirements mentioned in the spec that do not appear in the test plan. Also identify any edge cases in the business rules (like coupon stacking, zero-value orders, or orders with no eligible items) that the spec implies but does not state explicitly.

Expected output: A list of gaps — typically 3-6 implicit requirements the first pass missed.

Step 4: Assign coverage goals by module.

Given this test plan, what line coverage target would you recommend for each module involved in coupon processing? For modules where line coverage is a poor proxy for quality, suggest what behavioral coverage criteria to use instead.

Expected output: Module-by-module coverage recommendations with reasoning.

Step 5: Generate the test file stubs.

Generate the test file structure (describe/it blocks, no implementation yet) for the unit tests identified in this plan. Use [your test framework, e.g. Jest/Vitest/pytest]. Include a comment above each test explaining what defect it is designed to catch.

Expected output: A test file with describe/it structure, each test annotated with its purpose.
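For reference, the generated stubs might look roughly like this in Vitest; applyCoupon and the grouping are assumptions, not required output:

import { describe, it } from "vitest";

// Sketch of the stub file Step 5 should produce. `applyCoupon` is a
// hypothetical module name.
describe("applyCoupon", () => {
  describe("discount calculation", () => {
    // Catches: a percentage code applied as a fixed amount, or vice versa.
    it.todo("applies a 10% code as a percentage of the total, not as $10");

    // Catches: a large fixed-amount code driving the total below zero.
    it.todo("clamps the order total at zero for oversized fixed discounts");
  });

  describe("validity checks", () => {
    // Catches: expiry compared against the wrong clock or timezone.
    it.todo("rejects a code that expired one second before checkout");

    // Catches: an off-by-one error in max-uses enforcement.
    it.todo("rejects the (N+1)th redemption of a code with max uses N");
  });
});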

Step 6: Review the generated stubs against your risk list.

Go through the generated stubs manually and verify: does each high-risk behavior from Step 1 have at least one corresponding test stub? Are the test descriptions precise enough that another engineer would know what a failure means? Mark any gaps and iterate.


Key Takeaways

  • AI generates tests faster than it generates strategy — always define what you are testing and why before asking AI to generate test code.
  • Risk-based prioritization requires business context that the code alone does not contain; provide that context explicitly in prompts.
  • The test pyramid shifts in AI-assisted development toward integration tests, because unit tests are cheap but often test the wrong level of isolation.
  • AI is excellent at finding coverage blind spots when given both the code and existing tests; use it for regular test audits.
  • Coverage targets are floors, not goals — for AI-generated code, use mutation testing to verify that your coverage actually detects defects.