Generating tests with AI is fast — generating tests that actually protect your system requires the same critical thinking you bring to reviewing any AI-generated code.
The Spectrum from Hollow to Genuine Tests
There is a meaningful difference between a test that passes and a test that protects. When you ask an AI to "write tests for this function," it will produce tests that are syntactically correct, structurally reasonable, and often hollow at the assertion level. They pass when the code is correct and may continue to pass when the code is wrong, because the assertions are too weak to distinguish right from wrong.
The three most common forms of hollow AI-generated tests are: assertions that check only that a return value exists, not what it equals; assertions on side effects that are mocked so completely that the real behavior is never exercised; and tests that verify implementation details rather than behavior (checking that a specific internal method was called rather than that the correct outcome occurred).
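A minimal sketch of the contrast, assuming a hypothetical getUser(repo, id) and Jest-style syntax (none of these names come from a real codebase):
// Hollow: passes whether or not getUser does anything useful
test('getUser returns a user', async () => {
  const repo = { findById: jest.fn().mockResolvedValue({ id: 1, name: 'Ada' }) };
  const user = await getUser(repo, 1);
  expect(user).toBeDefined();               // only fails if the function throws
  expect(repo.findById).toHaveBeenCalled(); // checks the implementation, not the outcome
});

// Meaningful: pins down the value the caller actually depends on
test('getUser returns the user record matching the requested id', async () => {
  const repo = { findById: jest.fn().mockResolvedValue({ id: 1, name: 'Ada' }) };
  const user = await getUser(repo, 1);
  expect(user).toEqual({ id: 1, name: 'Ada' });
});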
Understanding these failure modes before you generate tests lets you write better prompts and catch problems in review. The goal is not to stop using AI for test generation — it is to use it for the right things and stay alert to what it gets wrong.
Learning tip: Before asking AI to generate tests, write one test yourself. This forces you to think about what a meaningful assertion actually looks like, and gives AI a concrete example to match in style and depth.
Prompting AI for High-Quality Unit Tests
A unit test's job is to verify a single behavior of a single unit of code in isolation. The word "isolation" is doing important work: if your unit test requires setting up a database, starting a server, or calling an external API, it is not a unit test, and AI will often produce exactly this kind of test if you do not constrain it.
The prompts that produce the best unit tests share three characteristics: they provide context about what the function is supposed to do (not just the code), they explicitly ask for edge cases and failure scenarios (not just happy paths), and they specify what meaningful assertions look like for this domain.
For a pricing function, a meaningful assertion is not "it returns a number"; it is "given a 10% discount applied to a $100 order, the total is exactly $90.00." The specificity of the expected value is what makes the test valuable: a vague assertion tells you nothing when it passes, because it would also pass against broken code.
Unit tests should also cover the boundary between valid and invalid inputs. AI will generate these if you ask, but often clusters them around the obvious cases and misses the subtle ones. For a function that accepts a percentage discount between 0 and 100, the interesting boundary cases are: 0, 100, 0.001, 99.999, -0.001, and 100.001. Ask for these explicitly.
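A sketch of how those boundaries might look as a table-driven Jest test, assuming a hypothetical applyPercentageDiscount(total, percent) that throws a RangeError for out-of-range percentages:
describe('percentage boundary values', () => {
  test.each([
    [100, 0,      100],     // 0% is valid and changes nothing
    [100, 100,    0],       // 100% is valid and discounts everything
    [100, 0.001,  99.999],  // just inside the lower bound
    [100, 99.999, 0.001],   // just inside the upper bound
  ])('a %d order with a %d%% discount totals %d', (total, percent, expected) => {
    expect(applyPercentageDiscount(total, percent)).toBeCloseTo(expected, 3);
  });

  test.each([[-0.001], [100.001]])('a %d%% discount is rejected', (percent) => {
    expect(() => applyPercentageDiscount(100, percent)).toThrow(RangeError);
  });
});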
Learning tip: Add the phrase "include tests for boundary values and invalid inputs that should throw or return specific error states" to every unit test prompt. AI will not generate these by default.
Generating Integration Tests That Test Real Behavior
Integration tests verify that multiple components work together correctly. The most common mistake in AI-generated integration tests is over-mocking: the AI mocks the database, mocks the external service, mocks the internal module, and ends up testing only that the code calls the right mock methods in the right order. This kind of test is worse than useless — it gives you coverage numbers without protection.
The rule of thumb for integration tests: mock at the boundary of your system, not inside it. If you are testing a service layer that uses a repository that uses a database, the right place to mock is the external HTTP call that leaves your system, not the repository or the database connection. Use a real in-memory database (SQLite, H2), a containerized database via Testcontainers, or a test database fixture instead.
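A sketch of that setup, assuming Jest, an in-memory SQLite database via better-sqlite3, nock for stubbing the one outbound HTTP call, and a hypothetical createApp(db) factory that wires the real layers together:
const Database = require('better-sqlite3');
const nock = require('nock');

let db, app;

beforeEach(() => {
  // Real database: the repository and service layers run unmocked against it
  db = new Database(':memory:');
  db.exec('CREATE TABLE coupons (code TEXT, type TEXT, value REAL)');
  db.prepare('INSERT INTO coupons VALUES (?, ?, ?)').run('SAVE10', 'percentage', 10);

  // Mock only what leaves the system: the external payment provider's API
  nock('https://payments.example.com').post('/charges').reply(201, { status: 'ok' });

  app = createApp(db);
});

afterEach(() => {
  db.close();
  nock.cleanAll();
});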
When prompting AI for integration tests, specify explicitly what should not be mocked. Tell the AI which layers are in scope for the test and which external dependencies it is acceptable to stub. This prevents the AI from defaulting to full isolation.
Integration tests are also the right place to verify behavior across transactions, error propagation between layers, and side effects such as cache invalidation or event emission. These are precisely the things unit tests miss.
Learning tip: When reviewing AI-generated integration tests, count the mocks. If there are more than one or two, question whether each mock is testing at the right level. A test with five mocks is almost certainly testing the wrong thing.
Contract Testing with Pact and API Schema
Contract testing is a specialized form of integration testing for service boundaries — specifically, for the interfaces between services that you own and services you depend on (or that depend on you). A contract test verifies that your service honors the contract that its consumers expect, and that the APIs you consume behave the way you expect them to.
Pact is the most widely used contract testing tool. It works by having consumers define expectations in a "pact file" (a JSON artifact describing what requests they make and what responses they expect), and then verifying that the provider can satisfy those expectations. This decouples service testing — consumers and providers can test their contracts independently, without running both services simultaneously.
AI is genuinely useful for generating Pact contract tests because the pattern is mechanical and the prompts are straightforward. You provide the API schema (OpenAPI/Swagger) or an example request/response pair, and AI generates both the consumer-side pact definition and the provider verification scaffold.
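A compact consumer-side sketch of that pattern, assuming the PactV3 API from @pact-foundation/pact (v10+), Jest, a Node 18+ global fetch, and made-up service names:
const { PactV3, MatchersV3 } = require('@pact-foundation/pact');

const provider = new PactV3({ consumer: 'checkout-ui', provider: 'pricing-service' });

test('pricing-service returns a price for a known SKU', () => {
  provider.addInteraction({
    states: [{ description: 'SKU ABC-1 exists' }],
    uponReceiving: 'a price lookup for ABC-1',
    withRequest: { method: 'GET', path: '/prices/ABC-1' },
    willRespondWith: {
      status: 200,
      body: { sku: 'ABC-1', price: MatchersV3.like(19.99) },
    },
  });

  // Running the test exercises the consumer against a Pact mock server
  // and writes the pact file that the provider will later verify.
  return provider.executeTest(async (mockServer) => {
    const res = await fetch(`${mockServer.url}/prices/ABC-1`);
    expect(res.status).toBe(200);
  });
});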
For REST APIs without a formal contract testing tool, AI can generate tests against an OpenAPI schema using tools like dredd or schemathesis. These tools validate that actual API responses conform to the schema — a form of contract testing that requires no changes to consumers.
Learning tip: Start contract testing at the service boundaries where mismatched assumptions between teams have caused the most bugs. Contract tests are most valuable where two teams independently evolve the two ends of an interface.
Writing Prompts That Produce Meaningful Assertions
The most impactful single change you can make to AI-generated test quality is specifying what meaningful assertions look like. A test that asserts "the function completes without throwing" is almost never useful. A test that asserts "the function returns the expected value under these specific conditions" is useful. A test that asserts "the function transitions the system to the expected state" is useful.
There are three types of meaningful assertions for most code: return value assertions (what did the function return?), state assertions (what is the state of the system after the call?), and interaction assertions (did the function interact with its dependencies in the expected way?). The last type — interaction assertions — should be used sparingly and only when the interaction itself is the behavior being tested, not as a proxy for return value or state assertions.
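A compact sketch of all three, assuming a hypothetical cancelOrder(orderId, repo, mailer) and an in-memory repository test double:
test('cancelOrder cancels a pending order and notifies the customer', async () => {
  const repo = createInMemoryOrderRepo();   // hypothetical in-memory test double
  await repo.save({ id: 'o-1', status: 'pending' });
  const mailer = { send: jest.fn() };

  const result = await cancelOrder('o-1', repo, mailer);

  // Return value assertion: what did the function give back?
  expect(result.status).toBe('cancelled');

  // State assertion: what does the system look like afterwards?
  expect((await repo.findById('o-1')).status).toBe('cancelled');

  // Interaction assertion: justified here because sending the email *is* the behavior under test
  expect(mailer.send).toHaveBeenCalledWith(expect.objectContaining({ template: 'order-cancelled' }));
});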
When prompting AI, you can provide example assertions as anchors. Show the AI one or two test cases with the level of assertion detail you want, then ask it to generate more tests at the same level of specificity.
Learning tip: Use the phrase "each test should have an assertion that would fail if [specific wrong behavior] occurred" in your prompts. This forces AI to reason about what makes an assertion meaningful.
Test Data Generation
Generating realistic test data is tedious and AI excels at it. For unit tests, test data is usually a few carefully chosen values. For integration tests, test data often needs to be a realistic set of database fixtures: users with specific plan types, orders in various states, products with edge-case pricing.
AI can generate fixture files, factory functions, and builder patterns for test data. The prompts that work best provide: the schema or type definition, the scenarios that need to be exercised, and any constraints on the data (e.g., "user IDs must be UUIDs, not integers").
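A sketch of what such a factory might look like for a hypothetical User object, with defaults that satisfy the constraints and per-test overrides for the scenario being exercised:
const { randomUUID } = require('crypto');

// Defaults satisfy the schema constraints; each test overrides only what it cares about
function buildUser(overrides = {}) {
  return {
    id: randomUUID(),   // constraint from the prompt: user IDs are UUIDs, not integers
    email: `user-${randomUUID().slice(0, 8)}@example.com`,
    plan: 'free',
    createdAt: '2024-01-01T00:00:00.000Z',
    ...overrides,
  };
}

// Usage: fixtures read as intent rather than piles of literal fields
const trialUser = buildUser({ plan: 'trial' });
const longTimeProCustomer = buildUser({ plan: 'pro', createdAt: '2019-06-01T00:00:00.000Z' });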
For property-based testing (covered in a later topic), AI can also suggest the generators that should be used for a given domain — what range of values is representative and what values are adversarial.
Learning tip: Generate test data factories once and reuse them. Ask AI to generate a factory function or builder for your core domain objects, then use that factory across all your tests rather than generating ad hoc data in each test file.
Hands-On: Generate a Full Test Suite for a Coupon Service
This exercise walks through generating unit, integration, and contract tests for a simple coupon validation service. The service has a single endpoint: POST /coupons/validate that accepts a coupon code and an order total, and returns the discounted total or an error.
Step 1: Generate unit tests for the discount calculation logic.
Assume a function applyDiscount(orderTotal, coupon) that takes a numeric order total and a coupon object with fields: type ("percentage" or "fixed"), value (number), minOrderValue (number), maxUses (number), usesRemaining (number), expiresAt (ISO date string).
Write comprehensive unit tests for this function in [Jest/Vitest/pytest]:
function applyDiscount(orderTotal, coupon) {
// Returns { discountedTotal, discountAmount } or throws an error
}
The function should handle:
- Percentage discounts (e.g., 20% off)
- Fixed-amount discounts (e.g., $15 off)
- Minimum order value requirement
- Expired coupons (compare against current date)
- Coupons with no uses remaining
- Coupons where discount would make total negative (floor at 0)
Include tests for:
- Every happy path case
- Every validation failure case with the specific error message expected
- Boundary values (e.g., exactly at minimum order value, exactly one use remaining)
Above each test, add a comment explaining what defect the assertion catches.
Expected output: 12-18 test cases organized in describe blocks, each with specific value assertions and inline documentation.
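One plausible shape for a few of those cases, sketched in Jest; the error-message convention (throwing with the error code as the message) is an assumption, not part of the spec above:
describe('applyDiscount', () => {
  const baseCoupon = {
    type: 'percentage', value: 10, minOrderValue: 0,
    maxUses: 100, usesRemaining: 5, expiresAt: '2099-01-01T00:00:00Z',
  };

  // Catches: percentage applied to the wrong base or rounded incorrectly
  test('10% off a $100 order returns a discounted total of exactly $90.00', () => {
    expect(applyDiscount(100, baseCoupon)).toEqual({ discountedTotal: 90, discountAmount: 10 });
  });

  // Catches: fixed discounts allowed to push the total below zero
  test('a $15 fixed coupon on a $10 order floors the total at 0', () => {
    const result = applyDiscount(10, { ...baseCoupon, type: 'fixed', value: 15 });
    expect(result.discountedTotal).toBe(0);
  });

  // Catches: expiry compared with the wrong operator or against the wrong clock
  test('a coupon that expired before today is rejected with EXPIRED_COUPON', () => {
    const expired = { ...baseCoupon, expiresAt: '2020-01-01T00:00:00Z' };
    expect(() => applyDiscount(100, expired)).toThrow('EXPIRED_COUPON');
  });
});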
Step 2: Generate integration tests for the validation endpoint.
Write integration tests for this endpoint: POST /coupons/validate
Request body: { couponCode: string, orderTotal: number }
Response (success): { discountedTotal: number, discountAmount: number, coupon: { code, type, value } }
Response (error): { error: string, code: "INVALID_COUPON" | "EXPIRED_COUPON" | "BELOW_MIN_ORDER" | "NO_USES_REMAINING" }
Use [supertest/httpx/your HTTP test client] against the real application server.
Use a real test database (not mocks) populated with fixture data.
Do NOT mock the database layer or the coupon repository.
Mock only: the current date (so expiry can be tested deterministically), and the external payment provider if present.
Include tests for the full request-response cycle including HTTP status codes.
Expected output: Integration tests using a real server instance with database fixtures, testing the full stack from HTTP layer through to data layer.
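A sketch of one such test with supertest and Jest, assuming a hypothetical createApp that accepts a seeded test database and an injectable clock (the injection points are assumptions, not part of the endpoint spec):
const request = require('supertest');

let app;

beforeAll(async () => {
  app = await createApp({
    database: seededTestDb,                      // real test DB with coupon fixtures, not a mock
    now: () => new Date('2024-06-01T00:00:00Z'), // pin "today" so expiry tests are deterministic
  });
});

test('a valid coupon returns 200 with the exact discounted total', async () => {
  const res = await request(app)
    .post('/coupons/validate')
    .send({ couponCode: 'SAVE10', orderTotal: 100 });

  expect(res.status).toBe(200);
  expect(res.body.discountedTotal).toBe(90);
  expect(res.body.coupon).toEqual({ code: 'SAVE10', type: 'percentage', value: 10 });
});

test('an expired coupon returns 422 with EXPIRED_COUPON', async () => {
  const res = await request(app)
    .post('/coupons/validate')
    .send({ couponCode: 'SUMMER2020', orderTotal: 100 });

  expect(res.status).toBe(422);
  expect(res.body.code).toBe('EXPIRED_COUPON');
});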
Step 3: Generate a Pact consumer contract test.
Write a Pact consumer contract test for a frontend client that calls POST /coupons/validate.
The consumer (frontend) expects:
- On valid coupon: 200 with body matching { discountedTotal: number, discountAmount: number }
- On invalid coupon: 422 with body matching { error: string, code: string }
Generate:
1. The Pact consumer test that defines these interactions
2. The provider verification test that validates the real API satisfies the pact
3. A brief explanation of how to run provider verification in CI
Use the @pact-foundation/pact library (JavaScript) or pact-python.
Expected output: Consumer test file defining pact interactions, provider verification test, CI integration notes.
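A sketch of the provider-side half, assuming the Verifier API from @pact-foundation/pact, a coupon service already listening on localhost:3000, and a hypothetical seedCoupon helper for provider states:
const path = require('path');
const { Verifier } = require('@pact-foundation/pact');

// Replays the consumer-defined interactions against the real running provider
test('coupon service satisfies the frontend pact', () => {
  return new Verifier({
    provider: 'coupon-service',
    providerBaseUrl: 'http://localhost:3000',
    pactUrls: [path.resolve(__dirname, 'pacts/frontend-coupon-service.json')],
    stateHandlers: {
      // Seed the data each provider state declared by the consumer requires
      'coupon SAVE10 exists': async () =>
        seedCoupon({ code: 'SAVE10', type: 'percentage', value: 10 }),
    },
  }).verifyProvider();
}, 30000);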
Step 4: Review generated tests for false confidence.
Take the unit tests from Step 1 and run this review prompt:
Review these unit tests for false confidence. Specifically:
1. Find any assertions that can never fail (assertions where the asserted condition is always true regardless of implementation)
2. Find any tests that would pass even if the function returned a hardcoded value instead of computing a real result
3. Find any missing cases where the spec says something specific but the tests only check a weaker condition (e.g., spec says "returns exactly $X" but test only checks "returns a positive number")
For each issue you find, suggest a stronger replacement assertion.
Expected output: A review with 2-5 identified issues and corrected assertions.
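As an illustration of the kind of finding this prompt should surface (a hand-written sketch, not actual review output, using a hypothetical percentCoupon helper):
// Flagged: still passes if applyDiscount returns a hardcoded positive number for every input
test('applies a percentage discount', () => {
  const result = applyDiscount(100, percentCoupon(10));
  expect(result.discountedTotal).toBeGreaterThan(0);
  expect(result.discountAmount).toBeDefined();
});

// Stronger replacement: two distinct inputs with exact expected values,
// so a hardcoded or miscomputed result fails at least one assertion
test('applies a percentage discount proportionally to the order total', () => {
  expect(applyDiscount(100, percentCoupon(10)).discountedTotal).toBe(90);
  expect(applyDiscount(250, percentCoupon(10)).discountedTotal).toBe(225);
});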
Step 5: Improve test descriptions.
The test descriptions in this file use vague language like "should work correctly" or "should handle errors". Rewrite each test description to follow the format: "given [context], when [action], then [expected outcome]". The description should be specific enough that a failing test tells an engineer exactly what broke.
Expected output: Test file with renamed tests that serve as precise failure messages.
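A before-and-after sketch of that renaming:
// Before: a failure here tells the engineer almost nothing
test('should handle errors', () => { /* ... */ });

// After: the name alone identifies the rule that broke
test('given a coupon with a $50 minimum order value, when validated against a $20 order, then it is rejected with BELOW_MIN_ORDER', () => { /* ... */ });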
Key Takeaways
- AI defaults to happy-path tests and weak assertions — always explicitly request edge cases, boundary values, and specific expected values in your prompts.
- Integration tests should mock only at system boundaries (external APIs, email providers), not inside your own code layers.
- Contract tests with Pact decouple service teams and catch integration failures before services are deployed together.
- Review AI-generated tests for false confidence: any assertion that cannot distinguish a correct implementation from a trivially wrong one is not protecting you.
- Test data factories generated by AI should be written once and reused; ad hoc fixture data in every test file becomes a maintenance burden fast.