Structuring codebase context for runnable tests

What Codebase Context Should You Provide for AI to Generate Project-Aware Tests?

The difference between an AI-generated test that runs on the first try and one that needs 30 minutes of fixes comes down almost entirely to the quality of the codebase context you provide. Without context, AI generates generic tests in its own preferred style, using its own patterns, with invented helper functions and imports. With the right context, AI generates tests that slot directly into your existing test suite.

The Context Hierarchy

Think of codebase context in three tiers, from most to least important:

Tier 1 — Framework and Configuration (always include)
- Test framework and version (e.g., "Playwright 1.44, TypeScript 5.4")
- Test runner config file (playwright.config.ts, pytest.ini, jest.config.ts)
- Base URL and environment setup
- Authentication mechanism (how tests authenticate)

Tier 2 — Existing Patterns (include whenever generating tests for existing modules)
- One or two representative existing test files from the relevant module
- Your Page Object Model (POM) classes or test helper modules
- Fixture definitions and what they expose
- Import conventions and file naming standards

Tier 3 — Domain-Specific Details (include for specific flows)
- Relevant API response schemas
- Database model definitions (for understanding entity relationships)
- Application-specific constants (error codes, enum values, routing rules)

Building a Context Snapshot

For a Playwright TypeScript project, a complete context snapshot looks like this:

CODEBASE CONTEXT
================

PROJECT STRUCTURE:
tests/
  e2e/
    auth/
      login.spec.ts
      signup.spec.ts  
    orders/
      create-order.spec.ts
  api/
    products.spec.ts
  fixtures/
    auth.fixture.ts
    db.fixture.ts
  pages/           <-- Page Object Models
    LoginPage.ts
    DashboardPage.ts
    CheckoutPage.ts
  helpers/
    api-client.ts
    db-helpers.ts

PLAYWRIGHT CONFIG (playwright.config.ts):
export default defineConfig({
  testDir: './tests',
  use: {
    baseURL: process.env.BASE_URL || 'http://localhost:3000',
    trace: 'on-first-retry',
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
  ],
});

AUTHENTICATION FIXTURE (tests/fixtures/auth.fixture.ts):
export const test = base.extend<{
  authenticatedPage: Page;
  adminPage: Page;
}>({
  authenticatedPage: async ({ browser }, use) => {
    const context = await browser.newContext({ storageState: 'tests/.auth/user.json' });
    const page = await context.newPage();
    await use(page);
    await context.close();
  },
  adminPage: async ({ browser }, use) => {
    const context = await browser.newContext({ storageState: 'tests/.auth/admin.json' });
    const page = await context.newPage();
    await use(page);
    await context.close();
  },
});
export { expect } from '@playwright/test';

IMPORT CONVENTION: 
import { test, expect } from '../../fixtures/auth.fixture';
// NOT from '@playwright/test' unless it's a non-authenticated test

ENV VARS IN USE:
process.env.BASE_URL, process.env.TEST_USER_EMAIL, process.env.TEST_USER_PASSWORD,
process.env.TEST_ADMIN_EMAIL, process.env.TEST_ADMIN_PASSWORD, process.env.TEST_DB_URL

When you paste this snapshot into your prompt, AI imports `test` from your fixture (not from @playwright/test), references your existing POM classes, and follows your directory structure.
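
For example, a test generated from this snapshot tends to look like the sketch below. The file path, the DashboardPage.goto() method, and the dashboard-header test id are illustrative assumptions rather than part of the snapshot:

// tests/e2e/dashboard/view-dashboard.spec.ts (illustrative path)
import { test, expect } from '../../fixtures/auth.fixture';
import { DashboardPage } from '../../pages/DashboardPage';

test('shows the dashboard header for an authenticated user', async ({ authenticatedPage }) => {
  // authenticatedPage is already logged in via the storageState fixture above
  const dashboard = new DashboardPage(authenticatedPage);
  await dashboard.goto();                                            // assumed POM method
  await expect(authenticatedPage.getByTestId('dashboard-header')).toBeVisible();
});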

The CLAUDE.md Pattern for Persistent Context

If you use Claude Code (or a similar terminal agent), maintain a CLAUDE.md file in your test directory root:


## Framework
- Playwright 1.44, TypeScript 5.4
- Node 20 LTS
- Test runner: @playwright/test

## Key Rules
- ALWAYS import `test` and `expect` from `tests/fixtures/auth.fixture` 
  (not from @playwright/test) unless the test doesn't need authentication
- ALWAYS use `data-testid` selectors as first choice
- ALWAYS use `getByRole` with accessible name as fallback
- NEVER use CSS class selectors or XPath
- NEVER hardcode base URL — use relative paths (playwright.config.ts sets baseURL)
- NEVER hardcode credentials — use process.env.TEST_USER_EMAIL etc.

## Existing POMs
- LoginPage: `tests/pages/LoginPage.ts` 
- DashboardPage: `tests/pages/DashboardPage.ts`
- CheckoutPage: `tests/pages/CheckoutPage.ts`

## Fixture Usage
- `authenticatedPage`: already logged in as regular user
- `adminPage`: already logged in as admin
- `{ page }`: unauthenticated page (use for login, signup, public pages only)

## Test File Naming
- E2E tests: `tests/e2e/{module}/{feature}.spec.ts`
- API tests: `tests/api/{resource}.spec.ts`

## Running Tests
- `npx playwright test` — all tests
- `npx playwright test tests/e2e/auth/` — specific directory
- `npx playwright test --ui` — interactive UI mode

With this file in place, Claude Code reads it at the start of every session and uses your conventions without being reminded in each prompt.
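
To make the selector rules concrete, a compliant interaction looks like this (the element names are hypothetical):

// First choice: data-testid selector
await page.getByTestId('submit-order').click();

// Fallback: role plus accessible name
await page.getByRole('button', { name: 'Submit order' }).click();

// Disallowed by the rules above: CSS class and XPath selectors
// await page.locator('.btn-primary').click();
// await page.locator('//button[@class="btn-primary"]').click();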

Learning Tip: Start your CLAUDE.md (or equivalent context file) with the things AI gets wrong most often in your project — not the things it usually gets right. If AI keeps using @playwright/test instead of your custom fixture, that goes at the top. The file should be a corrections register as much as a conventions guide. Review it monthly and add any new pattern violations you've had to correct more than once.


How to Make AI Aware of Your Page Object Models and Test Helper Utilities?

Page Object Models (POMs) are the highest-leverage context you can provide for Playwright test generation. When AI knows your POM interface, it will call your actual methods rather than re-implementing the interactions inline — producing tests that are consistent with your patterns and automatically inherit any self-healing logic in your POMs.

Providing POM Interfaces to AI

You don't need to paste the full POM implementation — just the public interface (class name, constructor, and public methods with their signatures):

PAGE OBJECT MODELS AVAILABLE:

// tests/pages/LoginPage.ts
class LoginPage {
  constructor(page: Page) {}

  async goto(): Promise<void>
  async fillEmail(email: string): Promise<void>
  async fillPassword(password: string): Promise<void>
  async submit(): Promise<void>
  async login(email: string, password: string): Promise<void>  // fills + submits
  async getErrorMessage(): Promise<string | null>
  async isLoggedIn(): Promise<boolean>
}

// tests/pages/CheckoutPage.ts
class CheckoutPage {
  constructor(page: Page) {}

  async goto(): Promise<void>
  async fillShippingAddress(address: ShippingAddress): Promise<void>
  async selectShippingMethod(method: 'standard' | 'express'): Promise<void>
  async fillPaymentCard(card: CardDetails): Promise<void>
  async placeOrder(): Promise<void>
  async getOrderNumber(): Promise<string>
  async getOrderTotal(): Promise<string>
  async isConfirmationVisible(): Promise<boolean>
}

// TypeScript interfaces:
interface ShippingAddress {
  firstName: string; lastName: string; address: string; city: string; 
  state: string; zip: string; email: string;
}
interface CardDetails {
  number: string; expiry: string; cvc: string;
}

Use these POM classes in the generated test. Do not re-implement their actions inline.

With this context, AI generates:

import { test, expect } from '../../fixtures/auth.fixture';
import { CheckoutPage } from '../../pages/CheckoutPage';

test('guest checkout completes successfully', async ({ page }) => {
  const checkout = new CheckoutPage(page);
  await checkout.goto();
  await checkout.fillShippingAddress({
    firstName: 'Jane', lastName: 'Doe',
    address: '123 Main St', city: 'San Francisco',
    state: 'CA', zip: '94102',
    email: 'jane.doe@example.com'
  });
  await checkout.selectShippingMethod('standard');
  await checkout.fillPaymentCard({
    number: '4242 4242 4242 4242', expiry: '12/26', cvc: '123'
  });
  await checkout.placeOrder();
  expect(await checkout.isConfirmationVisible()).toBe(true);
  const orderNumber = await checkout.getOrderNumber();
  expect(orderNumber).toMatch(/^ORD-\d{8}$/);
});

Compare this to what AI generates without POM context: it would inline every page.fill() call, use fragile selectors, and produce a test that is inconsistent with your codebase.
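
For contrast, a version generated without that context usually looks like the sketch below; every selector, URL, and field name in it is invented by the model rather than taken from the codebase:

// Typical generic output: inline fills, fragile selectors, hardcoded URL
import { test, expect } from '@playwright/test';

test('checkout test', async ({ page }) => {
  await page.goto('http://localhost:3000/checkout');                 // hardcoded base URL
  await page.fill('#firstName', 'Jane');                             // guessed CSS id selectors
  await page.fill('#lastName', 'Doe');
  await page.fill('input[name="cardNumber"]', '4242424242424242');
  await page.click('.checkout-submit');                              // class selector, breaks on restyle
  expect(await page.isVisible('.order-confirmation')).toBe(true);    // non-retrying assertion style
});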

Providing API Client and Database Helper Interfaces

For tests that use API or database helpers for setup/teardown:

HELPER UTILITIES AVAILABLE:

// tests/helpers/api-client.ts
class ApiClient {
  constructor(authToken?: string) {}

  async createUser(data: Partial<UserPayload>): Promise<User>
  async deleteUser(userId: string): Promise<void>
  async createProduct(data: Partial<ProductPayload>): Promise<Product>
  async createOrder(userId: string, items: OrderItem[]): Promise<Order>
  async updateOrderStatus(orderId: string, status: OrderStatus): Promise<Order>
}

// tests/helpers/db-helpers.ts  
export async function seedTestUser(overrides?: Partial<User>): Promise<User & { password: string }>
export async function cleanupUser(email: string): Promise<void>
export async function seedProducts(count: number): Promise<Product[]>
export async function cleanupOrdersByUser(userId: string): Promise<void>

USAGE PATTERN: 
- Use ApiClient for creating/cleaning complex entities
- Use db-helpers for simple seeding operations
- ALWAYS clean up in afterEach or afterAll, never in beforeAll only
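
With those interfaces and rules in the prompt, a generated test typically wires setup and cleanup like the sketch below. The order-item shape, the /orders route, and the order-row test id are illustrative assumptions, and the User fields (id, email) are assumed from the helper signatures above:

// tests/e2e/orders/order-history.spec.ts (illustrative path)
import { test, expect } from '../../fixtures/auth.fixture';
import { ApiClient } from '../../helpers/api-client';
import { seedTestUser, cleanupUser, cleanupOrdersByUser } from '../../helpers/db-helpers';

test.describe('Orders: order history', () => {
  let user: Awaited<ReturnType<typeof seedTestUser>>;
  const api = new ApiClient();

  test.beforeEach(async () => {
    user = await seedTestUser();                                      // simple seeding via db-helpers
    await api.createOrder(user.id, [{ productId: 'prod-1', quantity: 1 }]); // complex entity via ApiClient
  });

  test.afterEach(async () => {
    await cleanupOrdersByUser(user.id);                               // cleanup mirrors what beforeEach created
    await cleanupUser(user.email);
  });

  test('shows the seeded order in order history', async ({ authenticatedPage }) => {
    await authenticatedPage.goto('/orders');
    await expect(authenticatedPage.getByTestId('order-row')).toBeVisible();
  });
});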

Learning Tip: Maintain a "POM interface cheat sheet" as a text file in your test repo root (e.g., tests/INTERFACES.md). Update it whenever you add or change a POM method. This file becomes a one-paste context drop for any AI test generation session. It's also useful for onboarding new engineers — a signal that maintaining this file pays dividends beyond AI usage.


How to Configure AI Tools to Follow Your Project's Test Framework Conventions?

Conventions are the invisible structure of a healthy test suite — consistent file naming, standard describe block structure, shared assertion patterns. AI breaks conventions whenever it hasn't been explicitly told about them. The fix is providing your conventions as constraints in every generation prompt.

Defining Your Convention Constraints

Create a reusable "test conventions" snippet that you paste into any generation prompt:

TEST CONVENTIONS — FOLLOW THESE EXACTLY:

FILE STRUCTURE:
Every test file must follow this structure:
  import { test, expect } from '[correct fixture path]';
  import { [PageObjects] } from '[pages path]';

  test.describe('[Feature Area]: [Specific Feature]', () => {
    let [sharedVariables]; // declared here, initialized in beforeEach

    test.beforeEach(async ({ page/authenticatedPage }) => {
      // seed test data, navigate to start URL
    });

    test.afterEach(async () => {
      // cleanup seeded data
    });

    test('[verb] [subject] [expected outcome]', async ({ fixture }) => {
      // test body
    });
  });

TEST NAMING CONVENTION:
Format: "[verb] [subject] [expected outcome]"
Examples:
  'shows validation error when email is empty'
  'redirects to dashboard after successful login'
  'displays 404 page for non-existent product'
NOT:
  'test login' (too vague)
  'Login Test 1' (numbered, not descriptive)
  'it should redirect' (use plain test name, not 'it should')

TAGGING CONVENTION:
test('...', { tag: '@smoke' }, async () => {})    // for smoke tests
test('...', { tag: '@regression' }, async () => {})  // for regression tests
test('...', { tag: '@flaky' }, async () => {})    // for known flaky (mark for investigation)

ASSERTION STYLE:
- Use `await expect(locator).toBeVisible()` NOT `expect(await locator.isVisible()).toBe(true)`
- Use `toContainText()` for partial text checks, `toHaveText()` for exact text
- For URL assertions: `await expect(page).toHaveURL('/dashboard')`
- For API response assertions in E2E: use page.waitForResponse() pattern

WAIT STRATEGY:
- NEVER use `page.waitForTimeout()` or `await new Promise(r => setTimeout(r, ms))`
- Instead use: `await expect(locator).toBeVisible()`, `await page.waitForResponse()`,
  or `await page.waitForURL()`

Enforcing Conventions Through the Prompt System

For long-running projects using Claude Code or Gemini CLI, encode your conventions in the project-level context file:


## Conventions That Must Be Followed

### Never Do These (I will reject PRs that contain these patterns)
- page.waitForTimeout() / setTimeout / sleep
- nth() locators or :nth-child() CSS selectors  
- Hardcoded localhost:3000 or any specific port
- import from '@playwright/test' — always import from our fixture
- test.only() committed (only use during local debugging)

### Always Do These
- Run the test locally before submitting — broken tests don't go in PRs
- Add afterEach cleanup for any data created in beforeEach
- Use data-testid as primary selector strategy
- Add the @smoke tag to tests for critical paths

Convention Verification Prompt

After generating a test, ask AI to verify its own output:

Review the test you just generated and verify it follows ALL of these conventions:
1. Imports come from our fixture, not @playwright/test
2. File structure: describe > beforeEach > afterEach > test blocks
3. No waitForTimeout or hard sleeps
4. Test names follow format: "[verb] [subject] [expected outcome]"
5. All form inputs use data-testid or getByLabel selectors
6. Cleanup is present in afterEach for any data created in beforeEach

For each convention, report: PASS / FAIL / N/A.
For any FAIL, show the specific line and provide the corrected version.

Learning Tip: Treat your convention violations log as a feedback loop for improving your prompt templates. Every time AI generates a convention violation in a new session (not the same session, where you've already corrected it), add that violation to your conventions snippet as a "NOT: [what AI generated]" example alongside the "DO: [correct version]". AI learns from negative examples (what NOT to do) just as effectively as from positive ones.


How to Prevent AI from Generating Generic Tests That Don't Fit Your Codebase?

Generic AI-generated tests are the failure mode that erodes trust in AI tooling. They look plausible, may even pass in isolation, but they don't align with how your application actually works — they use wrong selectors, call non-existent methods, reference incorrect API paths, or assert on data shapes that don't match your backend. The root cause is always insufficient context.

The Five Most Common Generic Test Patterns and How to Prevent Them

1. Generic imports and wrong fixture usage

Symptom: Test imports from @playwright/test instead of your custom fixture. No authentication state.

Prevention: Include your fixture export in the context. Show the exact import line: // CORRECT: import { test, expect } from '../../fixtures/auth.fixture'

2. Invented page object methods

Symptom: await loginPage.clickLoginButton() — but your POM only has login().

Prevention: Paste the complete public interface of all POM classes you want AI to use. Explicitly state: "Only call methods that appear in the interfaces below. Do not invent new methods."

3. Wrong API paths or response shapes

Symptom: Test calls /api/users but your API is at /api/v2/users. Response assertions don't match actual schema.

Prevention: Paste the relevant OpenAPI spec section or actual example responses from your API. State: "All API paths must exactly match those in the spec below."

4. Hard-coded test data that collides

Symptom: Tests create users with email: "test@example.com" — which collides with other tests or with existing database records.

Prevention: Include your test data conventions: "Use unique email addresses by appending a timestamp or random suffix: test+${Date.now()}@example.com." Show your existing test data patterns.
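
A small data helper makes this convention easy for AI to follow once it is in the context. The helper below is a hypothetical sketch; adapt the name and shape to your project:

// tests/helpers/test-data.ts (hypothetical helper)
export function uniqueEmail(prefix = 'test'): string {
  // Timestamp plus a random suffix keeps parallel workers from colliding
  return `${prefix}+${Date.now()}-${Math.floor(Math.random() * 1_000_000)}@example.com`;
}

// Usage in a generated test:
// const email = uniqueEmail('signup');
// const user = await api.createUser({ email });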

5. Missing or incorrect wait strategies

Symptom: Tests click a button and immediately assert, without waiting for the resulting navigation or API call. Tests are intermittently flaky.

Prevention: Include your wait strategy conventions and provide examples: "After clicking the submit button, ALWAYS await page.waitForURL('/confirmation') before asserting confirmation page content."
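
In code, the difference between the flaky and the stable pattern looks like this (the route and test ids are placeholders):

// Flaky: no wait on the navigation the click triggers
await page.getByTestId('submit-order').click();
expect(await page.getByTestId('order-confirmation').isVisible()).toBe(true); // checks too early
// or worse: await page.waitForTimeout(3000);                         // banned fixed sleep

// Stable: wait on an observable outcome of the click
await page.getByTestId('submit-order').click();
await page.waitForURL('/confirmation');
await expect(page.getByTestId('order-confirmation')).toBeVisible();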

The Runability Check Prompt

Before running any generated test, run this review:

Before I run this test, check it for the following runability issues:

1. IMPORTS: Does every import path exist? Are all imported symbols exported by those modules?
2. FIXTURE USAGE: Are the correct fixture parameters used in each test function signature?
3. POM METHODS: Does every POM method call exactly match the interface provided?
4. ENV VARS: Are all referenced environment variables ones I listed?
5. SELECTORS: Are all data-testid attributes, ARIA roles, and labels ones that appear 
   in the DOM description I provided?
6. TEST DATA: Is there any hardcoded data that could collide across parallel test runs?
7. CLEANUP: Is there cleanup code for every piece of data created?

For each issue: describe it, give the line number, and provide the fix.

Using a Real Failing Test as Negative Context

One of the most powerful techniques is pasting a broken or generic AI-generated test alongside a correct example from your codebase:

Here are two tests. The first is WRONG (it was generated without proper context). 
The second is CORRECT (it follows our patterns).

WRONG (don't do this):
[paste the bad generated test]

CORRECT (follow this pattern exactly):
[paste a real test from your codebase]

Now generate a NEW test for [flow description] that follows the CORRECT pattern, 
NOT the WRONG one. If there's any conflict between the two, the CORRECT example wins.

This technique is highly effective because AI responds better to concrete examples of correct and incorrect patterns than to abstract convention descriptions.

Learning Tip: Keep a tests/examples/ directory with two or three "gold standard" test files that demonstrate your ideal test structure, selector strategy, fixture usage, and cleanup pattern. These files serve as the canonical positive examples to paste into any AI generation prompt. When you refactor a test to be particularly clean and idiomatic, add it to the examples directory. Over time this library becomes your most valuable prompt engineering asset.