
E2E test script generation

How to Describe User Flows to AI for Accurate E2E Script Generation?

The quality of an AI-generated E2E test script is almost entirely determined by the quality of the flow description you provide. A vague prompt like "write a test for the checkout flow" will produce a generic, shallow test that probably won't run in your environment. A precise, structured description of the user flow will produce a test that is immediately runnable and covers the real business logic.

The User Flow Description Template

Use this structured format every time you ask an AI to generate an E2E test:

FEATURE: [Feature name]
URL/SCREEN: [Starting URL or screen name]
PRE-CONDITIONS: [What must be true before the test starts — logged in? specific data seeded?]
ACTOR: [Who is performing this flow — guest, admin, logged-in user?]

STEPS:
1. [Action] → [Expected state/result]
2. [Action] → [Expected state/result]
...

POST-CONDITIONS: [What must be true at the end — database state, redirect, email sent?]

TEST DATA:
- [Field]: [Value or description of value type]
- [Field]: [Value or description of value type]

EDGE CASES TO COVER:
- [Edge case 1]
- [Edge case 2]

Example: Checkout Flow Description

Here is what a well-specified prompt looks like for a checkout flow:

FEATURE: Guest Checkout — Credit Card Payment
URL: https://shop.example.com/cart (cart contains 2 items totaling $89.50)
PRE-CONDITIONS: 
  - Cart seeded with product IDs [SKU-001, SKU-002]
  - User is NOT logged in (guest session)
  - Stripe test mode is active
ACTOR: Anonymous guest user

STEPS:
1. Navigate to /cart → Cart shows 2 items, subtotal $89.50
2. Click "Proceed to Checkout" → Redirected to /checkout/shipping
3. Fill shipping form: 
   - First Name: "Jane"
   - Last Name: "Doe"
   - Address: "123 Main St"
   - City: "San Francisco"
   - State: "CA"
   - ZIP: "94102"
   - Email: "jane.doe+test@example.com"
4. Click "Continue to Payment" → Redirected to /checkout/payment, order summary visible
5. Fill Stripe card iframe:
   - Card number: "4242 4242 4242 4242"
   - Expiry: "12/26"
   - CVC: "123"
6. Click "Place Order" → Redirected to /checkout/confirmation
7. Confirmation page shows order number, summary, and "Check your email" message

POST-CONDITIONS:
  - Order created in DB with status "pending_payment"
  - Confirmation email sent to jane.doe+test@example.com

TEST DATA:
  - Use Stripe test card 4242424242424242 (always succeeds)
  - Email must follow +test pattern for filtering in test environment
  - ZIP must be valid 5-digit US code

EDGE CASES TO COVER:
  - Declined card (use Stripe card 4000000000000002)
  - Invalid email format in shipping form
  - Empty cart redirect to /products

With this level of detail, AI will generate selectors targeting your specific DOM, use your test data patterns, and produce assertions tied to real expected outcomes.

What to Include in Your Flow Description

The following elements are critical for accurate generation:

Selectors and UI landmarks: If you know the IDs or ARIA labels of key elements, include them. data-testid="checkout-btn" in your description will be reflected in the generated test.

Network behavior: Does the flow involve API calls with loading states? Mention them: "After clicking Submit, a loading spinner shows for ~2s while the API processes."

Authentication state: Always be explicit. "User is logged in as role=admin with email admin+test@example.com and password from env var TEST_ADMIN_PASSWORD" is vastly more useful than "user is logged in."

Dynamic content: If elements are rendered dynamically (virtual lists, infinite scroll, conditional sections), describe when and how they appear.
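As a quick illustration of how these details map to generated code, here is a minimal Playwright sketch, assuming hypothetical checkout-btn and loading-spinner testids and the ~2-second spinner described above:

import { test, expect } from '@playwright/test';

test('submit shows a spinner, then completes', async ({ page }) => {
  await page.goto('/checkout');
  // The data-testid hint from the flow description becomes a resilient locator
  await page.getByTestId('checkout-btn').click();
  // The "loading spinner for ~2s" hint tells the generator to wait on state, not on time
  await expect(page.getByTestId('loading-spinner')).toBeHidden({ timeout: 10_000 });
});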

Learning Tip: Before writing your next prompt, spend 5 minutes filling out the template above for the flow you want to test. The discipline of writing out pre-conditions and post-conditions will often reveal test design gaps that AI couldn't have caught — because you hadn't defined them yet. Good prompt writing is good test design.


How to Generate Reliable Playwright Tests — Selectors, Assertions, and Fixtures — with AI?

Playwright tests fail for three main reasons: brittle selectors that break when the DOM changes, missing assertions that don't verify the actual business outcome, and fixture/state management issues that cause test pollution. AI can introduce all three problems if not prompted carefully.

Directing AI to Use Resilient Selectors

Tell AI explicitly which selector strategy to use. Left to its own devices, it may reach for nth-child CSS selectors or XPath — both are fragile. Your prompt should specify:

Generate a Playwright test for the user flow described below.

SELECTOR STRATEGY:
- Prefer data-testid attributes when available (format: data-testid="...")
- Fall back to ARIA roles with accessible names: getByRole('button', { name: 'Place Order' })
- Use getByLabel() for form inputs
- NEVER use CSS class selectors or XPath unless absolutely unavoidable
- If you have to use a fallback selector, add a comment explaining why

FRAMEWORK CONTEXT:
- Playwright version: 1.44
- Language: TypeScript
- Test runner: @playwright/test
- Base URL is set in playwright.config.ts — use relative paths
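The difference this makes is easy to see side by side. A minimal sketch, reusing the Place Order button from the earlier checkout example:

import { test, expect } from '@playwright/test';

test('place order is targeted resiliently', async ({ page }) => {
  await page.goto('/checkout/payment');

  // Fragile: breaks when a sibling element is added or a class is renamed
  // await page.locator('div.checkout > button:nth-child(3)').click();

  // Resilient: tied to the accessible role and name, which track user-visible behavior
  await page.getByRole('button', { name: 'Place Order' }).click();
  await expect(page).toHaveURL('/checkout/confirmation');
});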

Example Prompt for a Login Test with Fixtures

Generate a Playwright TypeScript test for the login flow using the context below.

FRAMEWORK CONTEXT:
- Use @playwright/test (test, expect, Page)
- Base URL is configured in playwright.config.ts — use relative paths
- Authentication fixture is defined in tests/fixtures/auth.fixture.ts as:
  export const authFixture = base.extend<{ loggedInPage: Page }>({ ... })
  It accepts { email, password, role } and navigates to /dashboard on success.
- All form fields use data-testid attributes. Examples from the DOM:
  <input data-testid="login-email" />
  <input data-testid="login-password" />
  <button data-testid="login-submit">Sign In</button>
  <div data-testid="login-error-message">Invalid credentials</div>

FLOW TO TEST:
1. Navigate to /login
2. Enter valid credentials (use TEST_USER_EMAIL and TEST_USER_PASSWORD from env)
3. Click sign in
4. Assert redirect to /dashboard
5. Assert user avatar visible with aria-label="User menu"

ALSO GENERATE:
- A negative test: invalid password → error message visible
- A negative test: empty email → inline validation error on email field

The expected output should look roughly like:

import { test, expect } from '@playwright/test';

test.describe('Login', () => {
  test('successful login redirects to dashboard', async ({ page }) => {
    await page.goto('/login');
    await page.getByTestId('login-email').fill(process.env.TEST_USER_EMAIL!);
    await page.getByTestId('login-password').fill(process.env.TEST_USER_PASSWORD!);
    await page.getByTestId('login-submit').click();
    await expect(page).toHaveURL('/dashboard');
    await expect(page.getByRole('button', { name: 'User menu' })).toBeVisible();
  });

  test('invalid password shows error message', async ({ page }) => {
    await page.goto('/login');
    await page.getByTestId('login-email').fill(process.env.TEST_USER_EMAIL!);
    await page.getByTestId('login-password').fill('wrong-password-123');
    await page.getByTestId('login-submit').click();
    await expect(page.getByTestId('login-error-message')).toBeVisible();
    await expect(page.getByTestId('login-error-message')).toContainText('Invalid credentials');
  });

  test('empty email shows inline validation', async ({ page }) => {
    await page.goto('/login');
    await page.getByTestId('login-submit').click();
    await expect(page.getByTestId('login-email')).toHaveAttribute('aria-invalid', 'true');
  });
});

Prompting for Fixtures and Setup/Teardown

When your test requires database state, API seeding, or other setup, tell AI where your fixture layer lives:

Our test fixtures live in tests/fixtures/. We have:
- db.fixture.ts — exposes a `db` object with methods like db.createUser(), db.deleteUser(), db.createOrder()
- api.fixture.ts — exposes an `apiClient` for making authenticated API calls in setup

For the test below, use these fixtures to seed an order in "pending" state before the test runs,
and clean it up in afterEach using db.deleteOrder(orderId).
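Given that context, the generated test should wire seeding and cleanup into hooks rather than the test body. A rough sketch of the expected shape — the db fixture, its createOrder/deleteOrder methods, and the order-row testid are the hypothetical names from the prompt above, used for illustration:

import { expect } from '@playwright/test';
import { test } from '../fixtures/db.fixture'; // hypothetical fixture module from the prompt

let orderId: string;

test.beforeEach(async ({ db }) => {
  // Seed a pending order so the test never depends on pre-existing data
  const order = await db.createOrder({ status: 'pending' });
  orderId = order.id;
});

test.afterEach(async ({ db }) => {
  // Always clean up, even on failure, to prevent test pollution
  await db.deleteOrder(orderId);
});

test('pending order appears in the orders list', async ({ page }) => {
  await page.goto('/orders');
  await expect(page.getByTestId(`order-row-${orderId}`)).toBeVisible();
});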

Asserting Network Calls, Not Just UI State

AI defaults to asserting only visible UI state. Prompt it explicitly to also assert network responses when relevant:

After the checkout submit action, add a Playwright waitForResponse assertion that:
- Intercepts POST /api/v1/orders
- Asserts the response status is 201
- Asserts the response body contains { status: "pending_payment" }

This produces tests that catch backend regressions even when the UI incorrectly shows a success message.

Learning Tip: After AI generates a Playwright test, search the output for nth(, .locator('div., .locator('span., and xpath=. Any of these in the generated code is a red flag — instruct AI to replace them with getByRole, getByTestId, or getByLabel. This single habit will prevent the majority of selector-related flakiness in AI-generated tests.


How to Generate Maestro Mobile E2E Flows with AI Assistance?

Maestro is a mobile E2E testing framework from Mobile.dev that uses a YAML-based flow syntax. It runs on both iOS and Android without requiring code, making it accessible — but the YAML syntax is specific and AI needs explicit guidance to produce valid flows.

Providing Maestro Context to AI

AI tools are less likely to have seen extensive Maestro code than Playwright code. Always provide a Maestro YAML syntax reference in your prompt:

Generate a Maestro YAML flow for the following mobile user journey.

MAESTRO SYNTAX REFERENCE (use exactly these patterns):
appId: com.example.myapp
---
- launchApp
- tapOn: "Sign In"
- tapOn:
    id: "login_email_field"
- inputText: "test.user@example.com"
- tapOn:
    id: "login_password_field"
- inputText: "${PASSWORD}"
- tapOn: "Login"
- assertVisible: "Welcome back"
- takeScreenshot: login_success

AVAILABLE SELECTORS in the app:
- id: "login_email_field"
- id: "login_password_field"  
- id: "login_cta_button"
- id: "home_screen_header"
- text: "Welcome back, ${username}"

ENVIRONMENT VARIABLES AVAILABLE:
- ${EMAIL}: test user email
- ${PASSWORD}: test user password

FLOW TO TEST:
1. Launch app, land on onboarding screen
2. Tap "Already have an account? Sign In"
3. Enter email and password
4. Tap login button
5. Assert home screen header visible
6. Assert welcome message contains the username
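For reference, a plausible output for this prompt is sketched below; the onboarding button text comes from the flow steps, and the regex in the final assertion is an assumption about how the welcome text should be matched:

appId: com.example.myapp
---
- launchApp
- tapOn: "Already have an account? Sign In"
- tapOn:
    id: "login_email_field"
- inputText: "${EMAIL}"
- tapOn:
    id: "login_password_field"
- inputText: "${PASSWORD}"
- tapOn:
    id: "login_cta_button"
- waitForAnimationToEnd
- assertVisible:
    id: "home_screen_header"
- assertVisible: "Welcome back, .*"
- takeScreenshot: login_success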

Generating Parameterized Maestro Flows

One of Maestro's most powerful features is parameterized flows via env files. Prompt AI to generate these together:

Generate a parameterized Maestro flow for adding an item to the cart, plus the corresponding 
.env file for test data. The flow should accept:
- PRODUCT_NAME (string)
- PRODUCT_QUANTITY (integer)
- EXPECTED_CART_TOTAL (string, e.g., "$45.00")

Use Maestro's ${VARIABLE} syntax for parameterization. Also generate a .env file with one
realistic test data set.
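A sketch of the flow half of that output, with an env block providing default values that can be overridden at run time; all selector ids here are hypothetical:

appId: com.example.myapp
env:
  PRODUCT_NAME: "Canvas Tote Bag"
  PRODUCT_QUANTITY: "2"
  EXPECTED_CART_TOTAL: "$45.00"
---
- launchApp
- tapOn:
    id: "search_field"
- inputText: "${PRODUCT_NAME}"
- tapOn: "${PRODUCT_NAME}"
- tapOn:
    id: "quantity_input"
- inputText: "${PRODUCT_QUANTITY}"
- tapOn:
    id: "add_to_cart_button"
- tapOn:
    id: "cart_icon"
- assertVisible: "${EXPECTED_CART_TOTAL}"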

Handling Maestro Assertions

AI sometimes generates Maestro flows with assertions that don't match the actual Maestro API. Provide correction guidance:

IMPORTANT MAESTRO ASSERTION RULES:
- Use assertVisible for checking element presence, NOT assertTrue
- Use assertNotVisible for checking element absence
- Use assertContainsText for partial text matching
- For waiting, use runFlow with a timeout or add `- waitForAnimationToEnd` before assertions
- Do NOT use assertText (it does exact match only and is rarely what we want)
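Applied to a typical confirmation screen, those rules yield steps like the following (the id and text are hypothetical):

- waitForAnimationToEnd
- assertVisible: "Order placed"
- assertNotVisible:
    id: "checkout_spinner"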

Learning Tip: Maestro's YAML is unforgiving about indentation and key names. After AI generates a flow, run maestro test <flow.yaml> --debug to get the full execution trace. The debug output shows exactly which step failed and why — paste that output back into your AI conversation with "Fix this Maestro flow based on this debug output" for a rapid iteration loop.


How to Generate Robot Framework Keyword-Driven Tests with AI?

Robot Framework's keyword-driven approach is well-suited to AI generation — the human-readable keyword syntax maps naturally to the user-story language QA engineers use. However, AI needs to understand your keyword library structure to generate tests that use your existing keywords rather than inventing non-existent ones.

Providing Your Keyword Library to AI

The most important context for Robot Framework generation is your existing keyword library:

Generate a Robot Framework test suite for the user registration flow.

KEYWORD LIBRARIES AVAILABLE:
1. Browser library (robotframework-browser): Use Browser library keywords for web interaction
   - New Browser  chromium
   - New Page  ${URL}
   - Click  ${locator}
   - Fill Text  ${locator}  ${text}
   - Get Text  ${locator}
   - Should Be Visible  ${locator}
   - Wait For Elements State  ${locator}  visible  timeout=10s

2. Custom keyword file: resources/keywords/user_keywords.robot
   Contains these keywords (copy their signatures exactly):
   - Register New User  [Arguments]  ${email}  ${password}  ${role}=user
   - Login As User  [Arguments]  ${email}  ${password}
   - Verify Dashboard Loaded
   - Teardown: Delete User  [Arguments]  ${email}

3. Custom keyword file: resources/keywords/api_keywords.robot  
   Contains:
   - Create User Via API  [Arguments]  ${email}  ${password}  ${role}=user  Returns: ${user_id}
   - Delete User Via API  [Arguments]  ${user_id}

VARIABLE FILES:
- resources/variables/common_variables.robot: ${BASE_URL}, ${ADMIN_EMAIL}, ${ADMIN_PASSWORD}
- resources/variables/test_data.robot: ${VALID_USER_EMAIL}, ${VALID_USER_PASSWORD}

TEST FILE LOCATION: tests/web/authentication/
SUITE SETUP: Use API keywords for fast seeding; Browser keywords for interaction steps

Example Robot Framework Generation Prompt and Output

Prompt:

Generate a Robot Framework test suite for the password reset flow using the keyword libraries 
above. Cover:
1. Happy path: user requests reset, receives link, resets password, logs in with new password
2. Invalid email: non-existent email shows error
3. Expired link: navigating to an expired reset link shows expiry message

Use data-driven format (Test Template) for the negative cases.
Follow this file structure:
*** Settings ***
*** Variables ***
*** Test Cases ***
*** Keywords ***

Expected output shape:

*** Settings ***
Library    Browser
Resource    ../../../resources/keywords/user_keywords.robot
Resource    ../../../resources/keywords/api_keywords.robot
Resource    ../../../resources/variables/common_variables.robot
Resource    ../../../resources/variables/test_data.robot

Suite Setup    Setup Suite Test User
Suite Teardown    Delete User Via API    ${CREATED_USER_ID}

*** Variables ***
${RESET_EMAIL_SUBJECT}    Reset your password

*** Test Cases ***
Happy Path: Password Reset Completes Successfully
    [Tags]    smoke    authentication
    Request Password Reset    ${VALID_USER_EMAIL}
    ${reset_link}=    Get Password Reset Link From Mailbox    ${VALID_USER_EMAIL}
    Navigate To Reset Link    ${reset_link}
    Set New Password    NewPassword123!    NewPassword123!
    Should Be Visible    [data-testid="password-reset-success"]
    Login As User    ${VALID_USER_EMAIL}    NewPassword123!
    Verify Dashboard Loaded

Invalid Email Shows Error
    [Template]    Verify Invalid Reset Email Behavior
    nonexistent@example.com    User not found
    not-an-email               Invalid email format
    ${EMPTY}                   Email is required

*** Keywords ***
Request Password Reset
    [Arguments]    ${email}
    Go To    ${BASE_URL}/forgot-password
    Fill Text    [data-testid="reset-email-input"]    ${email}
    Click    [data-testid="reset-submit-btn"]

Verify Invalid Reset Email Behavior
    [Arguments]    ${email}    ${expected_error}
    Request Password Reset    ${email}
    Should Be Visible    [data-testid="reset-error-message"]
    Get Text    [data-testid="reset-error-message"]    ==    ${expected_error}

Setup Suite Test User
    # Wrap the API keyword so its returned ID is stored for Suite Teardown
    ${user_id}=    Create User Via API    ${VALID_USER_EMAIL}    ${VALID_USER_PASSWORD}
    Set Suite Variable    ${CREATED_USER_ID}    ${user_id}

Preventing AI from Inventing Keywords

AI will sometimes invent plausible-sounding keywords that don't exist. Add this to your prompt:

IMPORTANT: Only use keywords from the libraries listed above. If a needed action does not have 
an existing keyword, implement it as a new keyword in the *** Keywords *** section of this file.
Do NOT call keywords that aren't defined in either the provided libraries or this file's own
*** Keywords *** section.

Learning Tip: Before generating any Robot Framework tests with AI, run robot --dryrun tests/ against your existing suite and paste the keyword inventory output into your prompt. This gives AI an accurate picture of every available keyword and prevents invented keyword calls. The dry-run output is compact — typically a few hundred lines — well within any AI context window.


How to Review AI-Generated E2E Scripts for Correctness and Flakiness Risk?

AI-generated E2E tests require systematic review before being committed. The review is not just about "does this test make sense" — it's a structured quality gate targeting the specific failure modes AI introduces most often.

The AI Test Review Checklist

Work through this checklist on every AI-generated E2E test file before merging:

Selector quality:
- [ ] Are all locators using resilient strategies (data-testid, ARIA roles, labels)?
- [ ] Are there any nth-child, CSS class, or XPath selectors? Flag and replace.
- [ ] Do the locators actually exist in the current DOM? Verify against the real application.

Assertion completeness:
- [ ] Does every action have a meaningful assertion? (Not just "click and move on")
- [ ] Do assertions verify business outcomes, not just UI element visibility?
- [ ] Are there assertions after async operations (API calls, redirects)?

Flakiness patterns:
- [ ] Are there any hard-coded waitForTimeout / sleep calls? Replace with waitForSelector or waitForResponse.
- [ ] Are there race conditions — e.g., clicking a button before it's enabled?
- [ ] Does the test assume a specific order of items in a list that might vary?

State management:
- [ ] Does the test seed its own data, or rely on pre-existing database state?
- [ ] Is there proper cleanup (afterEach/afterAll) to prevent test pollution?
- [ ] Does the test handle the case where cleanup from a previous run didn't complete?

Environment assumptions:
- [ ] Are credentials hardcoded? Replace with environment variable references.
- [ ] Are URLs hardcoded? Replace with base URL configuration.
- [ ] Does the test assume a specific time zone or locale?
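To make the flakiness and environment checks concrete, here is a minimal before/after sketch; the search-input testid, product name, and API path are assumptions for illustration:

import { test, expect } from '@playwright/test';

test('search results load without timing hacks', async ({ page }) => {
  await page.goto('/products');

  // Flaky: a hard-coded sleep and a positional selector
  // await page.waitForTimeout(3000);
  // await page.locator('li').nth(0).click();

  // Stable: wait on the actual network signal, then use a resilient locator
  const results = page.waitForResponse(
    (resp) => resp.url().includes('/api/search') && resp.ok()
  );
  await page.getByTestId('search-input').fill('tote bag');
  await results;
  await expect(page.getByRole('link', { name: 'Canvas Tote Bag' })).toBeVisible();
});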

Using AI to Review AI-Generated Tests

This is one of the most powerful patterns: feed the first AI's output into a second, independent review prompt.

Review the following Playwright test for correctness and flakiness risk.

Apply these specific checks:
1. Selector strategy: flag any locators that are not data-testid, ARIA role, label, or placeholder
2. Async handling: flag any missing awaits, hard sleeps, or missing waitForResponse/waitForSelector
3. Assertion gaps: identify steps that have no corresponding assertion
4. State pollution: check that test data is created per-test and cleaned up after
5. Hardcoded values: flag URLs, credentials, or environment-specific strings

For each issue found, provide:
- Line number (approximate)
- Issue category
- Specific fix with code

[PASTE GENERATED TEST HERE]

Scoring AI-Generated Tests

A useful internal rubric for deciding if an AI-generated test is merge-ready:

Category       Score 3                   Score 2                  Score 1
Selectors      All resilient             <20% fragile             >20% fragile
Assertions     Every action asserted     Key actions asserted     Sparse or missing
State mgmt     Full isolation            Partial isolation        Shared/polluting state
Env hygiene    No hardcodes              Minor issues             Credentials/URLs hardcoded

A total score of 10-12 (out of a maximum of 12 across the four categories) is merge-ready. A score of 7-9 needs targeted fixes. Below 7, regenerate with a more detailed prompt.

Learning Tip: Build the AI test review checklist into your actual PR review template. When a PR adds AI-generated tests, require the author to check off each item before requesting review. This makes the review process explicit and prevents "AI wrote it, I trust it" from becoming a shortcut that bypasses quality gates. The checklist takes 5 minutes; the flaky test it prevents might cost 5 hours of investigation.