What Context Should You Include for Test Planning vs. Test Generation vs. Bug Analysis?
Context is not a monolith — the information that makes a test generation prompt work is substantially different from the information that makes a bug analysis prompt work. Providing the wrong context type, or providing all context types for every task, wastes your token budget and can actually confuse the model by flooding it with irrelevant information.
The Three Context Profiles
Test Planning Context
Test planning is a strategic task. You're deciding what to test, in what order, with what priority, and with what risk assessment. The model needs high-level understanding of the system and the business goals — not line-by-line code.
What matters for test planning:
| Context type | Purpose | Priority |
|---|---|---|
| Product requirements / feature spec | Define what the system is supposed to do | Critical |
| User stories with acceptance criteria | Define the specific behavioral contracts for this sprint | Critical |
| Risk areas from architecture or product stakeholders | Inform risk-based prioritization | High |
| Current test coverage summary | Identify existing coverage to avoid duplication | High |
| Release timeline and scope | Determine time-box constraints | Medium |
| Historical defect data for this module | Identify historically risky areas | Medium |
| Source code | Usually NOT needed for planning | Low |
For test planning, omit source code unless you're trying to understand the existing implementation well enough to assess technical risk.
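A planning prompt assembled from this profile might look like the following sketch. The bracketed placeholders and the two-week window are illustrative, not prescriptive:
**Prompt:**
You are a senior QA engineer planning tests for the upcoming release.
Feature spec:
[paste feature spec or a condensed summary]
User stories with acceptance criteria:
[paste stories]
Known risk areas from the architecture review:
[paste risk notes]
Current coverage summary (counts per module, not the tests themselves):
[paste coverage summary]
Release window: two weeks.
Produce a risk-based test plan: what to test, in what order, at what priority, and what can be deferred.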
Test Generation Context
Test generation is an execution task. You're producing concrete, specific test cases. The model needs to know the expected behavior at a granular level and the exact shape of the system it's testing against.
What matters for test generation:
| Context type | Purpose | Priority |
|---|---|---|
| Specific acceptance criteria | Define pass/fail conditions precisely | Critical |
| API schema / endpoint spec | Define exact inputs and outputs | Critical |
| Data model / entity definitions | Define valid data ranges, constraints, types | Critical |
| Example test file (for style/framework) | Ensure output matches your team's conventions | High |
| Error handling spec | Define behavior for negative paths | High |
| Business rules and domain logic | Prevent assertions based on wrong assumptions | High |
| Full feature spec | Useful only if AC is embedded in spec | Medium |
For test generation, source code is optional but valuable — it reveals edge cases in the implementation that aren't in the spec.
Bug Analysis Context
Bug analysis is a forensic task. You're working backward from evidence to cause. The model needs precise, complete failure artifacts and a clear statement of what the system was supposed to do.
What matters for bug analysis:
| Context type | Purpose | Priority |
|---|---|---|
| Exact failure output / stack trace | Primary evidence | Critical |
| Test code that produced the failure | Secondary evidence — reveals what was asserted | Critical |
| Expected behavior from spec | Defines what "wrong" means | Critical |
| Source code of the failing function | Reveals implementation details behind the failure | High |
| System/application logs around failure time | Secondary evidence for timing and state | High |
| CI environment config | Helpful for environment-specific failures | Medium |
| Historical failures in same area | Pattern context for recurring issues | Low |
For bug analysis, never paraphrase the failure — paste the exact text. Paraphrasing loses detail that the model would have used.
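A bug analysis prompt built from this profile might look like the following sketch; the placeholders mark where the verbatim artifacts go:
**Prompt:**
One of our checkout tests fails in CI but passes locally. Work backward from the evidence to the most likely cause.
Exact failure output (verbatim, not paraphrased):
[paste the full stack trace / assertion output]
Test code that produced the failure:
[paste test]
Expected behavior per the spec:
[paste relevant AC]
Source of the function under test:
[paste function]
List candidate causes ranked by likelihood, and for each one, state what additional evidence would confirm or rule it out.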
Common Context Mixing Errors
| Error | Effect | Fix |
|---|---|---|
| Providing full source code for test planning | Token waste, model focuses on implementation details instead of behavior | Use architecture summaries, not code |
| Providing only a story title for test generation | Generic output with hallucinated specifics | Include full AC, data model, API schema |
| Paraphrasing a stack trace for bug analysis | Loses critical detail in the error message and line numbers | Paste exact trace text |
| Providing previous sprint's tests for current sprint planning | Misleads model about current coverage | Ensure context matches current state |
Learning Tip: Before assembling context for any QA task, write down the task type at the top of your prompt draft: "PLANNING / GENERATION / ANALYSIS." This simple label primes you to select the right context profile. Once this becomes habit, you'll stop the most expensive context mistakes — primarily the tendency to paste everything and let the model figure it out.
How Do You Include Requirements and User Stories as AI Context Effectively?
Requirements and user stories are the foundational context for almost all QA tasks. But not all requirement formats work equally well as AI context, and the way you structure them dramatically affects the quality of what the model produces.
What Makes Requirements Work as AI Context
The model uses requirements to:
1. Understand what "correct behavior" looks like
2. Identify what conditions should produce what outcomes
3. Distinguish expected behavior from bugs
4. Prioritize which scenarios are most important to test
For requirements to serve all four functions, they need to be:
- Specific: Named actions with named states and named outcomes
- Bounded: Clear about what's in scope and what isn't
- Verifiable: Phrased in terms that translate to observable system behavior
- Complete: Covering the important error states, not just the happy path
Formatting Requirements for AI Context
Poorly formatted requirements produce poor AI output even when the content is correct. Use structured formats that make relationships clear.
Weak format (prose narrative):
The user login process allows existing users to log in using their email address and
password. When they enter the wrong password, they should see an error. After too many
failed attempts they get locked out. Users can also reset their password.
Strong format (structured AC):
Feature: User Authentication
Acceptance Criteria:
AC-1: Users can log in with a valid email/password combination
- Valid credentials → 200 OK + JWT token in response body + Set-Cookie header
- Response time: < 500ms under normal load
AC-2: Failed login handling
- Invalid password → 401 Unauthorized + {"error": "Invalid credentials"}
- Non-existent email → 401 Unauthorized + {"error": "Invalid credentials"} (same error to prevent email enumeration)
- Locked account → 403 Forbidden + {"error": "Account locked", "unlock_at": ISO8601 timestamp}
AC-3: Account lockout
- 5 consecutive failed attempts → account locked for 30 minutes
- Lockout counter resets on successful login
- Lockout state persists across server restarts (stored in DB, not in-memory)
AC-4: Password reset
- POST /auth/reset-request with valid email → 200 OK + reset email sent
- POST /auth/reset-request with unregistered email → 200 OK (no indication whether email exists)
- Reset link expires after 1 hour
- Reset link is single-use (invalidated after first click)
The structured format enables the model to:
- Map each test case to a specific AC ID
- Know the exact expected HTTP status codes and response shapes
- Identify the security design decisions (email enumeration prevention)
- Understand the stateful behavior (persistent lockout)
Extracting AC from User Stories for AI Context
Many teams write user stories in a loose format. Before using a story as AI context, convert it to structured AC format. You can use AI to help with this conversion:
**Prompt:**
Convert this user story to structured acceptance criteria format. For each criterion,
specify:
- The trigger condition (what the user does or what state exists)
- The expected system response (HTTP status + response shape for API, UI behavior for frontend)
- Any constraints or business rules that apply
User story:
[paste story here]
If the story is ambiguous or incomplete, flag the gaps rather than inventing answers.
Then review the structured AC before using it in test generation prompts. The review step catches gaps and ambiguities that would produce poor tests if left unaddressed.
Handling Ambiguous or Incomplete Requirements
The worst thing you can do with ambiguous requirements is paste them as-is and hope the model fills the gaps correctly. The model will fill gaps — but with plausible-sounding invention, not with your product's intended behavior.
Better approach — surface ambiguities before generation:
**Prompt:**
Review these acceptance criteria for the checkout flow. Before generating test cases,
identify any ambiguities or missing information that would prevent you from writing
precise, correct test assertions. List each ambiguity as a question.
[paste AC]
Review the model's questions. Answer the ones you know. Escalate the ones you don't to the product owner or developer. Then run the generation prompt with the clarified requirements.
This "requirements review" step, done in 5 minutes before test generation, prevents hours of rework from tests built on incorrect assumptions.
Learning Tip: Include your team's user story template in the system prompt (the role frame) you use for test generation sessions. This primes the model to expect your team's specific format and to extract AC correctly even when individual stories vary in structure. Over time, your AI-assisted test generation becomes resilient to story quality variation: the model knows what information to look for and what to flag when it's missing.
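A sketch of such a system prompt, assuming a conventional "As a / I want / so that" template (substitute your team's actual template):
**System prompt:**
You are a QA engineer on a team whose user stories follow this template:
  Title
  As a <role>, I want <capability>, so that <benefit>
  Acceptance Criteria: AC-1..AC-n, each with a trigger condition and an expected result
  Out of scope: <explicit exclusions>
When I paste a story, extract its acceptance criteria into that structure. If an AC is missing a trigger or an expected result, flag the gap instead of inventing one.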
How to Use Code Snippets as Context Without Overwhelming the Model?
Code is the most token-dense context type you'll work with. A single source file can consume 500–5,000 tokens. Used correctly, code context dramatically improves test generation quality by revealing implementation details the spec doesn't capture. Used carelessly, it drowns the model in irrelevant logic and wastes your context budget.
Selecting the Right Code to Include
Not all code is equally valuable as context. Prioritize in this order:
1. The function/method directly under test
For unit test generation: paste the function signature (parameters and return type) and its body. Typically 10–50 lines. High value.
2. The API endpoint handler
For API test generation: paste the route definition, validation logic, and the controller/handler function. Typically 30–100 lines. High value.
3. The data model / schema definition
Types, interfaces, DB schema for the entity under test. Shows field names, types, constraints, and relationships. Typically 20–60 lines. High value.
4. Error handling code
The try/catch blocks, error response builders, or HTTP status code mappings. Reveals what error shapes are actually returned. Typically 20–40 lines. High value for negative path generation.
5. Business logic in service layer
The core processing logic. Reveals edge cases and state transitions the spec may not describe explicitly. 50–200 lines. Medium value.
6. Infrastructure and config code
Database connection setup, middleware config, logging boilerplate. Very low value for test generation — exclude unless specifically testing infrastructure behavior.
7. Full files "for completeness"
Avoid. Include what you need, not everything adjacent to it.
Code Snippet Best Practices
Always include file path and function name as a header:
**Prompt:**
// File: src/services/payment.service.ts
// Function: PaymentService.processCharge()
[paste function here]
This anchors the model — it knows exactly what this code is and can reference it precisely in its output.
Include type definitions alongside implementation:
**Prompt:**
// File: src/types/payment.types.ts
interface ChargeRequest {
  amount: number;          // in cents, minimum 50 (Stripe minimum)
  currency: string;        // ISO 4217 three-letter code
  customer_id: string;     // Stripe customer ID, format: cus_xxxxx
  idempotency_key: string; // UUID v4, must be unique per charge attempt
}

interface ChargeResult {
  charge_id: string;       // Stripe charge ID, format: ch_xxxxx
  status: "succeeded" | "pending" | "failed";
  failure_reason?: string; // only present when status === "failed"
}
Type definitions are compact (low token cost) and high-value — they tell the model exactly what valid and invalid inputs look like.
Annotate code when context would otherwise be unclear:
**Prompt:**
// NOTE: This function is called by the webhook handler ONLY (not by the API route handler)
// NOTE: amount is in CENTS, not dollars (1000 = $10.00)
// NOTE: retry logic is handled by the queue — this function should throw on failure,
// not swallow errors
function processStripeWebhookCharge(payload: StripeWebhookPayload): Promise<void> {
[paste function body]
}
Brief comments provide context that the code alone doesn't express — particularly about caller expectations, unit conventions, and invariants.
Providing Code for Different Test Scopes
For unit tests — include only the function under test and its direct dependencies (types, constants, error classes):
**Prompt:**
Generate unit tests for the validatePasswordStrength() function.
// File: src/validators/password.validator.ts
[paste function — typically 20–50 lines]
// Relevant types:
[paste PasswordValidationResult interface]
// Test framework: Jest with TypeScript
// Use: describe/it structure, beforeEach for shared setup
// Mock nothing — this function has no external dependencies
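For calibration, output from a prompt like this should land close to the following sketch. The PasswordValidationResult field names (isValid, failures) and the failure code are assumptions for illustration; your real interface will differ:
// Hedged sketch of the expected output shape; field names are assumed, not real.
import { validatePasswordStrength } from "../../src/validators/password.validator";

describe("validatePasswordStrength", () => {
  it("accepts a password that satisfies every strength rule", () => {
    const result = validatePasswordStrength("Str0ng!Passw0rd");
    expect(result.isValid).toBe(true);       // assumed field name
    expect(result.failures).toHaveLength(0); // assumed field name
  });

  it("rejects a password shorter than the minimum length", () => {
    const result = validatePasswordStrength("aB1!");
    expect(result.isValid).toBe(false);
    expect(result.failures).toContain("too_short"); // assumed failure code
  });
});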
For integration tests — include the API handler, the data model, and relevant service layer:
**Prompt:**
Generate integration test scenarios (not code) for the order creation endpoint.
// Endpoint: POST /orders
// Handler: src/controllers/orders.controller.ts — createOrder()
[paste handler — typically 30–80 lines]
// Order model: src/models/order.model.ts
[paste model — typically 20–40 lines]
// Note: This endpoint calls PaymentService.initiate() — mock this in tests
For E2E tests — code context is less important; user flow and UI behavior matter more. Include the page object or component props definition if available:
**Prompt:**
Generate Playwright E2E test scenarios for the checkout flow.
// Component: CheckoutForm
// Props: src/components/CheckoutForm/CheckoutForm.tsx (first 30 lines — props only)
[paste component interface/props definition]
// Route: /checkout (authenticated users only)
// Flow: shipping address → payment method → order review → confirm
Learning Tip: Build a personal "code excerpt kit" script or IDE shortcut that extracts just the function under test, its type signatures, and its imports — without the surrounding file. This is the precise context slice that maximizes value-per-token. Running this script before test generation sessions makes code context assembly a 10-second operation instead of a manual copy-paste judgment call.
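A minimal sketch of such a script, using the TypeScript compiler API (the typescript package). The file name, CLI arguments, and output format are assumptions to adapt to your repo:
// excerpt.ts: print one function plus the file's imports, ready to paste as AI context.
// Assumes the "typescript" package is installed.
// Run with: npx ts-node excerpt.ts <file> <functionName>
import * as fs from "fs";
import * as ts from "typescript";

function extractExcerpt(filePath: string, functionName: string): string {
  const source = fs.readFileSync(filePath, "utf8");
  const sf = ts.createSourceFile(filePath, source, ts.ScriptTarget.Latest, true);
  const pieces: string[] = [`// File: ${filePath}`, `// Function: ${functionName}`];

  sf.forEachChild((node) => {
    // Keep import statements: they show the model which dependencies and types exist.
    if (ts.isImportDeclaration(node)) {
      pieces.push(node.getText(sf));
    }
    // Keep only the named function declaration under test.
    if (ts.isFunctionDeclaration(node) && node.name?.text === functionName) {
      pieces.push(node.getText(sf));
    }
  });

  return pieces.join("\n");
}

console.log(extractExcerpt(process.argv[2], process.argv[3]));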
How to Use Existing Test Cases to Guide AI Generation of New Ones?
Existing test cases are one of the most powerful and most underused forms of AI context. They serve three functions simultaneously: they demonstrate your team's test style, they show the frameworks and conventions in use, and they reveal the patterns already covered so the model can extend rather than duplicate.
Why Style Reference Context Matters
LLMs are trained on test code from thousands of projects using hundreds of different styles, frameworks, and conventions. Without a style reference, the model defaults to a generic average that may match no specific team's conventions.
With a style reference, the model produces test code that:
- Uses your exact describe/it/test/context nesting structure
- Follows your team's naming conventions (e.g., should_doX_when_Y vs. it("does X when Y"))
- Uses your preferred assertion style (Jest matchers, Chai expect, Assert.equal)
- Uses your team's test helpers, fixtures, and factory patterns
- Includes the same level of test data setup you use
- Uses your team's mock/stub patterns correctly
Selecting the Right Example Test File
Not all existing tests make good examples. Select a file that:
- Tests something similar in nature to what you're about to generate (API test example for API generation, UI test example for UI generation)
- Is well-written (your best test file, not a deprecated one from two years ago)
- Is a representative length — 50–150 lines is ideal. Too short doesn't show enough patterns; too long wastes tokens
- Is recent enough to reflect current conventions
How to Frame the Example in Your Prompt
Don't just paste the example file — frame it explicitly:
**Prompt:**
[task context and spec here]
Use the following existing test file as a style reference. Match its structure,
naming conventions, assertion patterns, and test data setup approach. Do NOT duplicate
the test cases in this example — generate new scenarios not already covered.
// STYLE REFERENCE: src/tests/api/auth.test.ts
[paste the example test file]
Generate new test cases for [new feature].
The explicit framing ("style reference", "do NOT duplicate") ensures the model uses the example as a template, not as content to build on.
Using Existing Tests to Identify Coverage Gaps
Existing tests also serve as a coverage map. You can use them directly to drive gap analysis:
**Prompt:**
You are a senior QA engineer reviewing test coverage.
Here are the existing tests for the UserAuthService:
[paste existing test file]
Here are the acceptance criteria for the UserAuthService:
[paste AC]
Step 1: List every acceptance criterion.
Step 2: For each AC, identify which test case(s) cover it (by test name).
Step 3: For each AC with no coverage, create a new test case scenario (not code).
Format the gap analysis as a table: AC ID | AC Description | Covering Tests | Gap?
Then list the new scenarios at the end.
This produces a coverage matrix and a gap-filling test plan in one operation — work that would take a human 30–60 minutes of manual tracing.
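The gap analysis table in the output looks something like the following (the rows are illustrative, not real coverage data):
| AC ID | AC Description | Covering Tests | Gap? |
|---|---|---|---|
| AC-1 | Valid credentials return a session token | "logs in with valid email and password" | No |
| AC-2 | Invalid password returns 401 with a generic error | "rejects an incorrect password" | No |
| AC-3 | Account locks after 5 consecutive failed attempts | none | Yes |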
Building a Test Archive for AI Context
For mature projects, maintain a curated "AI context archive" — a set of representative test files specifically selected for use as AI context:
tests/
  _ai-context/
    example-api-test.ts           # Best example of API test style
    example-unit-test.ts          # Best example of unit test style
    example-e2e-test.spec.ts      # Best example of Playwright E2E style
    example-integration-test.ts   # Best example of integration test style
These files are maintained alongside your test suite. When they become outdated (framework upgrade, convention change), update them immediately. Think of them as living style guides for AI-assisted test generation.
When you run a test generation session, you paste the appropriate archive file as style context. No judgment call required — you always know which file to use.
Preventing Style Drift Over Time
If every QA engineer on your team uses different example files, AI-generated test code will vary in style from person to person. Standardize the archive through a team convention:
- Store it in a documented location (tests/_ai-context/ or similar)
- Update it as a team during framework upgrades
- Reference it in your team's QA contribution guide
This makes AI-generated test code stylistically consistent regardless of who runs the generation session.
Learning Tip: The next time you write a particularly clean, well-structured test file, immediately add it to your AI context archive. Good test code is rare — when it appears, capture it. A curated archive of five well-chosen example files produces consistently higher-quality AI test generation than an ad-hoc approach where you search for "a recent test file" before every session.