
# The dual-context model


## What Are the Two Types of Context Every QA Engineer Needs for Agentic Testing?

The single most consequential insight in context engineering for QA is this: great test generation and regression analysis require two fundamentally different types of context, not one. Each type answers a different question. Missing either one produces systematically incomplete output. Understanding both — and how they interact — is the foundation of effective AI-assisted QA.

### The Two Context Types

Spec Context answers the question: What should the system do?

Spec context is the collection of requirements, user stories, acceptance criteria, API contracts, and behavioral specifications that define the intended behavior of the system. It is the expected behavior target — the source of truth that tells the AI what "correct" means.

Without spec context, the model has no way to know what to test for. It can describe a system's observed behavior but cannot determine whether that behavior is correct.

Code-Change Context answers the question: What was actually changed?

Code-change context is the collection of diffs, commit messages, changelogs, and changed file summaries that define what is different about the system in this version. It is the scope anchor — the information that tells the AI where the risk of regression actually lies.

Without code-change context, the model cannot scope its work. It either generates too broadly (covering the entire system) or depends entirely on you to tell it what to focus on — which is exactly the judgment you wanted AI to help with.

### The Dual-Context Model Visualized

                    SPEC CONTEXT                 CODE-CHANGE CONTEXT
                 (What should happen)             (What actually changed)
                         │                                 │
                         ▼                                 ▼
              ┌─────────────────────────────────────────────────┐
              │                   AI ENGINE                      │
              │                                                  │
              │  "Does the change break any expected behavior?" │
              │  "What tests should be added for new behavior?" │
              │  "What existing tests are now at risk?"         │
              └─────────────────────────────────────────────────┘
                         │
                         ▼
              ┌─────────────────────────┐
              │  Test Plan / Gap Report │
              │  Risk Matrix            │
              │  Regression Scope       │
              └─────────────────────────┘

The intersection of these two context types is where the highest-value QA analysis lives. Each type alone produces limited output. Together, they enable the AI to reason about whether the change is likely to break expected behavior — which is the fundamental question of regression testing.

### Why Both Types Are Necessary

Consider a PR that modifies the payment processing timeout behavior:

  • Spec context alone: The AI knows what behavior should exist (charge should succeed within 5 seconds, return 402 if card declined). It can generate test scenarios for the payment flow. But it doesn't know what changed, so it generates tests for the entire payment system — a broad, unfocused output.

  • Code-change context alone: The AI knows the timeout was changed from 1,000ms to 3,000ms and that retry logic was added. It can identify that timeout behavior is affected. But without the spec, it doesn't know whether 3,000ms is within the acceptable window per the contract, or whether the retry count is bounded by a business rule.

  • Both types together: The AI knows what changed (timeout + retry logic) AND what the spec requires (charge must complete within a defined SLA, partial charges must be idempotent). It can now identify the specific test scenarios that verify the changed behavior against the contracted requirements — focused, precise, and risk-calibrated.

> **Learning Tip**: Before starting any AI-assisted testing session, ask yourself: "Which context type am I missing?" Spec context without code-change context → your output will be too broad. Code-change context without spec context → your output will lack pass/fail criteria. Both present → you're ready for high-quality analysis.


## How Does Spec and Requirements Context Define the Expected Behavior Target?

Spec context is the foundation of your QA work. Without it, every test assertion is a guess. With it, assertions become verifiable contracts. This section covers how to structure spec context so it functions as a precise expected behavior target — not just background reading for the model.

### What Counts as Spec Context

Spec context is any artifact that defines intended behavior at a level that can be verified:

| Artifact | What it specifies | Best format for AI context |
| --- | --- | --- |
| User stories with acceptance criteria | Feature-level behavior contracts | Structured AC format (see Module 2, Topic 4) |
| OpenAPI / Swagger specs | API endpoint contracts (inputs, outputs, status codes) | YAML or JSON — paste the relevant paths section |
| JSON Schema definitions | Data shape and validation rules | Paste the schema directly |
| Product requirement documents | Business rules and functional requirements | Extract key rules as a bullet list |
| JIRA tickets with Gherkin | BDD scenarios | Paste Given/When/Then blocks |
| Architecture decision records (ADRs) | System behavior decisions and their rationale | Paste the relevant ADR section |
| Runbooks / operational specs | How the system should behave under operational conditions | Paste the relevant section |

### Making Spec Context Machine-Readable

The challenge with real-world spec documents is that they're written for humans — rich with prose, context, and nuance — but need to function as machine-readable truth tables for AI analysis.

Transformation technique — extract verifiable statements:

Before using a spec in a prompt, convert narrative requirements into verifiable statements. Each statement should be in the form: "[Given condition] → [expected outcome that can be observed]."

Raw spec excerpt:

The system should handle concurrent orders gracefully. Multiple users placing orders at the
same time should not result in inventory going negative, and all orders should either
succeed or fail cleanly.

Transformed for AI context:

Concurrency requirements:
- Given: N users concurrently place orders for the same product with inventory = 1
  Expected: exactly 1 order succeeds, all others receive 409 Conflict with {"error": "insufficient_inventory"}
- Given: any order fails mid-processing (after payment, before inventory decrement)
  Expected: payment is refunded within 5 minutes, order status = FAILED, no inventory consumed
- Given: database write fails during order creation
  Expected: entire transaction rolls back, no partial order state persists

The transformed version is directly usable as test acceptance criteria. Each statement maps to one or more test cases.
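
For example, the first concurrency statement above translates almost mechanically into an automated test. The sketch below is one possible shape, using pytest and `requests` against a hypothetical local `/orders` endpoint with a product fixture seeded at inventory = 1; the URL, payload fields, and fixture are assumptions to adapt to your actual system.

```python
# Minimal pytest sketch for: "N concurrent orders, inventory = 1 ->
# exactly 1 succeeds, the rest get 409 insufficient_inventory".
# BASE_URL, the payload shape, and the product fixture are hypothetical.
import concurrent.futures

import requests

BASE_URL = "http://localhost:8080"  # hypothetical service under test


def place_order(product_id: str) -> requests.Response:
    # Submit one order against the assumed /orders endpoint.
    return requests.post(
        f"{BASE_URL}/orders",
        json={"product_id": product_id, "quantity": 1, "payment_method_id": "pm_test"},
        timeout=10,
    )


def test_concurrent_orders_with_inventory_of_one():
    product_id = "11111111-1111-1111-1111-111111111111"  # assumed seeded with inventory = 1

    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as pool:
        responses = list(pool.map(place_order, [product_id] * 5))

    # Exactly one order succeeds; every other attempt is rejected cleanly.
    assert sum(r.status_code == 201 for r in responses) == 1
    for r in responses:
        if r.status_code != 201:
            assert r.status_code == 409
            assert r.json() == {"error": "insufficient_inventory"}
```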

### Versioning Spec Context

A common and dangerous error is using an outdated spec as context. The model will generate tests for requirements that no longer apply, and miss tests for current requirements.

Always include a version indicator in your spec context:

**Prompt:**
SPEC CONTEXT (Version: sprint-34, last updated 2024-01-22):
[spec content]

If the spec has changed recently, note the delta:

**Prompt:**
SPEC CONTEXT (current version):
[spec content]

RECENT SPEC CHANGES (from sprint-33 to sprint-34):
- AC-4 timeout changed from 1s to 3s
- AC-7 (admin override) removed from scope
- AC-12 (new): bulk order API endpoint added with limits defined below

This version-aware framing prevents the model from referencing stale requirements.

### Using OpenAPI Specs as Precision Context

For API QA, OpenAPI specs are the gold standard of spec context — they are already machine-structured and contain everything needed for precise test generation.

**Prompt:**
Generate integration test scenarios for the following API endpoint. Use the OpenAPI
spec as the authoritative source for all expected inputs, outputs, and status codes.

OpenAPI spec (relevant section):
```yaml
/orders:
  post:
    summary: Create a new order
    requestBody:
      required: true
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/OrderRequest'
    responses:
      '201':
        description: Order created
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/Order'
      '400':
        description: Validation error
      '409':
        description: Insufficient inventory
      '422':
        description: Payment processing failed

components:
  schemas:
    OrderRequest:
      type: object
      required: [product_id, quantity, payment_method_id]
      properties:
        product_id:
          type: string
          format: uuid
        quantity:
          type: integer
          minimum: 1
          maximum: 100
        payment_method_id:
          type: string
```

Pasting the OpenAPI spec directly gives the model precise input validation rules, required fields, and status codes — producing test cases with correct assertions instead of guessed ones.
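
For large OpenAPI files, pasting the whole spec wastes context. A small extraction step keeps the prompt focused on the endpoint under test. The sketch below assumes PyYAML and hypothetical file, path, and schema names; adjust them to your spec.

```python
# Sketch: pull one path and its referenced schemas out of a large OpenAPI file
# so only the relevant section is pasted as spec context. Names are assumptions.
import yaml  # PyYAML

with open("openapi.yaml") as f:
    spec = yaml.safe_load(f)

relevant = {
    "paths": {"/orders": spec["paths"]["/orders"]},
    "components": {
        "schemas": {
            name: schema
            for name, schema in spec["components"]["schemas"].items()
            if name in ("OrderRequest", "Order")
        }
    },
}

# Paste this trimmed YAML into the prompt as the authoritative spec context.
print(yaml.safe_dump(relevant, sort_keys=False))
```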

> **Learning Tip**: If your team doesn't have a formal spec document but does have acceptance criteria in Jira, build a "spec extraction" step into your sprint workflow: at the start of each sprint, run a prompt that converts all sprint stories to structured AC format and saves the output as a sprint spec file. This file becomes the spec context you reference for all test generation that sprint. Five minutes of sprint startup saves you from assembling context from scratch for every generation session.
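
If you want to automate the collection half of that sprint-startup step, a script can gather the sprint's stories into a single file for the conversion prompt. The sketch below assumes Jira Cloud's REST search API, a hypothetical project key, and credentials supplied via environment variables; authentication and field handling vary by Jira setup.

```python
# Sketch: collect the current sprint's stories into one spec-context file.
# JIRA_URL, PROJECT, and the credential variables are placeholders.
import os

import requests

JIRA_URL = "https://your-company.atlassian.net"  # hypothetical
PROJECT = "SHOP"                                 # hypothetical project key

resp = requests.get(
    f"{JIRA_URL}/rest/api/2/search",
    params={
        "jql": f"project = {PROJECT} AND sprint in openSprints() AND issuetype = Story",
        "fields": "summary,description",
        "maxResults": 100,
    },
    auth=(os.environ["JIRA_EMAIL"], os.environ["JIRA_API_TOKEN"]),
    timeout=30,
)
resp.raise_for_status()

with open("sprint-spec.md", "w") as out:
    for issue in resp.json()["issues"]:
        fields = issue["fields"]
        out.write(f"## {issue['key']}: {fields['summary']}\n\n")
        out.write(f"{fields.get('description') or '(no description)'}\n\n")
```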

---

## How to Use Feature Branch Diffs as Context to Scope Risk and Focus Testing?

The PR diff is the most precise code-change context available. It shows exactly what changed, nothing more and nothing less. Using it as AI context lets you scope your regression analysis and test generation to the actual risk surface — rather than guessing or over-testing the entire system.

### What a Diff Tells the AI

A feature branch diff reveals:
- Which functions and methods were **added** (new behavior to test)
- Which functions and methods were **modified** (existing behavior that may have changed)
- Which functions and methods were **deleted** (paths that no longer exist — tests covering them now test dead code)
- Which call sites were updated (consumers that now behave differently)
- Which dependencies were added or upgraded (third-party behavior changes)
- Which configuration values changed (thresholds, timeouts, feature flags)

The model can reason about each of these change types and produce targeted test scenarios.
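
As a quick first pass before any prompting, you can turn the branch into an added/modified/deleted breakdown and use it as a change summary. A minimal sketch, assuming `main` as the base and a hypothetical `feature-branch` name:

```python
# Sketch: summarize a feature branch as added / modified / deleted files.
# Branch names are placeholders; renames and copies are left out of the printout.
import subprocess
from collections import defaultdict

out = subprocess.run(
    ["git", "diff", "--name-status", "main...feature-branch"],
    capture_output=True, text=True, check=True,
).stdout

changes = defaultdict(list)
for line in out.splitlines():
    if not line:
        continue
    status, _, path = line.partition("\t")
    changes[status[0]].append(path)  # first letter: A, M, D, R, ...

for code, label in [("A", "Added"), ("M", "Modified"), ("D", "Deleted")]:
    print(f"{label} ({len(changes[code])}):")
    for path in changes[code]:
        print(f"  {path}")
```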

### Getting the Right Diff Format

For AI analysis, `git diff` output is parseable but verbose. For larger PRs, consider providing a structured summary alongside or instead of the raw diff.

**For small PRs (< 200 lines of diff):** Paste the raw diff directly.

**Prompt:**
Analyze this PR diff for testing scope.
[paste git diff output]


**For larger PRs:** Pre-process into a structured change summary:

```bash
# File-level summary of which files changed and by how much
git diff main...feature-branch --stat

# Only the added/removed lines in TS/JS files, capped at 200 lines
git diff main...feature-branch -- "*.ts" "*.js" | grep "^[+-]" | grep -v "^[+-][+-][+-]" | head -200
```

Or ask the AI to summarize the diff first, then use that summary for analysis:

**Prompt:**
STEP 1: Summarize the behavioral changes in this diff. For each modified function,
describe in one sentence what changed behaviorally (not what lines changed — what
behavior is different). Group by module.

[paste diff]

After summarizing, I will ask you to generate test scenarios for the highest-risk changes.

### Risk-Scoring Changes from a Diff

Not all changes carry equal regression risk. Use the model to risk-score the diff:

**Prompt:**
You are a senior QA engineer performing risk-based analysis of this PR.

For each modified function or method in the diff below, assign a regression risk score:
- HIGH: Change could break existing user-facing behavior, payment flows, auth, data integrity
- MEDIUM: Change affects internal behavior but consumers are likely resilient
- LOW: Change is cosmetic, logging, refactoring with same observable behavior

For each HIGH-risk change, also identify which callers or test cases are at risk.

PR diff:
[paste diff]

This produces a prioritized list of what to test — the foundation of a risk-based regression plan.

### Using Diff Context for Targeted Regression Scoping

Once you know which functions changed, you can scope your regression test run:

**Prompt:**
Given the following changed functions in this PR:
- PaymentService.processCharge() — timeout logic changed
- OrderService.create() — calls PaymentService with new retry parameter
- OrderController.createOrder() — new validation for bulk orders

Review the test suite summary below and identify:
1. Which existing tests directly test these functions?
2. Which existing tests indirectly depend on these functions (integration tests that call them)?
3. Which tests should definitely run as part of this PR's CI check?
4. Are there any changed behaviors with no test coverage?

Test suite summary:
[paste test summary or relevant test file names and their descriptions]
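
The test suite summary itself doesn't have to be written by hand. One way to generate a compact version is to list the test functions per file, as in the sketch below, which assumes pytest-style tests under a `tests/` directory:

```python
# Sketch: produce a compact test suite summary to paste as context.
# Assumes pytest-style test files under tests/; adjust the directory and glob.
import ast
from pathlib import Path

for path in sorted(Path("tests").rglob("test_*.py")):
    tree = ast.parse(path.read_text())
    tests = [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name.startswith("test_")
    ]
    if tests:
        print(f"{path}:")
        for name in tests:
            print(f"  - {name}")
```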

### Diff Analysis for Different Change Types

| Change type | Test generation focus | Key risk |
| --- | --- | --- |
| New API endpoint | Full new endpoint test suite | Missing input validation, error handling |
| Modified validation logic | Boundary conditions, existing valid inputs | Regression on previously valid inputs now rejected |
| Changed error handling | Error path scenarios, error message format | Error responses may break client error handling |
| Database schema change | Data migration tests, nullability, default values | Existing data breaks under new schema |
| Third-party library upgrade | Integration tests with the library, behavior changes | Library behavior changed between versions |
| Feature flag added | Both flag-on and flag-off test paths | Divergent behavior in production vs. test |
| Configuration change | Tests with new and old config values | Config-dependent edge cases |
| Refactoring (no behavior change) | Run full existing test suite, no new scenarios needed | Hidden behavior changes assumed to be safe |

> **Learning Tip**: Make diff-based testing your default approach for every PR review. Before you look at the PR description, run: "What changed (diff) + what should it do (spec) → what should I test?" This framing takes 5 minutes and consistently identifies the regression scenarios that matter. Teams that adopt this practice stop discovering critical regressions in production and start catching them in review.


## How Do Spec Context and Code-Change Context Work Together to Detect Gaps?

The dual-context model's highest value is gap detection — finding the places where the intended behavior (spec) diverges from the tested behavior (code change). This is the type of analysis that takes a human hours and an AI minutes.

### Gap Types That Dual-Context Detects

Gap Type 1 — New code with no spec
A developer added functionality that has no corresponding acceptance criterion. This could be an undocumented feature, a workaround, or a misunderstanding of requirements.

Detection prompt:

**Prompt:**
Compare the new functions added in this PR diff against the acceptance criteria.

New functions in PR:
[list or paste new code]

Acceptance criteria:
[paste AC]

Identify: any new function that cannot be traced to an acceptance criterion. Flag these
as "unspecified behavior" — they may be bugs, undocumented features, or scope creep.

Gap Type 2 — Spec with no code
An acceptance criterion exists but no code in the diff implements it. The feature may be incomplete.

Detection prompt:

**Prompt:**
Map each acceptance criterion to the code change that implements it.

AC:
[paste acceptance criteria]

PR diff summary:
[paste change summary]

For each AC: either identify the implementing code change, or flag it as "unimplemented
— no corresponding code change found."

Gap Type 3 — Changed code with outdated tests
Code behavior changed in the diff, but the existing tests weren't updated to match. The tests may now pass for the wrong reasons, or may be testing assertions that are no longer valid.

Detection prompt:

**Prompt:**
This PR modified the following functions: [list]

Here are the test cases that currently test those functions:
[paste relevant existing tests]

For each test: does the existing assertion still match the new behavior?
If not, what would the correct assertion be?
Flag any test that should be updated.

Gap Type 4 — Spec violation
The code change implements behavior that directly contradicts the spec. The most critical gap type — it means the feature was built incorrectly.

Detection prompt:

**Prompt:**
Review this code change against the spec. Identify any place where the implementation
contradicts the specification.

Spec (expected behavior):
[paste spec]

Code change (actual implementation):
[paste diff or changed function]

List each contradiction as: "Spec says X. Code does Y."

### The Full Dual-Context Analysis Prompt

For a complete gap analysis on a PR, combine both context types in a single structured prompt:

**Prompt:**
You are a senior QA engineer performing a comprehensive gap analysis on a PR before
test planning.

SPEC CONTEXT (expected behavior):
[paste acceptance criteria or spec excerpt]

CODE-CHANGE CONTEXT (what changed):
[paste PR diff or change summary]

Perform a four-part analysis:

PART 1 — COVERAGE MAP: For each acceptance criterion, identify the code that implements it.
PART 2 — UNSPECIFIED CODE: Identify any new/changed code with no corresponding AC.
PART 3 — UNIMPLEMENTED SPEC: Identify any AC with no corresponding code change.
PART 4 — TEST PRIORITIES: Based on Parts 1–3, rank the test scenarios in priority order.

Format as four separate sections.

This prompt produces a QA analysis document that is substantially more rigorous than a manually assembled test plan — and it's based on your actual code and spec, not generic best practices.
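
Because both inputs are mechanical to collect, this prompt can be assembled automatically rather than pasted together by hand. A minimal sketch that stitches a spec file and the current branch's diff into the template above (the file name, branch names, and how you send the result to your AI tool are assumptions):

```python
# Sketch: assemble the dual-context gap-analysis prompt from a spec file and
# the current branch's diff. File and branch names are placeholders.
import subprocess
from pathlib import Path

spec = Path("docs/sprint-34-spec.md").read_text()  # hypothetical sprint spec file

diff = subprocess.run(
    ["git", "diff", "main...HEAD"],
    capture_output=True, text=True, check=True,
).stdout

prompt = f"""You are a senior QA engineer performing a comprehensive gap analysis on a PR
before test planning.

SPEC CONTEXT (expected behavior):
{spec}

CODE-CHANGE CONTEXT (what changed):
{diff}

Perform a four-part analysis:
PART 1 - COVERAGE MAP: For each acceptance criterion, identify the code that implements it.
PART 2 - UNSPECIFIED CODE: Identify any new/changed code with no corresponding AC.
PART 3 - UNIMPLEMENTED SPEC: Identify any AC with no corresponding code change.
PART 4 - TEST PRIORITIES: Based on Parts 1-3, rank the test scenarios in priority order.

Format as four separate sections."""

Path("gap-analysis-prompt.txt").write_text(prompt)
print(f"Wrote gap-analysis prompt ({len(prompt)} characters).")
```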

> **Learning Tip**: Run this dual-context analysis at the start of every sprint, on the PRs most relevant to your test scope. The output — a coverage map, a list of unspecified behavior, a list of unimplemented spec, and a prioritized test list — is your sprint test plan. Use it as the basis for all test generation in that sprint. Teams that adopt this practice ship with measurably fewer spec-implementation mismatches.


## What to Do When Only One Type of Context Is Available?

Perfect context is a luxury. Real QA work often starts with only spec context (for new features not yet coded) or only code-change context (for hotfixes where the spec wasn't updated). Knowing how to work with partial context — and how to compensate for what's missing — is what separates experienced practitioners from beginners.

### When You Have Only Spec Context (No Code Yet)

This is the test-first scenario: requirements are written but the feature isn't built. This is actually a strong position for QA because you can build your test plan before implementation begins — which is the ideal but rarely practiced workflow.

What you can do with spec context alone:

  • Generate complete test scenario tables (behavior-based, not implementation-based)
  • Write Gherkin feature files that document expected behavior
  • Design test data sets that cover equivalence classes and boundary conditions
  • Create an acceptance test checklist that will gate the feature
  • Identify ambiguities and gaps in the spec before they become bugs

What you should not attempt without code context:

  • Specific assertion values that depend on implementation (exact field names, error message text)
  • Integration scope (which services are involved, which endpoints are affected)
  • Performance or load test parameters (without implementation, thresholds are guesses)

Compensation technique — generate behavior-level tests with placeholder assertions:

**Prompt:**
Generate test scenarios for this spec. Since the implementation is not yet available,
focus on behavior-level scenarios (what to test, not how to assert it). Where specific
assertion values (field names, exact error messages) are needed, use [PLACEHOLDER] and
note what information is needed to complete the assertion.

This produces a test plan that's reviewable now and completable when code is available.

### When You Have Only Code-Change Context (No Spec)

This is the hotfix or legacy system scenario: code changed but requirements documentation is absent, outdated, or nonexistent. This is a riskier position — you're deriving expected behavior from implementation, which inverts the proper quality relationship.

What you can do with code-change context alone:

  • Identify what behaviors changed and what new conditions exist
  • Generate scenarios that test the changed behavior (whether or not it's correct)
  • Identify the scope of regression risk for existing tests
  • Create an observable behavior baseline that can later be validated against business intent

What you should explicitly caveat:

When generating tests from code-change context without spec, add this framing to your output:

**Prompt:**
Important: Generate test scenarios based on the observed behavior of the code change.
Flag each scenario with: "BEHAVIOR-DERIVED — verify against business requirements before
accepting as correct."

This prevents behavior-derived tests from being treated as spec-validated tests.

Compensation technique — derive spec from code and validate with stakeholders:

**Prompt:**
Based on this code change, describe the expected behavior as if you were writing
acceptance criteria. I will validate these derived requirements with the product owner.

Code change:
[paste diff or changed function]

Format as structured acceptance criteria (AC-1, AC-2, ...).

Share the output with your product owner or developer for validation before building your test plan on it. This creates a spec from code review — better than nothing, but must be validated.

### When Both Context Types Are Incomplete

Sometimes your spec is partial (covers happy paths, not errors) and your diff is large and touches many systems. Work incrementally:

  1. Start with the highest-confidence AC and the most constrained diff sections
  2. Generate tests for the parts you understand well
  3. Explicitly mark scenarios for areas where context is incomplete: "NEEDS VERIFICATION — spec ambiguous on this behavior"
  4. Use the AI to identify what additional context you need: "What information is missing from this prompt that would let you generate more precise test cases?"

The last technique is particularly powerful — the model will tell you exactly what it needed but didn't have.

> **Learning Tip**: Develop a "context completeness" checklist you run before every AI-assisted test session: "Do I have spec context? Do I have code-change context? For each one that's missing, what's my compensation strategy?" This 30-second check prevents you from spending 20 minutes generating tests from incomplete context, only to discover the output is too vague to use. Incomplete context is not a blocker — but it requires deliberate compensation, not just optimism.