
Risk-based exploration

How to Use Code Diffs and Change Logs to Focus Exploratory Sessions?

Every code change creates a blast radius — a set of behaviors that could have been affected. Experienced QA engineers develop intuition for reading diffs and identifying where to look. AI accelerates and systematizes this intuition, turning a 30-minute manual risk analysis into a 5-minute AI-assisted process.

Why Code Diffs Are the Highest-Value Input for Exploration Scoping

A feature spec describes intended behavior. A code diff describes what actually changed. These are not the same thing. Bugs live in the gap between intention and implementation — and the diff tells you exactly where that gap could exist.

Reading a diff with a risk lens is a distinct skill from reading it for code review. Code review asks: "Is this correct?" Risk-based diff reading asks: "Where could this go wrong in ways the author didn't anticipate?"

For exploratory testing, you want to identify:

  • Changed code paths: Which functions, components, or API handlers were modified
  • New dependencies: External services, libraries, or internal modules newly imported
  • Modified data models: Schema changes, new fields, changed field types
  • Changed validation logic: New rules added, old rules removed, conditions changed
  • Boundary changes: New limits, timeouts, thresholds, or flags
  • Deleted code: Removed logic that may have been covering an edge case
  • Configuration changes: Feature flags, environment variables, service endpoints
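
Before involving AI at all, a mechanical pre-pass can pull several of these signals out of a raw diff. The following Python sketch is illustrative: the regex keyword lists are assumptions to tune for your codebase, not a complete taxonomy.

```python
import re
import sys

# Heuristic patterns mapping diff lines to the risk signals listed above.
# The keyword lists are illustrative assumptions; tune them to your codebase.
SIGNALS = {
    "new dependency":  re.compile(r"^\+\s*(import |from \S+ import |require\()"),
    "deleted logic":   re.compile(r"^-\s*(if |return |raise |throw )"),
    "validation":      re.compile(r"^[+-].*(valid|sanitize|check)", re.IGNORECASE),
    "boundary/config": re.compile(r"^[+-].*(timeout|limit|threshold|flag|env)", re.IGNORECASE),
}

def scan_diff(diff_text: str) -> dict[str, list[str]]:
    """Group suspicious diff lines under the risk signal they match."""
    hits: dict[str, list[str]] = {name: [] for name in SIGNALS}
    current_file = "?"
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            current_file = line[4:].strip()
            continue
        for name, pattern in SIGNALS.items():
            if pattern.search(line):
                hits[name].append(f"{current_file}: {line.strip()}")
    return hits

if __name__ == "__main__":
    for signal, lines in scan_diff(sys.stdin.read()).items():
        print(f"\n== {signal} ({len(lines)} hits) ==")
        for entry in lines[:10]:  # cap the output per signal
            print(" ", entry)
```

Pipe a diff through it (`git diff main...feature | python scan_diff.py`) and paste the grouped hits alongside the risk-analysis prompt below.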

Prompt Pattern: Diff-to-Risk Analysis

You are a QA engineer performing risk analysis on a code change.

PR description:
[paste PR description]

Code diff (or diff summary):
[paste diff or describe the key changes]

Tasks:
1. Identify the code paths changed and describe each in plain language
2. For each changed path, identify 2–3 specific failure modes that exploratory
   testing should check
3. Identify any new external dependencies introduced and the risks they bring
4. List any deleted or modified validation logic and what edge cases it might affect
5. Rank the changed areas by risk: High | Medium | Low
   (Risk = likelihood of failure × user impact if it fails)

Output format:
For each risk area:
- Area: [component/function name or description]
- Change: [what was changed]
- Risk: [High/Medium/Low] — [1-sentence justification]
- Exploration angles: [2–3 specific things to test]

Working with Real Diff Output

When you have access to the actual git diff, paste it directly. For large diffs, focus the AI on the highest-risk files:

Here is the git diff for this PR. Focus your risk analysis on files that:
- Handle user input or data validation
- Interact with external APIs or services
- Manage authentication or session state
- Touch payment, billing, or personal data

Diff:
[paste git diff output]

For each high-risk file change you identify, list specific exploration angles.
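
If the raw diff is too large to paste, a short script can pre-select the files matching those four categories. A minimal sketch, assuming a local git checkout and glob patterns that fit your repository layout:

```python
import fnmatch
import subprocess

# Path patterns for the high-risk categories named in the prompt above.
# These globs are assumptions; adjust them to your repository layout.
HIGH_RISK_GLOBS = [
    "*validation*", "*input*",          # user input / data validation
    "*client*", "*api*", "*service*",   # external APIs or services
    "*auth*", "*session*",              # authentication / session state
    "*payment*", "*billing*",           # payment, billing, personal data
]

def high_risk_files(base: str = "main") -> list[str]:
    """Return changed files whose paths match a high-risk pattern."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [
        path for path in out.splitlines()
        if any(fnmatch.fnmatch(path.lower(), g) for g in HIGH_RISK_GLOBS)
    ]

if __name__ == "__main__":
    for path in high_risk_files():
        # Print a per-file diff you can paste into the risk-analysis prompt.
        print(subprocess.run(
            ["git", "diff", "main...HEAD", "--", path],
            capture_output=True, text=True, check=True,
        ).stdout)
```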

For teams where QA doesn't have direct repo access, the PR description and developer change summary are sufficient. The key information is what changed and why — not the raw code.

Reading Change Logs for Cumulative Risk

Individual PRs show discrete changes. Change logs show cumulative risk — multiple small changes to the same area that individually seem low-risk but together suggest an unstable component:

Here is our change log for the past 2 weeks in the checkout feature area:

[paste changelog entries — commit messages, PR titles, or Jira issue list]

Identify:
1. Which feature areas received the most changes? (Frequency of change = instability signal)
2. Are there any areas changed by multiple different developers? (Coordination risk)
3. Are there any areas that were changed, then changed again (potentially rewritten or reverted)?
4. Based on change frequency and type, which areas should be prioritized for exploratory
   testing this week?

Frequency of change is one of the most reliable predictors of defect density. Files that change frequently, especially under multiple authors, consistently show higher defect rates in empirical studies of software quality. AI can surface this pattern from a change log in seconds.
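
A minimal sketch of this hotspot analysis, pulling commit and author counts per file straight from git history (the two-week window is an arbitrary default):

```python
import subprocess
from collections import defaultdict

def change_hotspots(since: str = "2.weeks") -> list[tuple[str, int, int]]:
    """Count commits and distinct authors per file over a recent window."""
    log = subprocess.run(
        ["git", "log", f"--since={since}", "--name-only", "--format=@%ae"],
        capture_output=True, text=True, check=True,
    ).stdout
    commits = defaultdict(int)
    authors = defaultdict(set)
    current_author = ""
    for line in log.splitlines():
        if line.startswith("@"):          # commit marker line: author email
            current_author = line[1:]
        elif line.strip():                # file path touched by that commit
            commits[line] += 1
            authors[line].add(current_author)
    # Sort by (commit count, author count): both are instability signals.
    return sorted(
        ((f, commits[f], len(authors[f])) for f in commits),
        key=lambda row: (row[1], row[2]),
        reverse=True,
    )

if __name__ == "__main__":
    for path, n_commits, n_authors in change_hotspots()[:15]:
        print(f"{n_commits:3d} commits  {n_authors:2d} authors  {path}")
```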

Translating Diff Analysis into Session Scope

The output of diff analysis is a ranked list of "investigation zones." Map these directly to your charter generation:

  1. Diff analysis produces: "Risk Area 1: New address validation API integration — HIGH risk"
  2. Feed this directly into the charter generation prompt: "Create a focused charter for this risk area"
  3. The charter becomes the session guide

This pipeline — diff → risk analysis → charter → session — is a complete, repeatable workflow for change-driven exploratory testing.
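
The pipeline is simple enough to script. A sketch of the chaining, with `call_llm` as a hypothetical stand-in for whichever model client your team uses; the prompt templates are abbreviated versions of the ones above:

```python
RISK_PROMPT = """You are a QA engineer performing risk analysis on a code change.
PR description:
{pr_description}
Code diff:
{diff}
(tasks and output format as in the diff-to-risk prompt above)"""

CHARTER_PROMPT = """Create a focused exploratory testing charter for this risk area:
{risk_area}
Include: mission, scope, priority test ideas, and a time box."""

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in: wrap your model provider's client here."""
    raise NotImplementedError

def diff_to_charters(pr_description: str, diff: str) -> str:
    """diff -> risk analysis -> charter, as a single repeatable function."""
    risk_analysis = call_llm(RISK_PROMPT.format(pr_description=pr_description, diff=diff))
    # Feed the ranked risk output straight into charter generation.
    return call_llm(CHARTER_PROMPT.format(risk_area=risk_analysis))
```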

Learning Tip: Build a habit of running the diff-to-risk analysis prompt within 24 hours of any PR landing in your test environment. Don't wait until you've planned the sprint — the diff analysis is the session planning input. Teams that adopt this as a standing practice consistently report finding more defects per exploratory session hour than teams that scope from specs alone.


How Does AI Identify High-Risk Change Areas Worth Exploring First?

Not all code changes are equally risky. Some changes touch core business logic that affects every user. Others are internal refactors with no observable behavior change. AI can reason about risk factors that would take a human analyst significant time to evaluate manually.

The Risk Factors AI Can Evaluate

1. Business impact scope

Changes that touch features used by all users (login, payment, core navigation) have higher business impact than changes to admin screens or settings pages. Providing user volume or feature criticality context helps AI weight this correctly:

The following changes were made this sprint. Context on feature criticality:
- Checkout flow: Used in every purchase transaction (~10,000/day)
- Address book management: Used in checkout and profile (~2,000/day)
- Admin user management: Used by internal staff only (~50/day)

Rank these changes by exploration priority given criticality:
[paste change list]

2. Complexity of the change

Large diffs, changes to deeply nested logic, or changes that touch multiple layers of the stack (UI + API + database) are higher risk than small, isolated changes:

Evaluate complexity risk for the following changes:
[Change 1]: Modified 3 lines in a single validation function — isolated
[Change 2]: Refactored address model affecting 12 files across frontend, API, and database layer
[Change 3]: Added a new feature flag controlling checkout behavior for 20% of users

For each, assess:
- Complexity level: High | Medium | Low
- Testing surface area (what needs to be tested)
- Recommended session duration to cover the risk

3. Change to error handling vs. happy path

Changes to error handling, fallback logic, and edge case behavior are higher risk than changes to core happy paths — because error paths are tested less frequently in both automated and manual testing:

This PR modifies the following:
1. Happy path: Address auto-fill displays suggestions (most common path)
2. Error path: When the suggestion API times out, falls back to regex validation
3. Edge case: When the user is in a US territory, uses a different validation ruleset

Identify which paths are highest risk for exploratory testing and why.

4. External dependency changes

Any change that introduces, modifies, or removes an external dependency (third-party API, library update, infrastructure change) deserves explicit exploration of the integration boundary:

This PR introduces a new dependency: SmartyStreets address verification API.
Previous behavior: client-side regex validation only.

What are the integration boundary risks that exploratory testing should cover?
Consider: API availability, response format changes, rate limits, authentication,
data format edge cases, latency, error codes, and timeout behavior.
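
Some of these boundary risks can be probed directly during a session. A sketch of a timeout probe, assuming a hypothetical staging endpoint (`VERIFY_URL` is a placeholder, and the ZIP value is a US-territory edge case):

```python
import requests

# Hypothetical endpoint: substitute your staging address-verification URL.
VERIFY_URL = "https://staging.example.com/api/verify-address"

def probe_timeout_fallback(address: dict, timeout_s: float) -> str:
    """Observe how the integration behaves as the client timeout shrinks."""
    try:
        resp = requests.post(VERIFY_URL, json=address, timeout=timeout_s)
        return f"{resp.status_code} returned within the {timeout_s}s budget"
    except requests.Timeout:
        return f"timed out at {timeout_s}s -- does the UI fall back to regex validation?"

if __name__ == "__main__":
    address = {"street": "1 Main St", "zip": "00979"}  # US-territory ZIP
    for budget in (5.0, 1.0, 0.2, 0.05):
        print(probe_timeout_fallback(address, budget))
```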

The Risk Scoring Prompt

For teams that want a systematic risk score for prioritization decisions:

Score the following changes for exploratory testing priority.

For each change, assign a score from 1–10 for:
- User impact (how many users are affected if this fails?)
- Failure likelihood (how complex is the change? how well-tested is this area historically?)
- Detection difficulty (how hard would it be to catch this in automated tests?)
- Business criticality (does this touch payment, auth, or core data?)

Composite risk score = average of the four dimensions.
Rank changes by composite score, highest first.

Changes:
[paste change list with descriptions]

Also flag any change where any single dimension scores 8 or above — these need
immediate dedicated exploration regardless of composite score.
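
The scoring rubric is trivial to compute once the AI (or the team) has assigned dimension scores. A sketch with illustrative placeholder scores:

```python
from dataclasses import dataclass

@dataclass
class ChangeScore:
    name: str
    user_impact: int           # 1-10: how many users are affected if this fails
    failure_likelihood: int    # 1-10: change complexity, historical fragility
    detection_difficulty: int  # 1-10: how hard automated tests would catch it
    business_criticality: int  # 1-10: payment, auth, or core data involved

    def dims(self) -> tuple[int, ...]:
        return (self.user_impact, self.failure_likelihood,
                self.detection_difficulty, self.business_criticality)

    @property
    def composite(self) -> float:
        # Composite risk score = average of the four dimensions.
        return sum(self.dims()) / 4

    @property
    def red_flag(self) -> bool:
        # Any single dimension at 8+ forces a dedicated session regardless of average.
        return max(self.dims()) >= 8

# Scores below are illustrative placeholders, not real assessments.
changes = [
    ChangeScore("Address validation API integration", 8, 7, 6, 7),
    ChangeScore("Admin settings label change",        2, 1, 2, 1),
]
for c in sorted(changes, key=lambda c: c.composite, reverse=True):
    flag = "  << dedicated session (dimension >= 8)" if c.red_flag else ""
    print(f"{c.composite:4.1f}  {c.name}{flag}")
```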

Prioritizing When You Have Limited Time

In most sprints, you have more to explore than time allows. Given a ranked list of risk areas, how do you allocate session time?

I have 4 hours of exploratory testing time available this sprint.

Risk-ranked changes:
1. [HIGH] New address validation API integration — estimated 90-min session
2. [HIGH] Checkout flow refactor — estimated 120-min session
3. [MEDIUM] Profile page UI update — estimated 45-min session
4. [LOW] Admin settings label change — estimated 15-min session

Given the time constraint and risk rankings:
1. Recommend a session schedule (what to cover, in what order, for how long)
2. For any high-risk area where time is insufficient for full coverage, what are
   the most important 3 angles to cover in a shortened session?
3. What can be safely deferred to next sprint without significant risk?
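
The allocation itself is a simple greedy pass over the risk-ranked list: schedule in rank order, shorten the first session that no longer fits, defer the rest. A sketch using the durations from the prompt above:

```python
# Sessions in risk-rank order, durations in minutes (from the prompt above).
sessions = [
    ("Checkout flow refactor",      "HIGH",   120),
    ("Address validation API",      "HIGH",    90),
    ("Profile page UI update",      "MEDIUM",  45),
    ("Admin settings label change", "LOW",     15),
]

def schedule(sessions, budget_min: int = 240):
    """Greedy allocation: shorten the session that only partially fits."""
    plan, deferred, remaining = [], [], budget_min
    for name, risk, wanted in sessions:
        if remaining <= 0:
            deferred.append(name)
        elif wanted <= remaining:
            plan.append((name, risk, wanted))
            remaining -= wanted
        else:
            # Shortened session: cover only the top exploration angles.
            plan.append((name + " (shortened)", risk, remaining))
            remaining = 0
    return plan, deferred

plan, deferred = schedule(sessions)
for name, risk, minutes in plan:
    print(f"{minutes:4d} min  [{risk}] {name}")
print("Deferred:", ", ".join(deferred) or "none")
```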

Learning Tip: Ask your development team to send you the "three riskiest changes in this PR" before each test cycle. This developer-QA dialogue about risk is extremely valuable — developers know where they were uncertain during implementation, where they deviated from the original design, or where they "made it work but aren't sure why." This knowledge never makes it into specs or diffs, but it's the highest-signal risk information you can get. AI can help you formulate the right questions to ask developers based on the diff.


How to Combine Scripted and Exploratory Testing Based on AI Risk Signals?

Risk signals from code analysis shouldn't only drive exploratory sessions — they should also inform which areas get expanded automated test coverage. The most effective QA operations use AI risk analysis to dynamically balance scripted and exploratory testing across each sprint.

The Complementary Roles of Each Approach

| Testing type | Best for | Driven by |
| --- | --- | --- |
| Scripted (automated) | Stable, critical happy paths; regression prevention | Historical test coverage; known failure modes |
| Exploratory | New changes; edge cases; unknown unknowns; behavior under unexpected conditions | Risk signals from diffs; domain expertise; heuristics |

These aren't competing approaches — they're complementary. Exploratory testing discovers what to automate. Automated testing prevents regressions in what was discovered.

The Sprint-Level Workflow

Step 1: Diff-to-risk analysis (Day 1–2 of sprint)

AI analyzes the diff and produces a ranked risk list. This list feeds both session planning and automated test gap analysis.

Step 2: Session planning (Day 1–2)

High-risk areas get exploratory sessions. The charter is generated from the risk analysis output.

Step 3: Coverage gap analysis (Day 1–2, parallel)

Use AI to check: do the high-risk areas identified in step 1 have adequate automated test coverage?

Here is the risk analysis from this sprint's diff:
[paste risk analysis]

Here is a summary of our current automated test coverage for the affected areas:
[paste coverage summary, test file list, or describe coverage level]

For each high-risk area:
1. Is the current automated coverage sufficient to catch the most likely failure modes?
2. What specific test cases should be added to the automated suite?
3. What should remain in exploratory testing (too complex or too open-ended to automate effectively)?

Recommend a coverage addition plan for the automated suite, prioritized by risk.

Step 4: Post-exploratory automation decision (After sessions)

After exploratory sessions find defects and confirm risk areas, decide what to automate:

Based on these exploratory findings, which scenarios should be added to our automated
regression suite?

Findings: [paste synthesis]

For each finding, assess:
- Automation candidate: Yes | No | Maybe
- If Yes: What type of test is most appropriate? (Unit | Integration | E2E | API)
- If Yes: What is the test priority? (Critical | High | Medium)
- If No: Why is this better kept as an exploratory check?

Output a prioritized automation backlog.

Deciding What Not to Automate

AI risk analysis can also identify areas where exploratory testing is permanently more valuable than automation:

Evaluate whether the following exploratory findings are good automation candidates.
For each, explain why or why not.

Finding 1: The checkout flow breaks when the user has 20+ items in cart (found via
free-form exploratory testing, not a scenario we anticipated).

Finding 2: Address field layout breaks on 320px viewport on Firefox (browser-specific,
layout regression).

Finding 3: When user account has both a personal and business profile, the billing
address pre-fill uses the wrong profile's address (complex state scenario).

AI will evaluate each: boundary conditions and layout regressions are good automation candidates; complex multi-state interaction bugs require more judgment about whether the automation investment is proportional to the detection value.

Triggering Exploratory Sessions from Automated Test Failures

Automated test failures are signals for where to explore — not just things to fix:

Our automated test suite has the following new failures after today's deployment:

Failure 1: test_address_validation_accepts_valid_zip — AssertionError: validation rejected
   ZIP code "00979"
Failure 2: test_checkout_rate_limit_error_display — Expected error message not shown
Failure 3: test_profile_billing_address_prefill — Wrong address selected for business account

Based on these failures, generate:
1. An exploratory charter for each failure area — the automated test caught *this* case,
   what else might be broken nearby?
2. The most likely related scenarios that the automated tests didn't check
3. Whether these failures suggest a systemic issue worth a dedicated session

This creates a feedback loop: automated tests catch specific regressions, AI analysis generates exploratory charters for the surrounding risk area, exploratory sessions uncover what automation missed.

Learning Tip: In your team's sprint retrospective, track the following metric: "Of the bugs found this sprint, how many were found by automated tests vs. exploratory sessions vs. production?" If exploratory testing consistently finds a category of bugs that automation misses, that category has a characteristic that makes it hard to automate — and that's valuable information about where to permanently invest exploratory time. Let the data tell you where each approach is more effective, rather than defaulting to "automate everything eventually."


How to Track Which Areas Have Been Explored Over Time?

Exploratory testing without tracking produces invisible coverage. You can't answer "have we explored X?" without documentation. And invisible coverage means that when a production bug appears, you can't determine whether the area was never explored or was explored and looked fine at the time.

The Coverage Map Structure

A coverage map is a structured document that tracks what has been explored, when, under what charter, and what was found. It operates at the feature area level — not the individual test case level.

Coverage map schema:

| Feature Area | Sub-Area | Last Explored | Charter Ref | Platform | Status | Key Findings |
| --- | --- | --- | --- | --- | --- | --- |
| Address validation | Valid inputs | 2024-01-17 | CHARTER-003 | Web | Covered | None |
| Address validation | US territories | 2024-01-17 | CHARTER-003 | Web | Defect Found | BUG-1204 |
| Address validation | International | | | | Not Covered | |
| Network resilience | Slow connection | 2024-01-19 | CHARTER-005 | iOS | Covered | None |
| Network resilience | API timeout | 2024-01-19 | CHARTER-005 | iOS | Defect Found | BUG-1212 |
| Network resilience | Offline mode | | | | Not Covered | |
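
The map is simple enough to maintain as structured data and render to markdown on demand. A minimal sketch (field names mirror the schema above; `record_session` mirrors the post-session update prompt below):

```python
from dataclasses import dataclass

@dataclass
class CoverageRow:
    feature_area: str
    sub_area: str
    last_explored: str = ""
    charter_ref: str = ""
    platform: str = ""
    status: str = "Not Covered"
    key_findings: str = ""

def to_markdown(rows: list[CoverageRow]) -> str:
    """Render the map as a markdown table for the tracking document."""
    header = "| Feature Area | Sub-Area | Last Explored | Charter Ref | Platform | Status | Key Findings |"
    sep = "| --- " * 7 + "|"
    body = [
        f"| {r.feature_area} | {r.sub_area} | {r.last_explored} | {r.charter_ref} "
        f"| {r.platform} | {r.status} | {r.key_findings} |"
        for r in rows
    ]
    return "\n".join([header, sep, *body])

def record_session(rows, sub_area, date, charter, platform, status, findings=""):
    """Post-session update for one sub-area."""
    for r in rows:
        if r.sub_area == sub_area:
            r.last_explored, r.charter_ref = date, charter
            r.platform, r.status, r.key_findings = platform, status, findings

rows = [CoverageRow("Address validation", s)
        for s in ("Valid inputs", "US territories", "International")]
record_session(rows, "US territories", "2024-01-17", "CHARTER-003",
               "Web", "Defect Found", "BUG-1204")
print(to_markdown(rows))
```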

Generating and Updating the Coverage Map with AI

Initial map generation:

Create a coverage map template for the following feature.

Feature: Address auto-fill and management
Known sub-areas to track:
- Functional: add, edit, delete, set-default
- Data: valid inputs, invalid formats, edge cases, US territories, international
- Network: fast, slow, timeout, offline, API down
- Platform: Web (Chrome/Safari/Firefox), iOS, Android
- Security: rate limiting, input sanitization, API response safety
- Accessibility: keyboard navigation, screen reader, color contrast

For each sub-area, create a row in the coverage map with columns:
Feature Area | Sub-Area | Last Explored | Charter Ref | Platform | Status | Key Findings

Initialize all rows with Status: "Not Covered" and empty fields.
Output as a markdown table I can paste into my tracking document.

Post-session update:

Update the following coverage map based on the session summary below.

Current coverage map: [paste map]
Session summary: [paste synthesis]
Session date: 2024-01-19
Charter reference: CHARTER-005

For each area covered in this session:
- Update "Last Explored" to 2024-01-19
- Update "Charter Ref" to CHARTER-005
- Update "Status" to "Covered", "Partially Covered", or "Defect Found" as appropriate
- Add finding references for any defects found

Output the updated map in the same markdown table format.

Using Coverage Age as a Risk Signal

Areas not explored recently may have accumulated risk from subsequent changes:

Based on the following coverage map, identify staleness risk.

Coverage map: [paste map]
Current date: 2024-02-15
Change log (last 30 days): [paste recent changes]

For each covered area, assess staleness risk:
- "Recently changed" areas with coverage older than 30 days = HIGH staleness risk
- "Unchanged" areas with coverage older than 60 days = MEDIUM staleness risk
- "Changed today" areas with no coverage = CRITICAL gap

Output a re-exploration priority list based on staleness × change frequency.
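
These staleness rules are mechanical enough to compute locally before asking AI to prioritize. A sketch implementing exactly the three rules above (dates are illustrative):

```python
from datetime import date, timedelta

def staleness_risk(last_explored: date | None, last_changed: date | None,
                   today: date) -> str:
    """Apply the staleness rules from the prompt above."""
    changed_recently = (last_changed is not None
                        and (today - last_changed) <= timedelta(days=30))
    if last_changed == today and last_explored is None:
        return "CRITICAL gap"          # changed today, never covered
    if last_explored is None:
        return "uncovered"
    age = today - last_explored
    if changed_recently and age > timedelta(days=30):
        return "HIGH staleness risk"   # recently changed, stale coverage
    if not changed_recently and age > timedelta(days=60):
        return "MEDIUM staleness risk" # unchanged, but coverage is old
    return "fresh"

today = date(2024, 2, 15)
# Coverage from Jan 10 (36 days old), area changed Feb 10 -> HIGH staleness risk.
print(staleness_risk(date(2024, 1, 10), date(2024, 2, 10), today))
```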

Communicating Coverage Status to Stakeholders

Coverage maps are QA team documents. Stakeholders need higher-level summaries:

Generate a test coverage status report for stakeholders based on the following coverage map.

Coverage map: [paste map]
Audience: Product manager and engineering lead

Format:
1. Overall coverage summary (% of areas covered, not covered, with defects)
2. High-risk gaps (areas "Not Covered" that are high business impact)
3. Recent defects found through exploration (list with severity)
4. Recommended next steps (what to explore in the next sprint to close the biggest gaps)

Keep it concise — 1 page maximum. Non-technical language for the summary,
technical details in the "Recommended next steps" section.
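
The overall coverage summary in point 1 is pure arithmetic that can be computed from the map's Status column rather than asked of the model. A minimal sketch:

```python
from collections import Counter

def coverage_summary(statuses: list[str]) -> str:
    """Percentage breakdown of coverage statuses for the stakeholder report."""
    counts = Counter(statuses)
    total = len(statuses)
    return " | ".join(
        f"{s}: {counts.get(s, 0)} ({100 * counts.get(s, 0) / total:.0f}%)"
        for s in ("Covered", "Partially Covered", "Defect Found", "Not Covered")
    )

# Statuses pulled from the Status column of the coverage map.
print(coverage_summary(["Covered", "Defect Found", "Not Covered",
                        "Covered", "Defect Found", "Not Covered"]))
```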

Integrating Coverage Tracking with Sprint Planning

Coverage map data feeds sprint planning decisions. Before each sprint, generate a coverage-driven exploration plan:

Before I plan this sprint's exploratory testing, analyze the following:

Coverage map (current state): [paste map]
Upcoming changes this sprint: [paste planned work]
Available exploration time: 6 hours

Recommend:
1. Which "Not Covered" areas should be prioritized for first-time exploration?
2. Which "Defect Found" areas need re-exploration after the defect fix is deployed?
3. Which "Covered" areas are at risk of regression given the planned sprint changes?
4. Suggest a 6-hour session schedule covering the highest-priority areas.

Learning Tip: The coverage map is the single most important artifact for making exploratory testing defensible to stakeholders who question whether "just clicking around" is rigorous testing. A coverage map with timestamps, charters, and finding references demonstrates systematic coverage — not random wandering. Build the habit of updating the map within 2 hours of each session, while the details are fresh. A coverage map maintained for one quarter is a compelling demonstration of exploratory testing discipline.