
Hands-on: AI-guided exploratory testing session

How to Set Up an Exploratory Session — Charter, Context, and Tooling?

This topic walks through a complete AI-guided exploratory testing session from preparation to final output. Unlike the conceptual treatments in the previous topics, this is a worked example you can follow step by step. The feature used as the scenario is a checkout address confirmation step with a newly integrated address verification API — a realistic, mid-complexity feature that generates interesting findings.

Step 1: Gather Your Inputs

Before writing a single prompt, assemble the three inputs that drive good session setup:

Input 1: The feature change description

Feature: Checkout address confirmation step
Change (from PR description): We've added an inline address correction suggestion powered
by the USPS Address Standardization API. When a user enters a shipping address, the API
normalizes it (corrects capitalization, expands abbreviations, adds ZIP+4) and presents the
standardized version as a suggestion. The user can accept the suggestion or keep their
original entry. If the API is unavailable, the flow continues without suggestions.

Key implementation details:
- API call is triggered on blur of the ZIP code field
- Suggestion UI appears as a modal overlay with Accept/Keep Original options
- API timeout is 1000ms; on timeout, suggestion is skipped silently
- Acceptance is logged for analytics but the data is not used for payment processing

Input 2: Historical context

Historical bugs in this area:
- BUG-1089: Previous address validation API calls were blocking — caused 3-second form
  freeze when API was slow (fixed 3 sprints ago)
- BUG-1102: Address modal appeared on mobile with keyboard active, causing layout overflow
- BUG-1156: "Keep Original" option was not saving correctly — saving standardized version
  anyway (critical, fixed last sprint)

Input 3: Platform and session parameters

Platform: Web (Chrome primary), iOS Safari secondary
Session duration: 90 minutes
Prior coverage: None (first exploratory session for this feature)
Tester context: Mid-level QA engineer, familiar with the checkout flow

Step 2: Generate the Session Charter

With inputs assembled, run the charter generation prompt:

You are a senior QA engineer helping plan an exploratory testing session.

Feature context:
[paste Input 1]

Historical bugs:
[paste Input 2]

Platform and session:
[paste Input 3]

Generate a risk-aware exploratory session charter:
1. Mission statement
2. Top 5 risk areas (ranked by likelihood × impact), each with 2–3 exploration angles
3. Suggested techniques for this session
4. Session scope boundaries (in / out)
5. Stop conditions
6. One "most likely to find a bug" angle based on the change and history

Expected AI output:

CHARTER: Checkout Address Confirmation — First Coverage Session

Mission: Explore the USPS address standardization integration in the checkout
address step to discover failure modes in suggestion display, user choice persistence,
API edge cases, and mobile layout behavior.

Top 5 Risk Areas:

1. [HIGH] User choice persistence — "Keep Original" vs. "Accept Suggestion"
   History: BUG-1156 was this exact scenario. Even with the fix, regression risk is high.
   Angles:
   - Submit with original after seeing suggestion — confirm original is saved, not standardized
   - Accept suggestion — confirm standardized version is saved
   - Accept suggestion, then go back and re-enter address — confirm no stale suggestion state

2. [HIGH] API unavailability and timeout handling
   History: BUG-1089 was a blocking API call that froze the form. Current implementation
   claims non-blocking with 1000ms timeout.
   Angles:
   - Simulate API timeout (use Charles Proxy to delay 1100ms) — confirm silent skip, no freeze
   - Simulate API down (block the API domain) — confirm flow continues, no error
   - Test at 900ms and 1100ms delays to find exact timeout behavior

3. [HIGH] Mobile layout — suggestion modal with keyboard active
   History: BUG-1102 was specifically this — modal with active keyboard causing overflow.
   Angles:
   - Trigger suggestion on iOS Safari while keyboard is active on ZIP field
   - Test modal dismiss behavior with keyboard still shown
   - Test on both iPhone SE (small screen) and standard iPhone 14 viewport

4. [MEDIUM] Address edge cases that produce unexpected API responses
   Angles:
   - PO Box addresses (some APIs reject them)
   - Puerto Rico and US territory addresses
   - APO/FPO military addresses
   - Addresses with non-standard characters (accents, apartment symbols)

5. [MEDIUM] Analytics logging — confirm standardized vs. original is correctly logged
   Angles:
   - Accept suggestion: confirm "accepted" event with standardized address logged
   - Keep original: confirm "declined" event with original address logged
   - API unavailable: confirm no analytics event logged (not a failure)

Suggested techniques: Network condition simulation (Charles/DevTools), address database
sampling, cross-device session, back-navigation stress testing

In scope: Address confirmation step, suggestion modal, API integration behavior, mobile layout
Out of scope: Payment processing (not changed by this feature), delivery estimations
Stop conditions: All 5 risk areas have at least partial coverage, or 90 minutes elapsed

Most likely to find a bug: User choice persistence (regression from BUG-1156 fix) — test this first.

Step 3: Prepare Your Tooling

With the charter in hand, set up the tools you'll need:

For network simulation:
- Desktop: Chrome DevTools → Network → Throttle (for slow connections); Charles Proxy or mitmproxy for custom delay/blocking
- Mobile: Charles Proxy with the device proxied; or Network Link Conditioner in the iOS Developer settings on a real device
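
If you would rather script the network conditions than click through Charles, a minimal mitmproxy addon covers both the delay and the outage cases. This is a sketch assuming the api.usps.com host named in the PR description; run it with mitmdump -s delay_usps.py and point the browser at the proxy:

# delay_usps.py: a minimal mitmproxy addon sketch. The host pattern and
# numbers come from the PR description; adjust them to your environment.
import time

from mitmproxy import http
from mitmproxy.script import concurrent

DELAY_SECONDS = 1.1   # just past the app's stated 1000ms timeout
BLOCK = False         # set True to simulate the API being down entirely

@concurrent  # run the hook in a worker thread so the sleep doesn't stall other flows
def request(flow: http.HTTPFlow) -> None:
    if "api.usps.com" not in flow.request.pretty_host:
        return
    if BLOCK:
        # Short-circuit with a 500 so the request never reaches the real API
        flow.response = http.Response.make(500, b"", {"Content-Type": "text/plain"})
        return
    time.sleep(DELAY_SECONDS)  # hold the request past the client-side timeout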

For session notes:
- Open a plain text file (or Rapid Reporter if your team uses it)
- Set up the action-observation-tag format header at the top
- Note start time

For AI co-pilot:
- Open your AI assistant in a separate browser tab or window
- Paste the session brief (charter + context) as the opening message to set conversation context

Session brief to paste into AI at session start:

Session brief — please retain this context for our conversation.

I am starting a 90-minute exploratory testing session with the following charter:
[paste charter]

Application: [App name], URL: [URL], logged in as: test user with existing address book
Platform: Chrome 124 / macOS Sonoma 14.4

As my co-pilot:
- When I share observations, suggest follow-up angles
- Help me generate specific test data when I need it
- If I describe something that suggests a systemic issue, flag it
- Help me decide whether to pivot or stay on charter when I find something unexpected
- Do not just echo my observations — be an active thinking partner

I'm starting now.

Learning Tip: The session setup phase is where most of the AI value is created. A well-constructed charter and a well-set AI context brief take 20–30 minutes — but they make the 90-minute session dramatically more productive than walking in with a vague plan. Track your "setup time vs. findings per hour" ratio across sessions. Engineers who invest in structured setup consistently find more defects per exploration hour. The setup is not overhead — it's the work.
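
If you track that ratio, even a trivial script over your session archive makes the trend visible. A toy sketch with illustrative numbers and hypothetical charter ids:

# findings_rate.py: toy tracking of setup time vs. findings per exploration
# hour (numbers and charter ids are illustrative, not from a real archive).
sessions = [
    {"charter": "CHARTER-005", "setup_min": 5,  "explore_min": 90, "findings": 1},
    {"charter": "CHARTER-006", "setup_min": 25, "explore_min": 90, "findings": 3},
]
for s in sessions:
    per_hour = s["findings"] / (s["explore_min"] / 60)
    print(f'{s["charter"]}: setup {s["setup_min"]} min -> {per_hour:.1f} findings/hour')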


How to Use AI as a Real-Time Partner During an Exploratory Session?

With the charter set and the AI context brief loaded, the session begins. This section demonstrates the real-time interaction pattern across the session timeline.

Minutes 0–20: Charter Orientation and First Risk Area

Action sequence:
1. Navigate to the checkout flow
2. Enter a valid shipping address with a known-correctable element (e.g., "ST" → "STREET")
3. Tab out of the ZIP field and observe suggestion behavior

Observation logged:

[00:04] ACTION: Entered "123 main st, anytown, 12345" — tabbed out of ZIP
        OBS: Suggestion modal appeared after ~400ms with standardized: "123 Main Street, Anytown, NY 12345-6789"
        OBS: Modal has "Accept" and "Keep Original" buttons — visible, accessible
        TAG: WORKS

[00:06] ACTION: Clicked "Keep Original"
        OBS: Form populated with original value "123 main st, anytown, 12345"
        OBS: Network tab showed POST to /api/analytics with "choice: declined"
        TAG: WORKS

Query to AI:

I verified the happy path for "Keep Original" — original value saved, analytics logged correctly.
What should I test next to stress the choice persistence before moving to other risk areas?
Specifically: what are the edge cases for re-editing after making a choice?

AI response:

Good start. For choice persistence edge cases, test:
1. Accept suggestion → navigate back → re-enter address → tab out: does stale suggestion state
   persist or is a fresh API call made?
2. Keep original → close browser tab → reopen checkout in new tab: does original or standardized
   persist in the saved address?
3. Accept suggestion → immediately click browser back → click forward: what state does the
   form show?
4. Keep original on desktop → continue to payment → return to address step via "Edit": is
   original still there?

I'd prioritize #3 (back-forward navigation) — that's a common state management failure point.

Action based on AI suggestion:

[00:11] ACTION: Accepted suggestion → immediately clicked browser back → clicked Forward
        OBS: Form shows standardized address (the accepted one) — correct
        TAG: WORKS

[00:13] ACTION: Accepted suggestion → used browser History to go back two pages → navigated
        forward through checkout
        OBS: Form shows ORIGINAL address, not the accepted standardized version
        TAG: BUG? REGRESSION — multiple-history-step back loses accepted choice

Immediately query AI:

Interesting finding: If I navigate back multiple history steps and return to the checkout,
the accepted suggestion is lost and the original address appears. Is this likely a state
management bug or expected behavior? What additional information should I capture?

AI response:

This looks like a state management bug — specifically, the accepted choice being held in
component state rather than persisted to the checkout session object. When multiple-step
back-navigation triggers a full component remount, component state is lost.

To confirm and document:
1. Check the network tab: when you navigated forward again, was there a fresh API call to
   /api/checkout or did it reload from cache?
2. Check localStorage or sessionStorage for any address-related keys before and after the navigation
3. Test: Accept suggestion → proceed to payment page → use "Edit Address" link: does it show
   accepted or original?
4. Test: Accept suggestion → complete order → check order confirmation: which address appears?

Severity assessment: This is likely HIGH — if the accepted standardized address isn't persisted
through the full checkout flow, users may complete orders with non-standardized addresses
despite having accepted the correction. Check the order confirmation screen specifically.
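
Checks 1 and 2 from the AI's list can be scripted so the repro is exact and repeatable. A minimal sync Playwright sketch, assuming a hypothetical staging URL and selectors:

# repro_back_nav.py: scripted repro of the multi-step back-navigation check
# (sync Playwright; the staging URL and selectors are hypothetical).
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://staging.example.com/checkout/address")   # hypothetical URL
    page.fill("#zip", "12345")                                  # hypothetical selector
    page.press("#zip", "Tab")               # blur triggers the suggestion call
    page.click("text=Accept")               # accept the standardized address
    before = page.evaluate("() => JSON.stringify(sessionStorage)")
    page.go_back()
    page.go_back()                          # two history steps, per the finding
    page.go_forward()
    page.go_forward()
    after = page.evaluate("() => JSON.stringify(sessionStorage)")
    print("sessionStorage before:", before)
    print("sessionStorage after: ", after)
    print("address field now:", page.input_value("#address1"))  # hypothetical selector
    browser.close()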

Minutes 20–50: Network Resilience Testing

Setup: Configure Charles Proxy with a breakpoint on the USPS API endpoint.

[00:24] ACTION: Set Charles Proxy to add 1100ms delay to api.usps.com/* requests
        ACTION: Entered new address, tabbed out of ZIP
        OBS: Form waited approximately 900ms, then NO modal appeared
        OBS: Form remained editable immediately
        TAG: WORKS — timeout graceful

[00:28] ACTION: Set Charles Proxy to block api.usps.com/* entirely
        ACTION: Entered new address, tabbed out of ZIP
        OBS: No modal appeared, form remained usable — as expected
        OBS: Browser console showed: "Address suggestion unavailable — proceeding without normalization"
        TAG: WORKS

[00:31] ACTION: Set Charles to return 500 from the API
        ACTION: Tabbed out of ZIP
        OBS: Same graceful fallback — no modal, no freeze
        TAG: WORKS

[00:34] ACTION: Set Charles to delay exactly 1000ms (the stated timeout)
        ACTION: Tabbed out of ZIP, then clicked Submit quickly (within 500ms)
        OBS: Modal appeared AFTER form submission had started — modal appeared over loading spinner
        TAG: BUG? Race condition — if user submits before timeout, modal can appear mid-submission

Query to AI:

Found a race condition: if the API responds exactly at or just after the timeout boundary AND
the user has already clicked Submit, the suggestion modal appears on top of the submission
loading state. How should I document this and what severity would you assign?

AI response:

This is a genuine race condition at the timeout boundary. Severity: MEDIUM.

Document it as:
- Condition: API latency exactly at or marginally above the 1000ms timeout
- Trigger: User clicks Submit before timeout fires (possible in fast typing/form fill scenarios)
- Impact: Confusing UX — modal overlay appears after submission has started; user's choice
  may arrive after order creation begins
- Additional questions to answer for the bug report:
  1. Does clicking Accept or Keep Original on this post-submission modal actually change anything?
  2. Does the Submit action complete normally with the original address in this race condition?
  3. Is this race condition more likely on slower devices (higher API latency)?

Test point #1 immediately — if accepting the modal suggestion after submission has started
can change the order's address, this elevates to HIGH.
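
The mechanism behind this race is worth pinning down before filing: a client timeout that gives up waiting but never cancels the underlying request leaves the late response free to fire its handler. A toy asyncio model of that behavior (illustrative only, not the app's actual code):

# race_sketch.py: toy model of the timeout race. The client gives up at
# 1000ms but never cancels the request, so a response landing just after
# the deadline still fires its handler.
import asyncio

async def fetch_suggestion(latency: float) -> str:
    await asyncio.sleep(latency)            # stands in for the USPS API call
    return "standardized address"

async def main() -> None:
    task = asyncio.create_task(fetch_suggestion(1.02))   # replies ~20ms late
    try:
        # shield() keeps the request alive after the timeout, mirroring a
        # fetch whose timer fires without aborting the request itself
        await asyncio.wait_for(asyncio.shield(task), timeout=1.0)
    except asyncio.TimeoutError:
        print("timeout fired: suggestion skipped; user clicks Submit")
    await asyncio.sleep(0.05)               # submission has started
    if task.done():
        print("late response lands: modal pops over the loading spinner")

asyncio.run(main())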

Minutes 50–75: Mobile Layout and Remaining Risk Areas

Switch to mobile testing (iOS Safari via device or BrowserStack).

[00:52] ACTION: Opened checkout on iOS Safari (iPhone 14), entered address
        ACTION: Tapped ZIP field — iOS numeric keyboard appeared
        ACTION: Tapped Next on the keyboard to move to the next field — keyboard changed type, API call triggered
        OBS: Suggestion modal appeared — BUT appeared below the keyboard
        OBS: "Accept" button was partially hidden behind the keyboard
        TAG: BUG? REGRESSION of BUG-1102 — modal layout on mobile still broken

[01:01] ACTION: Tested Puerto Rico ZIP (00979)
        OBS: API returned suggestion with Puerto Rico address correctly — WORKS
        TAG: WORKS — territory-address concern from charter risk area 4 not borne out

[01:08] ACTION: Entered APO address (PSC 802 Box 74, APO AE 09499)
        OBS: Suggestion modal appeared with "unable to standardize" message but no error
        OBS: "Keep Original" only option shown
        TAG: WORKS — graceful handling

Query to AI:

I found what appears to be a regression of BUG-1102 — the modal is partially hidden
behind the iOS keyboard on iPhone 14. How do I confirm this is a regression vs. a
different manifestation of the same underlying issue?

AI response:

To confirm regression:
1. Check the fix for BUG-1102 in the PR notes — what was the specific fix? (Padding
   adjustment, scroll behavior, or viewport change?)
2. Test on iPhone SE (smaller screen) — more likely to see the same issue
3. Test on Android Chrome — same modal behavior, different keyboard handling
4. Test: when modal appears, tap anywhere outside modal → does the keyboard dismiss?
   Does the modal move above the keyboard?

If the fix for BUG-1102 was a specific pixel adjustment that was correct for older iOS
but the new iOS keyboard height changed, this may be the same root cause requiring
an updated fix. Document both the original bug reference and the current behavior.
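
To gather evidence for the report, you can also measure whether the Accept button actually sits inside the visual viewport. A Playwright sketch under assumptions (hypothetical URL and selectors; note that device emulation does not raise a real software keyboard, so use this alongside the on-device check, where window.visualViewport.height shrinks when the keyboard is shown):

# modal_viewport_check.py: measure whether the Accept button sits inside
# the visual viewport (URL and selectors hypothetical).
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    iphone = p.devices["iPhone SE"]          # built-in device descriptor
    browser = p.webkit.launch()
    context = browser.new_context(**iphone)
    page = context.new_page()
    page.goto("https://staging.example.com/checkout/address")   # hypothetical URL
    page.fill("#zip", "12345")
    page.press("#zip", "Tab")                # trigger the suggestion call
    accept = page.locator("button:has-text('Accept')")
    accept.wait_for()
    box = accept.bounding_box()
    viewport_h = page.evaluate("() => window.visualViewport.height")
    print("Accept bottom edge:", box["y"] + box["height"], "vs viewport:", viewport_h)
    browser.close()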

Minutes 75–90: Coverage Review and Session Close

With 15 minutes remaining, run the mid-session coverage check:

I have 15 minutes left. Here's my coverage so far:
✅ Happy path suggestion + Keep Original
✅ Happy path suggestion + Accept
⚡ Multi-step back navigation loses accepted choice (BUG found — state management)
✅ API timeout at 1100ms (graceful)
✅ API blocked entirely (graceful)
⚡ Race condition at exact timeout boundary (BUG found — race)
⚡ Mobile iOS modal partially hidden behind keyboard (BUG found — regression?)
✅ Puerto Rico ZIP (handled correctly)
✅ APO address (graceful "unable to standardize" message)
⬜ Analytics logging — not yet tested for all scenarios
⬜ "Accept suggestion → complete full order → check order confirmation address" — critical,
    related to the state management bug

Given 15 minutes, what should I prioritize?

AI response:

With 15 minutes, prioritize in this order:

1. [CRITICAL — 5 min] Accept suggestion → complete full order → verify order confirmation address.
   Your state management finding is HIGH risk. If the accepted standardized address isn't
   saved to the order, this is a data integrity issue. This test answers whether the bug
   is cosmetic (UX only) or functional (affects order data).

2. [HIGH — 5 min] Quick analytics logging spot check. Accept and decline once each, verify
   both events logged correctly in Network tab. Fast to check.

3. [LOW — 5 min] Document and close. Use remaining time for notes cleanup.

Do not start new exploration areas. The state management finding needs the critical path test now.
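
If you want the analytics spot check to go faster than hand-reading the Network tab, a small Playwright listener can print each analytics POST as you click. A sketch assuming the /api/analytics path seen earlier in the session notes; the URL is hypothetical:

# analytics_spy.py: print each analytics POST as you click Accept /
# Keep Original.
from playwright.sync_api import sync_playwright

def log_analytics(request):
    if request.method == "POST" and "/api/analytics" in request.url:
        print("analytics event:", request.post_data)

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.on("request", log_analytics)
    page.goto("https://staging.example.com/checkout/address")   # hypothetical URL
    page.pause()   # drive the flow by hand; events print as they fire
    browser.close()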

Learning Tip: The AI's value at the 75-minute mark — when you're tired and running out of time — is especially high. Without a thinking partner, most testers spend this phase winding down and writing notes. With AI input, you often discover that the most important test of the session is one you haven't run yet. Build the habit of an explicit "15 minutes left" check with your AI co-pilot in every session. This single habit has generated some of the most valuable findings in AI-assisted exploratory sessions.


How to Debrief and Synthesize Findings with AI After a Session?

The session is over. You have 90 minutes of notes, three clear bugs, a possible regression, and several follow-up questions. The debrief transforms these raw observations into artifacts that survive the session and drive action.

Step 1: Paste the Full Session Notes into AI

At session end, paste the complete note file with the synthesis prompt:

I've just completed the following exploratory session. Please synthesize the findings.

Session details:
- Charter: [paste charter]
- Date: [date]
- Duration: 90 minutes
- Platform: Web Chrome 124 + iOS Safari

Raw session notes:
[paste complete note file]

Produce:
1. FINDINGS (all distinct findings, classified and severity-rated)
2. COVERAGE SUMMARY (what was covered vs. charter scope)
3. PATTERNS (any systemic issues suggested by the findings)
4. CRITICAL FLAG (any finding requiring immediate developer attention)
5. FOLLOW-UP CHARTERS (suggested charters for the next session based on gaps)
6. DRAFT BUG REPORTS (full report for each Defect finding — title, steps, expected, actual, severity)

Step 2: Review and Validate the Synthesis

Read through the AI output critically:

Validate each finding:
- Does the reproduction step match what you actually did?
- Is the severity appropriate given business context you know but the AI may not?
- Is the "actual result" described precisely enough to reproduce?

For the state management finding (back-navigation losing accepted choice):

The AI described this as: "Accepted suggestion lost after multi-step back navigation."

Revise: "After accepting the standardized address suggestion, navigating back more than
one step in browser history and returning to the checkout causes the checkout to display
the ORIGINAL address instead of the accepted standardized version. Verified with Chrome 124.
Not reproduced with single-step back navigation."

For the race condition:

The AI generated a vague severity: "MEDIUM — confusing UX."

The additional test I ran confirmed: clicking Accept on the post-submission modal does NOT
change the order address. The race condition is cosmetic (confusing UX) but does not affect
data integrity. Severity: LOW — downgrade from MEDIUM.

Step 3: Generate the Debrief Document

The debrief document is the artifact that communicates session outcomes:

Generate a session debrief document for the following synthesized findings.

Target audience: Development team and QA lead
Format:
- Executive summary (2–3 sentences)
- Findings table (all findings, severity, status)
- Critical findings detail (one paragraph per Critical/High finding)
- Coverage status (vs. charter)
- Next session recommendation
- Open questions for the development team

Synthesis: [paste synthesis output]

Step 4: File Bug Reports

For each Defect finding, file a bug report using the AI-generated draft as the starting point. Edit for:
- Precision of reproduction steps (add exact URLs, exact test data used)
- Severity based on your knowledge of business context
- Labels/tags appropriate to your team's tracking system
- Assignee (who owns this component?)

The mobile modal finding — filing the bug:

Bug title: [iOS Safari] Address suggestion modal partially obscured by keyboard on iPhone 14
Environment: iOS 17.4, Safari, iPhone 14 (also reproduced on iPhone SE)
Steps to reproduce:
1. Add item to cart and proceed to checkout
2. On the address entry screen, tap the ZIP code field
3. Type a valid ZIP code (e.g., 12345)
4. Tap the City field (or press Next on keyboard)
5. Wait for the address suggestion modal to appear
Expected: Suggestion modal is fully visible above the keyboard
Actual: Modal appears behind/below the active keyboard; "Accept" button is partially hidden
Severity: High (usability — users cannot easily accept the address suggestion on iOS)
Priority: High (affects all iOS users going through checkout with address normalization)
Related: BUG-1102 (previous version of this issue — confirm if this is a regression)

Step 5: Update the Coverage Map

From the synthesis's coverage summary, update the coverage map:

Update the following coverage map based on today's session.

Coverage map: [paste map]
Session synthesis coverage summary: [paste coverage section from synthesis]

Mark covered areas with today's date and charter reference CHARTER-007.
Mark defect-found areas with the bug IDs.
Leave "Not Covered" areas unchanged.
Output the updated map.
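
If your coverage map lives somewhere scriptable rather than in a shared doc, the same update is a small state change. A sketch with hypothetical area keys and bug id (CHARTER-007 is today's charter reference):

# coverage_map.py: the map-update step applied to a dict-based map
# (area keys and the bug id are hypothetical).
from datetime import date

coverage_map = {
    "choice persistence":  {"status": "Not Covered", "bugs": []},
    "API resilience":      {"status": "Not Covered", "bugs": []},
    "mobile modal layout": {"status": "Not Covered", "bugs": []},
    "analytics logging":   {"status": "Not Covered", "bugs": []},
}

def mark_covered(area: str, charter_id: str, bug_ids=()) -> None:
    entry = coverage_map[area]
    entry["status"] = f"Covered {date.today().isoformat()} ({charter_id})"
    entry["bugs"] = list(bug_ids)

mark_covered("choice persistence", "CHARTER-007", ["BUG-NEW-01"])  # hypothetical id
print(coverage_map["choice persistence"])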

Learning Tip: Schedule your debrief time as a non-negotiable block immediately after the session — 20 to 30 minutes while observations are still vivid. The AI synthesis prompt takes 5 minutes to run. The remaining time is review, edits, and filing. If you defer the debrief to "later today" or "tomorrow," you'll lose 30% of the nuance in your notes. The debrief is not optional overhead — it is the mechanism that converts session experience into team-usable artifacts.


What Actionable Outputs Should You Produce from an AI-Guided Exploratory Session?

A completed AI-guided exploratory session should leave behind a specific set of artifacts. These artifacts make the session's value visible, persistent, and actionable for the broader team.

The Required Artifact Set

Every exploratory session should produce, at minimum:

Artifact                  Format                         Audience            Purpose
Session debrief document  Markdown / Confluence          Dev team, QA lead   Communicate what was found and why it matters
Bug reports               Jira / Linear / GitHub Issues  Developers          Enable fixes
Coverage map update       Shared tracking doc            QA team             Track what's been explored
Follow-up charters        Charter document               QA team             Plan next sessions

Optional but high-value:

Artifact                  Format                  Audience             Purpose
Regression test cases     Test management system  QA automation team   Automate the important findings
Risk memo                 Short email or Slack    Product manager      Communicate risk for release decisions
Developer question list   Jira comment or Slack   Specific developers  Resolve open questions quickly

Producing the Risk Memo

When a session produces Critical or High findings, a risk memo communicates the impact to non-technical stakeholders:

Generate a risk memo for the following exploratory session findings.

Target audience: Product manager (non-technical)
Format: 3–5 sentences, plain language, focused on user impact and release risk

Key findings to communicate:
- Multi-step back navigation loses accepted address standardization (HIGH)
- iOS Safari suggestion modal hidden behind keyboard (HIGH)
- Race condition at API timeout boundary produces confusing UX (LOW)
- Regression of BUG-1102 suspected on iOS (needs developer confirmation)

Include: What was found, what the user impact is, and what action is needed before release.

Example output:

During this week's exploratory testing of the new address verification feature, we found
two issues that should be resolved before this feature ships to production. First, on web,
users who navigate back through checkout history and return to the address step may find
their address correction choice was lost, potentially causing orders to be placed with
non-standardized addresses. Second, on iOS, the address suggestion pop-up is partially
hidden behind the phone keyboard, making it hard for users to accept the correction.

Both issues have been logged as High severity and are assigned to the development team.
We recommend validating both fixes before the next release candidate build. A re-exploration
session is planned for next week to confirm both fixes and check for related edge cases.

Producing Follow-Up Charters

The follow-up charters from today's session become tomorrow's session plan:

Based on the gaps and open questions from today's session, generate 2 follow-up charters.

Gaps identified:
1. Analytics logging was not fully tested (all scenarios beyond the happy path)
2. "Accept suggestion → complete full order" was started but not fully explored

Open questions:
1. Does the state management bug affect the order confirmation address (not just checkout UI)?
2. Is the iOS modal issue a regression of BUG-1102 specifically?

Generate:
Charter 1: Focus on state management bug verification and order confirmation flow
Charter 2: Focus on iOS modal regression confirmation and other mobile edge cases

Each charter: mission statement, scope, risk hypothesis, 3 exploration angles, stop conditions

The Automation Backlog

From findings and follow-up charters, identify what should be automated:

Based on the following session findings, recommend test cases for the automation backlog.

Findings: [paste synthesis]

For each finding, recommend:
1. Should this become an automated test? (Yes / No / Maybe)
2. If Yes: What type? (Unit | API | E2E | Visual regression)
3. If Yes: What is the test priority? (Critical | High | Medium)
4. Brief description of what the automated test should verify

Also: Are there any findings that should NOT be automated and why?

Example output for the iOS modal issue:

Finding: iOS Safari modal partially hidden behind keyboard
Automation candidate: Yes — Visual regression test
Type: Visual regression (Applitools or Percy)
Priority: High
Test description: Capture screenshot of the address suggestion modal on iOS Safari with
keyboard active; compare against baseline on each deployment. Detects layout regressions
caused by keyboard height changes across iOS versions.
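
Until the Applitools or Percy check exists, a rough stand-in can be scripted with Playwright plus Pillow. A sketch under assumptions (hypothetical URL and modal selector; a real visual regression tool adds baseline management and diff tolerance):

# modal_snapshot_diff.py: rough stand-in for the visual check.
from pathlib import Path

from PIL import Image, ImageChops            # pip install pillow
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    iphone = p.devices["iPhone 14"]
    browser = p.webkit.launch()
    context = browser.new_context(**iphone)
    page = context.new_page()
    page.goto("https://staging.example.com/checkout/address")   # hypothetical URL
    page.fill("#zip", "12345")
    page.press("#zip", "Tab")
    page.locator(".suggestion-modal").wait_for()                # hypothetical selector
    page.screenshot(path="current.png")
    browser.close()

baseline = Path("baseline.png")
if baseline.exists():
    diff = ImageChops.difference(Image.open(baseline), Image.open("current.png"))
    print("layout changed" if diff.getbbox() else "matches baseline")
else:
    Path("current.png").rename(baseline)     # first run seeds the baseline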

Communicating Session Value Over Time

After five sessions, the accumulated artifacts tell a story:
- 12 defects found, 8 already fixed
- 3 systemic risks identified and tracked
- Coverage map shows 70% of feature area explored
- 4 regression test cases added to the automated suite

This story is the ROI case for exploratory testing. AI-guided sessions don't just find bugs — they produce the documentation that makes the discipline defensible and continuously improvable.

Generate a quarterly exploratory testing summary for the following session archive.

Sessions: [paste archive list with dates and finding counts]
Defect findings: [paste list]
Coverage map: [paste current state]

Produce:
1. Quantitative summary (sessions run, defects found by severity, coverage percentage)
2. Top 3 systemic risks identified this quarter
3. Impact: How many defects found vs. how many were fixed before production?
4. Coverage trend: Which areas improved most? Which remain gaps?
5. Recommended focus for next quarter

Learning Tip: At the end of each quarter, invest 2 hours in a retrospective of your exploratory testing practice — not just your findings. Ask: Which session setups produced the best findings-per-hour? Which AI interaction patterns were most valuable? Which charter types consistently missed important bugs? This meta-analysis, combined with your AI tools, will make you measurably more effective at exploratory testing each quarter — compounding the value of the practice over time.