How to Generate WCAG 2.1/2.2 Compliance Test Cases with AI?
Accessibility testing has historically been treated as a specialist discipline separate from mainstream QA — the province of dedicated accessibility consultants or automated scanners run as an afterthought. This separation is expensive and ineffective. WCAG 2.1 has 78 success criteria across three conformance levels. WCAG 2.2 adds 9 more (and removes one, 4.1.1 Parsing). No automated scanner covers more than 30–40% of these criteria. The remaining 60–70% require human judgment and structured manual testing — which is exactly where AI-assisted test case generation delivers the most value.
Understanding the WCAG testing coverage gap
Before you can effectively use AI to generate WCAG test cases, you need to understand what automated tools cover and what they miss:
| Testing method | WCAG criteria coverage | What it catches |
|---|---|---|
| axe-core / Deque | ~30–35% | Missing alt text, missing form labels, color contrast failures, ARIA misuse |
| Lighthouse accessibility | ~25–30% | Similar to axe, different scoring |
| Manual testing with screen reader | ~60–75% | All automated + focus order, screen reader announcements, interaction patterns |
| Manual testing with keyboard only | ~50–60% | All keyboard navigation criteria |
| Combined automated + manual | ~85–90% | Near-complete WCAG coverage |
The gap between automated and manual coverage is where AI-generated test cases add the most value — they give manual testers a structured checklist rather than relying on tester experience and memory.
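Before running the manual prompts below, make sure the automated layer is wired into CI so the generated manual cases don't duplicate what a scanner already catches. A minimal sketch using @axe-core/playwright; the page URL and tag filters are placeholders for your own application:

```typescript
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

// Layer-1 baseline: the ~30-35% of WCAG criteria a scanner can verify.
// The URL and tag filters are placeholders - adjust to your application.
test('automated WCAG 2.1 AA scan of the checkout page', async ({ page }) => {
  await page.goto('/checkout');
  const results = await new AxeBuilder({ page })
    .withTags(['wcag2a', 'wcag2aa', 'wcag21a', 'wcag21aa'])
    .analyze();
  // Log violations so manual testers know what is already covered
  for (const v of results.violations) {
    console.log(`${v.id} (${v.impact}): ${v.help}`);
  }
  expect(results.violations).toHaveLength(0);
});
```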
Generating WCAG 2.1 test cases for a specific component
```
I need to generate manual accessibility test cases for WCAG 2.1 Level AA
compliance for a multi-step checkout form in a web application.
## Component Description
The checkout form has three steps:
1. Cart review: List of items, quantities (editable), remove buttons, subtotal
2. Shipping: Address fields, shipping method radio buttons, "Save address" checkbox
3. Payment: Credit card number, expiry, CVV, billing address (checkbox to
copy shipping), submit button
After submission:
- Loading state: spinner replaces submit button, form fields disabled
- Success: Redirect to order confirmation page
- Error: Error message appears above the form, focus moves to the error
## Technology
React SPA, custom CSS components, no UI library (no built-in ARIA support)
Generate manual test cases for the following WCAG 2.1 AA criteria:
- 1.1.1 Non-text Content (alt text on any images/icons)
- 1.3.1 Info and Relationships (semantic structure)
- 1.3.5 Identify Input Purpose (autocomplete attributes)
- 1.4.3 Contrast Minimum (text contrast)
- 1.4.11 Non-text Contrast (input border, focus indicator contrast)
- 2.1.1 Keyboard accessibility (all functionality via keyboard)
- 2.4.3 Focus Order (logical tab sequence)
- 2.4.7 Focus Visible (visible focus indicator)
- 3.3.1 Error Identification (error messages)
- 3.3.2 Labels or Instructions (form labels)
- 4.1.2 Name, Role, Value (ARIA on custom components)
- 4.1.3 Status Messages (WCAG 2.1 — screen reader announcements)
For each test case:
1. Test case ID (e.g., WCAG-1.1.1-001)
2. Success criterion tested
3. Specific step in the checkout where this applies
4. Test steps (specific actions to take)
5. Pass condition (what correct implementation looks like)
6. Fail condition (what a violation looks like)
7. Testing method (screen reader, keyboard, browser inspector,
contrast analyzer tool)
8. Notes on common implementation mistakes for this criterion
```
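Most of these criteria need a human at the keyboard, but some of the generated steps can be spot-checked in code. A minimal sketch of a 2.4.3 focus-order assertion, assuming hypothetical data-testid attributes on the cart step:

```typescript
import { test, expect } from '@playwright/test';

// 2.4.3 spot check: tab through the cart step and assert focus lands on
// controls in logical order. The data-testid values are hypothetical,
// and the test assumes focus starts at the top of the document with no
// skip link ahead of these controls.
test('cart step tab order follows the logical sequence', async ({ page }) => {
  await page.goto('/checkout/cart');
  const expectedOrder = ['quantity-input', 'remove-item-btn', 'continue-btn'];
  for (const testId of expectedOrder) {
    await page.keyboard.press('Tab');
    await expect(page.locator(':focus')).toHaveAttribute('data-testid', testId);
  }
});
```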
Generating test cases for the WCAG 2.2 additions
WCAG 2.2 introduced new criteria that specifically address mobile and cognitive accessibility. AI can generate tests for these newer criteria:
```
Generate test cases for the WCAG 2.2 criteria that are NEW
(not in WCAG 2.1), through Level AA plus two AAA criteria for reference:
- 2.4.11 Focus Not Obscured (Minimum), Level AA
- 2.4.12 Focus Not Obscured (Enhanced), Level AAA (beyond AA; included
  for reference)
- 2.5.7 Dragging Movements, Level AA
- 2.5.8 Target Size (Minimum), Level AA
- 3.2.6 Consistent Help, Level A
- 3.3.7 Redundant Entry, Level A
- 3.3.8 Accessible Authentication (Minimum), Level AA
- 3.3.9 Accessible Authentication (Enhanced), Level AAA (beyond AA;
  included for reference)
For each criterion, apply it to our checkout form component described
above and produce:
1. A plain-language explanation of what the criterion requires
in the context of our specific component
2. Test steps to verify compliance
3. Common failure patterns for React SPAs specifically
4. The code pattern or CSS fix that resolves the most common failure
Pay particular attention to 3.3.8 Accessible Authentication —
explain what our payment form must do to comply, and what
common implementations violate it.
```
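Part of 2.5.8 Target Size is mechanical and can be pre-screened before manual review. A sketch that flags undersized targets; the selector list is an assumption, and flagged elements still need human review for the criterion's spacing and inline exceptions:

```typescript
import { test, expect } from '@playwright/test';

// 2.5.8 Target Size (Minimum): interactive targets should be at least
// 24x24 CSS pixels, with exceptions for inline links, spacing, etc.
// This pre-screen flags candidates; a human applies the exceptions.
test('checkout controls meet the 24x24 minimum target size', async ({ page }) => {
  await page.goto('/checkout/cart');
  const targets = page.locator('button, a, input, select, [role="button"]');
  const count = await targets.count();
  const undersized: string[] = [];
  for (let i = 0; i < count; i++) {
    const box = await targets.nth(i).boundingBox();
    if (box && (box.width < 24 || box.height < 24)) {
      undersized.push(await targets.nth(i).evaluate((el) => el.outerHTML.slice(0, 80)));
    }
  }
  expect(undersized, `Undersized targets:\n${undersized.join('\n')}`).toHaveLength(0);
});
```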
Building a WCAG test matrix for a full application
For a broader application audit, AI can generate a prioritized test matrix:
```
I need to prioritize WCAG 2.1/2.2 AA testing for our application given
a 3-day testing window with 2 QA engineers.
## Application Overview
- 8 main page types: Homepage, Product listing, Product detail,
Cart, Checkout (3 steps), Order confirmation, Account, Support chat
- User base: B2C, expected 5–8% of users have accessibility needs
- Legal exposure: US-based company, ADA Title III applies;
EU expansion planned (EN 301 549)
- Previous audit: Last audit was 18 months ago, failed on 2.4.3,
2.1.1, and 3.3.1
Produce a prioritized testing plan that:
1. Ranks the 8 page types by accessibility risk
(legal exposure × traffic volume × interaction complexity)
2. For the top 4 page types, identify the 10 most critical
WCAG criteria to test first
3. Recommend which criteria to cover with automated tools (axe, Lighthouse)
vs. which require manual testing
4. Estimate testing time per page type per criterion
5. Identify any criteria from the previous audit failures that
require regression testing regardless of priority
6. Produce a one-page test execution matrix with pass/fail/not-tested
columns for each page × criterion combination
```
Learning Tip: WCAG criterion 4.1.3 (Status Messages, introduced in 2.1) is one of the most commonly missed by both automated tools and manual testers who don't use screen readers regularly. It requires that loading states, success messages, and error notifications be announced to screen reader users without moving keyboard focus. Ask AI specifically: "For each dynamic state change in this component (loading, success, error, validation), what ARIA live region implementation is required for 4.1.3 compliance?" Most developers implement visible status messages correctly but forget the screen reader announcement.
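For reference, the implementation pattern the tip describes looks roughly like this in a React SPA; a minimal sketch with illustrative names, not a drop-in component:

```tsx
import { useState } from 'react';

// 4.1.3 sketch: render the live region once, keep it in the DOM, and
// inject text into it on state changes. Injecting the text triggers the
// screen reader announcement without moving keyboard focus.
// Component, hook, and class names are illustrative.
export function StatusRegion({ message }: { message: string }) {
  return (
    <div aria-live="polite" aria-atomic="true" className="visually-hidden">
      {message}
    </div>
  );
}

export function useCartStatus() {
  const [status, setStatus] = useState('');
  // Call this alongside the visual update, not instead of it
  const announceRemoval = (itemName: string) =>
    setStatus(`${itemName} removed from cart`);
  return { status, announceRemoval };
}
```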
How to Use AI to Generate GDPR and Data Privacy Test Scenarios?
GDPR testing is one of the most underspecified areas in QA. Most teams treat compliance as a legal and architecture concern, not a test execution concern. But GDPR compliance has observable, testable behaviors: data subject rights must be implemented, consent must be recorded and respected, data retention limits must be enforced. QA engineers who can generate and execute these test scenarios provide measurable compliance assurance — not just theoretical declarations.
The testable behaviors in GDPR
Not all of GDPR is directly testable by QA. But these articles have observable, verifiable behaviors in a web application:
| GDPR Article | Testable behavior | QA test type |
|---|---|---|
| Art. 17 — Right to Erasure | User can delete account; data is actually removed | API + DB verification |
| Art. 20 — Right to Portability | User can export their data in machine-readable format | API + data format validation |
| Art. 15 — Right of Access | User can view all data held about them | API response completeness check |
| Art. 7 — Consent | Consent is recorded; withdrawal stops processing | UI + API + audit log check |
| Art. 5(1)(c) / Art. 25 — Data Minimisation | API responses don't return unnecessary PII | Response field audit |
| Art. 32 — Security | Data in transit encrypted; sensitive fields masked | Network + API response check |
| Art. 13/14 — Transparency | Privacy notice is accessible and accurate | Content + link verification |
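Several of these rows reduce to checks against staging infrastructure. The consent row, for example, becomes assertable with a mail-catcher in staging; a sketch assuming a MailHog-style capture API, with all application endpoints and payloads as assumptions:

```typescript
import { test, expect } from '@playwright/test';

// Art. 7 sketch: withdraw consent, force the campaign run, and assert
// no marketing mail reached the user via the staging mail-catcher.
// The application endpoints, the trigger route, and the test address
// are assumptions for this sketch.
test('marketing emails stop after consent withdrawal', async ({ request }) => {
  await request.post('/api/account/consents', {
    headers: { Authorization: `Bearer ${process.env.TEST_USER_TOKEN}` },
    data: { marketingEmails: false },
  });

  // Staging-only trigger for the next campaign batch
  await request.post('/api/staging/run-marketing-campaign');

  // MailHog-style search API: messages addressed to the test user
  const res = await request.get(`${process.env.MAILCATCHER_URL}/api/v2/search`, {
    params: { kind: 'to', query: 'gdpr-test-user@example.com' },
  });
  const mail = await res.json();
  expect(mail.total).toBe(0); // no marketing mail after withdrawal
});
```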
Generating GDPR test scenarios with AI
```
Generate GDPR compliance test scenarios for our user account management
system. This is an authorized internal compliance test for our own application.
## System Context
- User data stored: email, name, date of birth, phone, shipping addresses,
order history, browsing history, payment method metadata (last 4 only,
no full card numbers)
- Consent types recorded: Marketing emails, analytics tracking,
third-party data sharing
- Data subject rights portal: Available in account settings
- Data retention: Active accounts indefinitely; deleted accounts —
personal data purged after 30 days, order records retained 7 years
(legal/tax requirement)
## Scenarios Needed
Generate test scenarios for these GDPR rights:
1. Right to Erasure (Art. 17):
- User requests account deletion
- Verify PII is purged within 30 days
- Verify order records are retained (legal basis)
- Verify all sessions are invalidated immediately
- Verify user cannot log back in after deletion request
2. Right to Data Portability (Art. 20):
- User requests data export
- Verify export contains all categories of personal data
- Verify format is machine-readable (JSON or CSV)
- Verify no additional data about other users is included
- Verify export is delivered securely (not emailed in plaintext)
3. Consent Withdrawal (Art. 7):
- User withdraws marketing email consent
- Verify future marketing emails are not sent
- Verify historical consent record is preserved (audit trail)
- Verify consent withdrawal doesn't affect non-marketing emails
(transactional emails still sent)
4. Data Minimisation (Art. 5(1)(c) / Art. 25):
- Verify API responses for product listings don't include PII
- Verify recommendations API doesn't expose other users' behavior
- Verify admin API responses mask sensitive fields for
lower-privilege admin roles
For each scenario:
1. Preconditions (test data setup required)
2. Test steps (specific actions and API calls)
3. Verification method (API response check, DB query, email
system check, audit log check)
4. Pass criteria
5. Time-sensitive criteria (e.g., "within 30 days" — how to test
this without waiting 30 days in staging)
```
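The verification steps these scenarios call for translate naturally into API-level tests. A sketch of the Art. 17 checks, assuming a hypothetical staging-only compliance endpoint of the kind described in the Learning Tip at the end of this section; every path and field name here is an assumption:

```typescript
import { test, expect } from '@playwright/test';

// Art. 17 sketch: trigger deletion for a seeded user, then verify the
// outcome through a staging-only compliance endpoint rather than raw
// DB access. All endpoints and response fields are assumptions.
test('account deletion purges PII but retains order records', async ({ request }) => {
  const del = await request.post('/api/account/delete', {
    headers: { Authorization: `Bearer ${process.env.TEST_USER_TOKEN}` },
  });
  expect(del.ok()).toBeTruthy();

  const status = await request.get('/api/staging/compliance/deletion-status', {
    params: { userId: 'gdpr-test-user-42' },
  });
  const body = await status.json();
  // PII purge runs on a 30-day schedule; assert it is queued correctly,
  // and separately that sessions died immediately.
  expect(body.purgeScheduledWithinDays).toBeLessThanOrEqual(30);
  expect(body.orderRecordsRetained).toBe(true); // 7-year legal/tax basis
  expect(body.activeSessions).toBe(0);          // invalidated immediately
});
```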
Handling time-sensitive GDPR test cases
GDPR timelines (erasure and export responses within one month under Art. 12(3), breach notification within 72 hours under Art. 33) can't be tested by waiting. AI can generate workaround test strategies:
```
Several of our GDPR test scenarios involve time-based verification
that we can't test in real-time. For each scenario, suggest a
testable workaround:
1. Account deletion: PII must be purged within 30 days
- How to test this without waiting 30 days in staging?
2. Data export delivery: Must be delivered within 1 month of request
- How to test the delivery mechanism without waiting?
3. Consent effectiveness: After marketing consent withdrawal,
no marketing emails within the next campaign cycle
- How to verify this without waiting for the next marketing batch?
4. Session invalidation after deletion request: All existing sessions
must be invalidated immediately
- How to test "immediately" in an automated test?
For each: suggest (a) a mechanism to manipulate time in staging,
(b) a unit-level test that verifies the behavior without end-to-end timing,
or (c) a configuration test that confirms the scheduled jobs are
configured correctly.
```
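Option (b) is usually the cheapest of the three: isolate the date arithmetic behind an injected clock and test the boundary directly. A sketch with hypothetical function and field names:

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical shape of a deletion request record
interface DeletionRequest {
  requestedAt: Date;
  piiPurged: boolean;
}

// The purge job's eligibility rule, isolated so the clock can be injected
function isPurgeDue(req: DeletionRequest, now: Date): boolean {
  const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;
  return !req.piiPurged && now.getTime() - req.requestedAt.getTime() >= THIRTY_DAYS_MS;
}

test('purge becomes due exactly 30 days after the deletion request', () => {
  const req = { requestedAt: new Date('2024-01-01T00:00:00Z'), piiPurged: false };
  expect(isPurgeDue(req, new Date('2024-01-30T23:59:59Z'))).toBe(false); // day 29
  expect(isPurgeDue(req, new Date('2024-01-31T00:00:00Z'))).toBe(true);  // day 30
});
```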
Generating CCPA and cross-regulation test cases
Where one application serves multiple jurisdictions, AI can collapse the overlap into a single test set:
```
Our application must comply with both GDPR (EU users) and CCPA
(California users). Generate a comparison test matrix that:
1. Identifies where GDPR and CCPA requirements overlap
(test once, covers both)
2. Identifies CCPA-specific requirements that need additional tests
beyond GDPR:
- "Do Not Sell" opt-out
- CCPA-specific disclosure requirements
- Different category definitions for personal information
3. Identifies GDPR-specific requirements that have no CCPA equivalent
4. Recommends the minimum test set that achieves dual compliance
verification coverage
Our user base: 70% EU (GDPR), 15% California (CCPA), 15% other US states
```
Learning Tip: The most common failure in GDPR test execution is verifying only the front-end behavior without confirming the back-end action. A "delete account" button that shows a success message but doesn't actually purge data from the database is a GDPR violation — and the only way to catch it is to verify the database state after the operation. Always include database-level verification steps in your GDPR test cases, and make sure your QA team has the access and tooling to execute them in staging. If direct DB access isn't available, work with the development team to expose a compliance verification endpoint in staging that confirms deletion status.
How to Generate and Track Regulatory Compliance Checklists with AI?
Accessibility and privacy are just two compliance domains. Depending on your application type, you may also need to test for PCI-DSS (payment card), SOC 2 Type II (security controls), HIPAA (healthcare), ISO 27001 (information security), or FedRAMP (US government cloud). AI can generate checklists for all of these — and, critically, help you distinguish between compliance requirements that QA testing can verify and those that require infrastructure audits, policy reviews, or certification processes.
Generating a QA-testable compliance subset from a standard
Not all compliance requirements are test-executable. AI helps you filter:
```
I need to create a QA testing compliance checklist for our SaaS
application from PCI-DSS v4.0 requirements.
Important constraint: Only include requirements that a QA engineer
can verify through application testing (API testing, UI testing,
network testing). Exclude infrastructure requirements that require
system administrator access, policy requirements that require
document review, and certification requirements that require
third-party assessors.
## Our Application's Payment Context
- We use Stripe as payment processor (SAQ A eligible — we never
touch card data directly)
- We display the last 4 digits of saved cards
- We store billing addresses
- We have an admin panel where support staff can view masked
payment method info
Generate a QA-testable checklist for:
1. Requirements 6.x (Secure software development — testing for
vulnerabilities)
2. Requirements 7.x (Access control — testing authorization)
3. Requirements 8.x (Authentication — testing login security)
4. Requirements 10.x (Logging — testing audit trail)
For each requirement:
1. The PCI-DSS requirement ID and plain-language description
2. The specific QA test that verifies compliance
3. The expected pass evidence (what you observe when compliant)
4. Severity if failed (critical for PCI scope, major, minor)
5. Whether this is testable in isolation or requires a specific
environment state
```
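One application-layer check worth automating regardless of which requirements make the checklist: no full card number should ever appear in an API response. A sketch with a hypothetical endpoint and a heuristic PAN pattern (expect false positives on long numeric IDs; treat matches as leads, not verdicts):

```typescript
import { test, expect } from '@playwright/test';

// Heuristic: 13-19 digit runs with optional separators look like PANs.
// This over-matches long numeric IDs, so review hits manually.
const PAN_PATTERN = /\b(?:\d[ -]?){13,19}\b/;

test('saved payment methods expose only the last 4 digits', async ({ request }) => {
  // Endpoint path is an assumption for this sketch
  const res = await request.get('/api/account/payment-methods', {
    headers: { Authorization: `Bearer ${process.env.TEST_USER_TOKEN}` },
  });
  expect(res.ok()).toBeTruthy();
  const raw = await res.text();
  expect(PAN_PATTERN.test(raw)).toBe(false); // no PAN-shaped data anywhere
  const methods = await res.json();
  for (const m of methods) {
    expect(m.last4).toMatch(/^\d{4}$/); // masked display only
  }
});
```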
Generating SOC 2 Type II control verification tests
SOC 2 controls map to observable application behavior more often than teams expect, which makes many of them testable in staging:
```
Our application is pursuing SOC 2 Type II certification.
The auditor will verify that controls operate effectively over
a 12-month period. As QA, I need to generate tests that verify
the following SOC 2 Common Criteria controls are implemented:
CC6.1 — Logical and Physical Access Controls (authentication, authorization)
CC6.3 — Role-based access removal for terminated personnel
CC6.7 — Transmission and encryption controls
CC7.2 — System monitoring and anomaly detection
CC8.1 — Change management process controls
CC9.2 — Risk assessment and vendor management
For each control, generate:
1. The specific application behavior that demonstrates the control
2. A test to verify that behavior in staging
3. Evidence the auditor would need to see (screenshot, API response,
log entry, or configuration export)
4. How frequently this should be re-verified (one-time, per release,
per quarter)
CC6.3 specific: Our application supports SSO via Okta.
When a user is deprovisioned in Okta, their application access
should be revoked immediately. Generate a specific test for this
with exact steps.
```
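A test like the one that prompt requests might come back looking like this; a sketch that uses Okta's documented user lifecycle API but assumes the application endpoint, user IDs, and environment variable names:

```typescript
import { test, expect } from '@playwright/test';

// CC6.3 sketch: deactivate the user in Okta, then require the app to
// reject their existing token. App endpoint and env vars are assumptions;
// the Okta lifecycle route is the documented deactivation API.
test('Okta deprovisioning revokes application access', async ({ request }) => {
  const userToken = process.env.DEPROVISION_TEST_USER_TOKEN!;

  // Confirm the seeded user currently has access
  const before = await request.get('/api/account/profile', {
    headers: { Authorization: `Bearer ${userToken}` },
  });
  expect(before.status()).toBe(200);

  // Deactivate the user in Okta
  const okta = await request.post(
    `${process.env.OKTA_ORG_URL}/api/v1/users/${process.env.TEST_USER_ID}/lifecycle/deactivate`,
    { headers: { Authorization: `SSWS ${process.env.OKTA_API_TOKEN}` } },
  );
  expect(okta.ok()).toBeTruthy();

  // "Immediately": poll briefly, then require an auth failure (401/403)
  await expect
    .poll(async () => {
      const after = await request.get('/api/account/profile', {
        headers: { Authorization: `Bearer ${userToken}` },
      });
      return after.status();
    }, { timeout: 30_000 })
    .toBeGreaterThanOrEqual(401);
});
```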
Designing a compliance test tracking structure
AI can also help you design the tracking structure, not just the test cases:
```
Design a compliance test tracking structure for a QA team that
manages multiple regulatory standards simultaneously (WCAG, GDPR,
PCI-DSS, SOC 2).
The tracking structure should:
1. Link each test case to one or more regulatory requirements
(a single test can cover GDPR Art. 17 AND SOC 2 CC6.1)
2. Track test execution date, tester, environment, and result
3. Flag when a test hasn't been re-run after a significant code change
4. Generate a compliance coverage report showing % of
requirements tested, % passed, and last verified date
5. Support generating a "compliance evidence package" for auditors:
a document listing requirement → test → evidence → result
Output:
1. A proposed data structure (JSON or CSV schema) for tracking this
2. A template for the compliance evidence package document
3. Recommended re-testing frequency for each standard
4. A prioritization rule for when tests conflict with sprint capacity
```
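The kind of structure that prompt tends to produce looks like the following; a sketch with illustrative field names, not a prescribed schema:

```typescript
// One record per test execution, linking a test to the requirements it
// covers across standards. Field names are illustrative.
interface ComplianceTestRecord {
  testId: string;                  // e.g. "WCAG-4.1.3-001"
  requirements: string[];          // e.g. ["GDPR Art. 17", "SOC 2 CC6.1"]
  lastExecuted: string;            // ISO 8601 date
  tester: string;
  environment: 'staging' | 'production-readonly';
  result: 'pass' | 'fail' | 'not-tested';
  evidenceUri?: string;            // screenshot, log export, API capture
  staleAfterCodeChange: boolean;   // set by CI when covered code changes
}

// A requirement counts as covered if at least one fresh test passes
function isCovered(records: ComplianceTestRecord[], requirement: string): boolean {
  return records.some(
    (r) =>
      r.requirements.includes(requirement) &&
      r.result === 'pass' &&
      !r.staleAfterCodeChange,
  );
}
```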
Learning Tip: Compliance checklists generated by AI need a domain-specific review before you run them. AI knows GDPR, PCI-DSS, and WCAG in their published forms — but it does not know your organization's specific implementation decisions, scope limitations (like being SAQ A eligible for PCI), or which controls you've implemented at the infrastructure level vs. application level. After generating a compliance checklist with AI, always have it reviewed by your legal/compliance team or a compliance SME before treating it as authoritative. Use AI to generate the 80% — bring a human expert for the remaining 20% that requires interpretation.
How to Combine AI Analysis with Manual Verification in Accessibility Audits?
Automated accessibility scanning and AI-generated test cases are not substitutes for manual testing with assistive technologies. Screen readers, keyboard-only navigation, and zoom testing reveal failure modes that no automated tool or AI can simulate — because they require actual user experience to detect. The professional approach combines all three layers: automated tools for quick coverage, AI-generated test cases for structured manual coverage, and direct assistive technology testing for real-world verification.
The three-layer accessibility audit model
Layer 1: Automated scanning (axe-core, Lighthouse, Pa11y)
- Coverage: ~30-35% of WCAG criteria
- Time: Minutes per page
- Output: Definitive violations with code references
- Role: Catch unambiguous issues before manual testing
Layer 2: AI-generated structured manual testing
- Coverage: Additional 40-50% of WCAG criteria
- Time: 2-4 hours per page type
- Output: Structured pass/fail per criterion
- Role: Systematic coverage of criteria that require human judgment
Layer 3: Assistive technology testing (NVDA, JAWS, VoiceOver, TalkBack)
- Coverage: 15-25% of criteria that require actual user experience
- Time: 4-8 hours per user journey
- Output: Qualitative findings + specific failure descriptions
- Role: Verify real-world usability, not just technical compliance
Using AI to plan the assistive technology testing phase
```
I have completed automated accessibility scanning (axe-core) and
AI-generated structured manual testing for our checkout flow.
Here are the remaining gaps that require assistive technology testing:
Automated scan passed: 1.1.1, 1.4.3, 4.1.2 (basic ARIA)
Manual structured testing passed: 2.1.1, 2.4.3, 2.4.7, 3.3.1, 3.3.2
Still needs AT verification: 4.1.3, 2.4.11, 3.3.8, 1.3.1 (screen reader
reading order), 2.1.2 (no keyboard trap in payment iframe)
Plan a 4-hour assistive technology testing session using:
- NVDA + Chrome on Windows 10 (representing most screen reader users)
- VoiceOver + Safari on iOS 16 (mobile screen reader)
- Keyboard-only navigation on Chrome (no mouse)
For each remaining criterion:
1. Which AT combination to use for testing
2. Exact test steps using that AT (e.g., "Tab to the card number
field — NVDA should announce: 'Card number, required, edit text'")
3. What announcement or behavior to listen for
4. Common failure pattern to watch for
5. How to document the finding if it fails
(what screen recording + description captures the evidence)
Also generate a quick-reference card for the tester:
- NVDA keyboard shortcuts needed for this test
- VoiceOver gestures needed for this test
- How to navigate the checkout form steps using keyboard only
```
Integrating AI analysis of AT test findings
After AT testing, use AI to help prioritize and classify findings:
```
Here are the findings from our 4-hour assistive technology testing
session on the checkout flow. Classify and prioritize each finding.
Finding 1:
Screen reader (NVDA): When the "Remove item" button is activated in
the cart, NVDA announces nothing — the item disappears visually but
the screen reader user has no confirmation the action completed.
Criterion: 4.1.3 Status Messages
Observed: No ARIA live region announcement after cart item removal
Finding 2:
Screen reader (NVDA): The credit card number input field is announced
as "Edit text" with no label — the visible label "Card number *" is
not programmatically associated with the input.
Criterion: 1.3.1 Info and Relationships / 4.1.2 Name, Role, Value
Observed: Input has placeholder "1234 5678 9012 3456" but no aria-label
or associated <label> element
Finding 3:
Keyboard only: Tab order on the shipping form jumps from "First Name"
to "Country" (skipping Last Name, Address Line 1, Address Line 2, City,
State, Zip) before returning to those fields.
Criterion: 2.4.3 Focus Order
Observed: Logical reading order is broken by CSS grid layout
Finding 4:
Mobile (VoiceOver + Safari): The 3-step progress indicator at the top
of the checkout is not announced to VoiceOver at all — sighted users
see "Step 2 of 3: Shipping" but VoiceOver users have no equivalent.
Criterion: 1.3.1 Info and Relationships
Observed: Progress steps implemented as CSS-styled divs with no ARIA
For each finding:
1. WCAG 2.1/2.2 criterion(s) violated
2. Severity: Blocker/Critical/Major/Minor (with justification)
3. User impact description (who is affected, what task fails)
4. Specific code fix recommendation
5. Regression test to add to prevent reintroduction
6. Whether this finding was likely caught by automated tools
(and if so, why it wasn't flagged)
```
Generating accessibility regression tests from AT findings
After findings are fixed, AI helps create regression tests that can be run without AT:
```
Finding 2 from our AT test was fixed: The payment card number
now has a proper <label> element with for/id association.
Finding 1 was fixed: Cart item removal now announces via an
aria-live="polite" region.
Generate:
1. A Playwright accessibility regression test that verifies
Finding 2 is fixed (using axe-core integration or
explicit ARIA assertion)
2. A Playwright test that verifies Finding 1: after clicking
"Remove item", an ARIA live region with specific text
is present in the DOM
3. An axe-core custom rule configuration that would catch
Finding 2 (unlabeled card input) in future CI runs
These tests should run in CI and fail the build if the
accessibility regressions are reintroduced.
```
Example AI-generated Playwright accessibility regression test:
```typescript
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test.describe('Checkout form accessibility regressions', () => {
  test('card number input has accessible label (Finding 2 regression)', async ({ page }) => {
    await page.goto('/checkout/payment');

    // Verify a label element exists and is associated with the input
    const cardInput = page.locator('[data-testid="card-number-input"]');
    const inputId = await cardInput.getAttribute('id');
    const label = page.locator(`label[for="${inputId}"]`);
    await expect(label).toBeVisible();
    await expect(label).toHaveText(/card number/i);

    // Verify axe finds no violations on this element
    const axeResults = await new AxeBuilder({ page })
      .include('[data-testid="card-number-input"]')
      .analyze();
    expect(axeResults.violations).toHaveLength(0);
  });

  test('cart item removal announces to screen readers (Finding 1 regression)', async ({ page }) => {
    await page.goto('/checkout/cart');

    // Click the remove button for the first item
    await page.locator('[data-testid="remove-item-btn"]').first().click();

    // The ARIA live region must exist and contain the announcement
    const liveRegion = page.locator('[aria-live="polite"]');
    await expect(liveRegion).toBeAttached();
    await expect(liveRegion).not.toBeEmpty();
    await expect(liveRegion).toContainText(/removed/i);
  });
});
```
Learning Tip: The hardest part of accessibility audits is not finding violations — it is communicating them to developers in a way that results in correct fixes. "The screen reader doesn't announce the item removal" gets fixed with a visible toast notification that developers think is equivalent. "4.1.3 requires an aria-live region to announce status changes without moving focus — add `<div aria-live='polite' aria-atomic='true'>` and inject the confirmation message text into it via JavaScript after item removal" gets fixed correctly. Use AI to convert your AT findings into developer-actionable fix descriptions with exact code patterns. The specificity saves multiple review cycles.