How to Generate WCAG 2.1/2.2 Compliance Test Cases with AI?
Accessibility testing has historically been treated as a specialist discipline separate from mainstream QA — the province of dedicated accessibility consultants or automated scanners run as an afterthought. This separation is expensive and ineffective. WCAG 2.1 has 78 success criteria across three conformance levels. WCAG 2.2 adds 9 more (and removes one, 4.1.1 Parsing). No automated scanner covers more than 30–40% of these criteria. The remaining 60–70% require human judgment and structured manual testing — which is exactly where AI-assisted test case generation delivers the most value.
Understanding the WCAG testing coverage gap
Before you can effectively use AI to generate WCAG test cases, you need to understand what automated tools cover and what they miss:
| Testing method | WCAG criteria coverage | What it catches |
|---|---|---|
| axe-core / Deque | ~30–35% | Missing alt text, missing form labels, color contrast failures, ARIA misuse |
| Lighthouse accessibility | ~25–30% | Similar to axe, different scoring |
| Manual testing with screen reader | ~60–75% | All automated + focus order, screen reader announcements, interaction patterns |
| Manual testing with keyboard only | ~50–60% | All keyboard navigation criteria |
| Combined automated + manual | ~85–90% | Near-complete WCAG coverage |
The gap between automated and manual coverage is where AI-generated test cases add the most value — they give manual testers a structured checklist rather than relying on tester experience and memory.
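Before running the manual prompts below, make sure the automated layer is wired into CI so the generated manual cases don't duplicate what a scanner already catches. A minimal sketch using @axe-core/playwright; the page URL and tag filters are placeholders for your own application:

```typescript
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

// Layer-1 baseline: the ~30-35% of WCAG criteria a scanner can verify.
// The URL and tag filters are placeholders - adjust to your application.
test('automated WCAG 2.1 AA scan of the checkout page', async ({ page }) => {
  await page.goto('/checkout');
  const results = await new AxeBuilder({ page })
    .withTags(['wcag2a', 'wcag2aa', 'wcag21a', 'wcag21aa'])
    .analyze();
  // Log violations so manual testers know what is already covered
  for (const v of results.violations) {
    console.log(`${v.id} (${v.impact}): ${v.help}`);
  }
  expect(results.violations).toHaveLength(0);
});
```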
Generating WCAG 2.1 test cases for a specific component
```
I need to generate manual accessibility test cases for WCAG 2.1 Level AA
compliance for a multi-step checkout form in a web application.
## Component Description
The checkout form has three steps:
1. Cart review: List of items, quantities (editable), remove buttons, subtotal
2. Shipping: Address fields, shipping method radio buttons, "Save address" checkbox
3. Payment: Credit card number, expiry, CVV, billing address (checkbox to
copy shipping), submit button
After submission:
- Loading state: spinner replaces submit button, form fields disabled
- Success: Redirect to order confirmation page
- Error: Error message appears above the form, focus moves to the error
## Technology
React SPA, custom CSS components, no UI library (no built-in ARIA support)
Generate manual test cases for the following WCAG 2.1 AA criteria:
- 1.1.1 Non-text Content (alt text on any images/icons)
- 1.3.1 Info and Relationships (semantic structure)
- 1.3.5 Identify Input Purpose (autocomplete attributes)
- 1.4.3 Contrast Minimum (text contrast)
- 1.4.11 Non-text Contrast (input border, focus indicator contrast)
- 2.1.1 Keyboard accessibility (all functionality via keyboard)
- 2.4.3 Focus Order (logical tab sequence)
- 2.4.7 Focus Visible (visible focus indicator)
- 3.3.1 Error Identification (error messages)
- 3.3.2 Labels or Instructions (form labels)
- 4.1.2 Name, Role, Value (ARIA on custom components)
- 4.1.3 Status Messages (WCAG 2.1 — screen reader announcements)
For each test case:
1. Test case ID (e.g., WCAG-1.1.1-001)
2. Success criterion tested
3. Specific step in the checkout where this applies
4. Test steps (specific actions to take)
5. Pass condition (what correct implementation looks like)
6. Fail condition (what a violation looks like)
7. Testing method (screen reader, keyboard, browser inspector,
contrast analyzer tool)
8. Notes on common implementation mistakes for this criterion
```
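Most of these criteria need a human at the keyboard, but some of the generated steps can be spot-checked in code. A minimal sketch of a 2.4.3 focus-order assertion, assuming hypothetical data-testid attributes on the cart step:

```typescript
import { test, expect } from '@playwright/test';

// 2.4.3 spot check: tab through the cart step and assert focus lands on
// controls in logical order. The data-testid values are hypothetical,
// and the test assumes focus starts at the top of the document with no
// skip link ahead of these controls.
test('cart step tab order follows the logical sequence', async ({ page }) => {
  await page.goto('/checkout/cart');
  const expectedOrder = ['quantity-input', 'remove-item-btn', 'continue-btn'];
  for (const testId of expectedOrder) {
    await page.keyboard.press('Tab');
    await expect(page.locator(':focus')).toHaveAttribute('data-testid', testId);
  }
});
```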
Generating test cases for the WCAG 2.2 additions
WCAG 2.2 introduced new criteria that specifically address mobile and cognitive accessibility. AI can generate tests for these newer criteria:
```
Generate test cases for the WCAG 2.2 criteria that are NEW
(not in WCAG 2.1), through Level AA plus two AAA criteria for reference:
- 2.4.11 Focus Not Obscured (Minimum), Level AA
- 2.4.12 Focus Not Obscured (Enhanced), Level AAA (beyond AA; included
  for reference)
- 2.5.7 Dragging Movements, Level AA
- 2.5.8 Target Size (Minimum), Level AA
- 3.2.6 Consistent Help, Level A
- 3.3.7 Redundant Entry, Level A
- 3.3.8 Accessible Authentication (Minimum), Level AA
- 3.3.9 Accessible Authentication (Enhanced), Level AAA (beyond AA;
  included for reference)
For each criterion, apply it to our checkout form component described
above and produce:
1. A plain-language explanation of what the criterion requires
in the context of our specific component
2. Test steps to verify compliance
3. Common failure patterns for React SPAs specifically
4. The code pattern or CSS fix that resolves the most common failure
Pay particular attention to 3.3.8 Accessible Authentication —
explain what our payment form must do to comply, and what
common implementations violate it.
```
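Part of 2.5.8 Target Size is mechanical and can be pre-screened before manual review. A sketch that flags undersized targets; the selector list is an assumption, and flagged elements still need human review for the criterion's spacing and inline exceptions:

```typescript
import { test, expect } from '@playwright/test';

// 2.5.8 Target Size (Minimum): interactive targets should be at least
// 24x24 CSS pixels, with exceptions for inline links, spacing, etc.
// This pre-screen flags candidates; a human applies the exceptions.
test('checkout controls meet the 24x24 minimum target size', async ({ page }) => {
  await page.goto('/checkout/cart');
  const targets = page.locator('button, a, input, select, [role="button"]');
  const count = await targets.count();
  const undersized: string[] = [];
  for (let i = 0; i < count; i++) {
    const box = await targets.nth(i).boundingBox();
    if (box && (box.width < 24 || box.height < 24)) {
      undersized.push(await targets.nth(i).evaluate((el) => el.outerHTML.slice(0, 80)));
    }
  }
  expect(undersized, `Undersized targets:\n${undersized.join('\n')}`).toHaveLength(0);
});
```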
Building a WCAG test matrix for a full application
For a broader application audit, AI can generate a prioritized test matrix:
```
I need to prioritize WCAG 2.1/2.2 AA testing for our application given
a 3-day testing window with 2 QA engineers.
## Application Overview
- 8 main page types: Homepage, Product listing, Product detail,
Cart, Checkout (3 steps), Order confirmation, Account, Support chat
- User base: B2C, expected 5–8% of users have accessibility needs
- Legal exposure: US-based company, ADA Title III applies;
EU expansion planned (EN 301 549)
- Previous audit: Last audit was 18 months ago, failed on 2.4.3,
2.1.1, and 3.3.1
Produce a prioritized testing plan that:
1. Ranks the 8 page types by accessibility risk
(legal exposure × traffic volume × interaction complexity)
2. For the top 4 page types, identify the 10 most critical
WCAG criteria to test first
3. Recommend which criteria to cover with automated tools (axe, Lighthouse)
vs. which require manual testing
4. Estimate testing time per page type per criterion
5. Identify any criteria from the previous audit failures that
require regression testing regardless of priority
6. Produce a one-page test execution matrix with pass/fail/not-tested
columns for each page × criterion combination
```
Learning Tip: WCAG criterion 4.1.3 (Status Messages, introduced in 2.1) is one of the most commonly missed by both automated tools and manual testers who don't use screen readers regularly. It requires that loading states, success messages, and error notifications be announced to screen reader users without moving keyboard focus. Ask AI specifically: "For each dynamic state change in this component (loading, success, error, validation), what ARIA live region implementation is required for 4.1.3 compliance?" Most developers implement visible status messages correctly but forget the screen reader announcement.
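For reference, the implementation pattern the tip describes looks roughly like this in a React SPA; a minimal sketch with illustrative names, not a drop-in component:

```tsx
import { useState } from 'react';

// 4.1.3 sketch: render the live region once, keep it in the DOM, and
// inject text into it on state changes. Injecting the text triggers the
// screen reader announcement without moving keyboard focus.
// Component, hook, and class names are illustrative.
export function StatusRegion({ message }: { message: string }) {
  return (
    <div aria-live="polite" aria-atomic="true" className="visually-hidden">
      {message}
    </div>
  );
}

export function useCartStatus() {
  const [status, setStatus] = useState('');
  // Call this alongside the visual update, not instead of it
  const announceRemoval = (itemName: string) =>
    setStatus(`${itemName} removed from cart`);
  return { status, announceRemoval };
}
```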
How to Use AI to Generate GDPR and Data Privacy Test Scenarios?
GDPR testing is one of the most underspecified areas in QA. Most teams treat compliance as a legal and architecture concern, not a test execution concern. But GDPR compliance has observable, testable behaviors: data subject rights must be implemented, consent must be recorded and respected, data retention limits must be enforced. QA engineers who can generate and execute these test scenarios provide measurable compliance assurance — not just theoretical declarations.
The testable behaviors in GDPR
Not all of GDPR is directly testable by QA. But these articles have observable, verifiable behaviors in a web application:
| GDPR Article | Testable behavior | QA test type |
|---|---|---|
| Art. 17 — Right to Erasure | User can delete account; data is actually removed | API + DB verification |
| Art. 20 — Right to Portability | User can export their data in machine-readable format | API + data format validation |
| Art. 15 — Right of Access | User can view all data held about them | API response completeness check |
| Art. 7 — Consent | Consent is recorded; withdrawal stops processing | UI + API + audit log check |
| Art. 5(1)(c) / Art. 25 — Data Minimisation | API responses don't return unnecessary PII | Response field audit |
| Art. 32 — Security | Data in transit encrypted; sensitive fields masked | Network + API response check |
| Art. 13/14 — Transparency | Privacy notice is accessible and accurate | Content + link verification |
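Several of these rows reduce to checks against staging infrastructure. The consent row, for example, becomes assertable with a mail-catcher in staging; a sketch assuming a MailHog-style capture API, with all application endpoints and payloads as assumptions:

```typescript
import { test, expect } from '@playwright/test';

// Art. 7 sketch: withdraw consent, force the campaign run, and assert
// no marketing mail reached the user via the staging mail-catcher.
// The application endpoints, the trigger route, and the test address
// are assumptions for this sketch.
test('marketing emails stop after consent withdrawal', async ({ request }) => {
  await request.post('/api/account/consents', {
    headers: { Authorization: `Bearer ${process.env.TEST_USER_TOKEN}` },
    data: { marketingEmails: false },
  });

  // Staging-only trigger for the next campaign batch
  await request.post('/api/staging/run-marketing-campaign');

  // MailHog-style search API: messages addressed to the test user
  const res = await request.get(`${process.env.MAILCATCHER_URL}/api/v2/search`, {
    params: { kind: 'to', query: 'gdpr-test-user@example.com' },
  });
  const mail = await res.json();
  expect(mail.total).toBe(0); // no marketing mail after withdrawal
});
```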
Generating GDPR test scenarios with AI
```
Generate GDPR compliance test scenarios for our user account management
system. This is an authorized internal compliance test for our own application.
## System Context
- User data stored: email, name, date of birth, phone, shipping addresses,
order history, browsing history, payment method metadata (last 4 only,
no full card numbers)
- Consent types recorded: Marketing emails, analytics tracking,
third-party data sharing
- Data subject rights portal: Available in account settings
- Data retention: Active accounts indefinitely; deleted accounts —
personal data purged after 30 days, order records retained 7 years
(legal/tax requirement)
## Scenarios Needed
Generate test scenarios for these GDPR rights:
1. Right to Erasure (Art. 17):
- User requests account deletion
- Verify PII is purged within 30 days
- Verify order records are retained (legal basis)
- Verify all sessions are invalidated immediately
- Verify user cannot log back in after deletion request
2. Right to Data Portability (Art. 20):
- User requests data export
- Verify export contains all categories of personal data
- Verify format is machine-readable (JSON or CSV)
- Verify no additional data about other users is included
- Verify export is delivered securely (not emailed in plaintext)
3. Consent Withdrawal (Art. 7):
- User withdraws marketing email consent
- Verify future marketing emails are not sent
- Verify historical consent record is preserved (audit trail)
- Verify consent withdrawal doesn't affect non-marketing emails
(transactional emails still sent)
4. Data Minimisation (Art. 5(1)(c) / Art. 25):
- Verify API responses for product listings don't include PII
- Verify recommendations API doesn't expose other users' behavior
- Verify admin API responses mask sensitive fields for
lower-privilege admin roles
For each scenario:
1. Preconditions (test data setup required)
2. Test steps (specific actions and API calls)
3. Verification method (API response check, DB query, email
system check, audit log check)
4. Pass criteria
5. Time-sensitive criteria (e.g., "within 30 days" — how to test
this without waiting 30 days in staging)
```
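The verification steps these scenarios call for translate naturally into API-level tests. A sketch of the Art. 17 checks, assuming a hypothetical staging-only compliance endpoint of the kind described in the Learning Tip at the end of this section; every path and field name here is an assumption:

```typescript
import { test, expect } from '@playwright/test';

// Art. 17 sketch: trigger deletion for a seeded user, then verify the
// outcome through a staging-only compliance endpoint rather than raw
// DB access. All endpoints and response fields are assumptions.
test('account deletion purges PII but retains order records', async ({ request }) => {
  const del = await request.post('/api/account/delete', {
    headers: { Authorization: `Bearer ${process.env.TEST_USER_TOKEN}` },
  });
  expect(del.ok()).toBeTruthy();

  const status = await request.get('/api/staging/compliance/deletion-status', {
    params: { userId: 'gdpr-test-user-42' },
  });
  const body = await status.json();
  // PII purge runs on a 30-day schedule; assert it is queued correctly,
  // and separately that sessions died immediately.
  expect(body.purgeScheduledWithinDays).toBeLessThanOrEqual(30);
  expect(body.orderRecordsRetained).toBe(true); // 7-year legal/tax basis
  expect(body.activeSessions).toBe(0);          // invalidated immediately
});
```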
Handling time-sensitive GDPR test cases
GDPR timelines (erasure and export responses within one month under Art. 12(3), breach notification within 72 hours under Art. 33) can't be tested by waiting. AI can generate workaround test strategies:
```
Several of our GDPR test scenarios involve time-based verification
that we can't test in real-time. For each scenario, suggest a
testable workaround:
1. Account deletion: PII must be purged within 30 days
- How to test this without waiting 30 days in staging?
2. Data export delivery: Must be delivered within 1 month of request
- How to test the delivery mechanism without waiting?
3. Consent effectiveness: After marketing consent withdrawal,
no marketing emails within the next campaign cycle
- How to verify this without waiting for the next marketing batch?
4. Session invalidation after deletion request: All existing sessions
must be invalidated immediately
- How to test "immediately" in an automated test?
For each: suggest (a) a mechanism to manipulate time in staging,
(b) a unit-level test that verifies the behavior without end-to-end timing,
or (c) a configuration test that confirms the scheduled jobs are
configured correctly.
```
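Option (b) is usually the cheapest of the three: isolate the date arithmetic behind an injected clock and test the boundary directly. A sketch with hypothetical function and field names:

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical shape of a deletion request record
interface DeletionRequest {
  requestedAt: Date;
  piiPurged: boolean;
}

// The purge job's eligibility rule, isolated so the clock can be injected
function isPurgeDue(req: DeletionRequest, now: Date): boolean {
  const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;
  return !req.piiPurged && now.getTime() - req.requestedAt.getTime() >= THIRTY_DAYS_MS;
}

test('purge becomes due exactly 30 days after the deletion request', () => {
  const req = { requestedAt: new Date('2024-01-01T00:00:00Z'), piiPurged: false };
  expect(isPurgeDue(req, new Date('2024-01-30T23:59:59Z'))).toBe(false); // day 29
  expect(isPurgeDue(req, new Date('2024-01-31T00:00:00Z'))).toBe(true);  // day 30
});
```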
Generating CCPA and cross-regulation test cases
Where one application serves multiple jurisdictions, AI can collapse the overlap into a single test set:
```
Our application must comply with both GDPR (EU users) and CCPA
(California users). Generate a comparison test matrix that:
1. Identifies where GDPR and CCPA requirements overlap
(test once, covers both)
2. Identifies CCPA-specific requirements that need additional tests
beyond GDPR:
- "Do Not Sell" opt-out
- CCPA-specific disclosure requirements
- Different category definitions for personal information
3. Identifies GDPR-specific requirements that have no CCPA equivalent
4. Recommends the minimum test set that achieves dual compliance
verification coverage
Our user base: 70% EU (GDPR), 15% California (CCPA), 15% other US states
```
Learning Tip: The most common failure in GDPR test execution is verifying only the front-end behavior without confirming the back-end action. A "delete account" button that shows a success message but doesn't actually purge data from the database is a GDPR violation — and the only way to catch it is to verify the database state after the operation. Always include database-level verification steps in your GDPR test cases, and make sure your QA team has the access and tooling to execute them in staging. If direct DB access isn't available, work with the development team to expose a compliance verification endpoint in staging that confirms deletion status.
How to Generate and Track Regulatory Compliance Checklists with AI?
Accessibility and privacy are just two compliance domains. Depending on your application type, you may also need to test for PCI-DSS (payment card), SOC 2 Type II (security controls), HIPAA (healthcare), ISO 27001 (information security), or FedRAMP (US government cloud). AI can generate checklists for all of these — and, critically, help you distinguish between compliance requirements that QA testing can verify and those that require infrastructure audits, policy reviews, or certification processes.
Generating a QA-testable compliance subset from a standard
Not all compliance requirements are test-executable. AI helps you filter:
```
I need to create a QA testing compliance checklist for our SaaS
application from PCI-DSS v4.0 requirements.
Important constraint: Only include requirements that a QA engineer
can verify through application testing (API testing, UI testing,
network testing). Exclude infrastructure requirements that require
system administrator access, policy requirements that require
document review, and certification requirements that require
third-party assessors.
## Our Application's Payment Context
- We use Stripe as payment processor (SAQ A eligible — we never
touch card data directly)
- We display the last 4 digits of saved cards
- We store billing addresses
- We have an admin panel where support staff can view masked
payment method info
Generate a QA-testable checklist for:
1. Requirements 6.x (Secure software development — testing for
vulnerabilities)
2. Requirements 7.x (Access control — testing authorization)
3. Requirements 8.x (Authentication — testing login security)
4. Requirements 10.x (Logging — testing audit trail)
For each requirement:
1. The PCI-DSS requirement ID and plain-language description
2. The specific QA test that verifies compliance
3. The expected pass evidence (what you observe when compliant)
4. Severity if failed (critical for PCI scope, major, minor)
5. Whether this is testable in isolation or requires a specific
environment state
```
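One application-layer check worth automating regardless of which requirements make the checklist: no full card number should ever appear in an API response. A sketch with a hypothetical endpoint and a heuristic PAN pattern (expect false positives on long numeric IDs; treat matches as leads, not verdicts):

```typescript
import { test, expect } from '@playwright/test';

// Heuristic: 13-19 digit runs with optional separators look like PANs.
// This over-matches long numeric IDs, so review hits manually.
const PAN_PATTERN = /\b(?:\d[ -]?){13,19}\b/;

test('saved payment methods expose only the last 4 digits', async ({ request }) => {
  // Endpoint path is an assumption for this sketch
  const res = await request.get('/api/account/payment-methods', {
    headers: { Authorization: `Bearer ${process.env.TEST_USER_TOKEN}` },
  });
  expect(res.ok()).toBeTruthy();
  const raw = await res.text();
  expect(PAN_PATTERN.test(raw)).toBe(false); // no PAN-shaped data anywhere
  const methods = await res.json();
  for (const m of methods) {
    expect(m.last4).toMatch(/^\d{4}$/); // masked display only
  }
});
```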
Generating SOC 2 Type II control verification tests
SOC 2 controls map to observable application behavior more often than teams expect, which makes many of them testable in staging:
```
Our application is pursuing SOC 2 Type II certification.
The auditor will verify that controls operate effectively over
a 12-month period. As QA, I need to generate tests that verify
the following SOC 2 Common Criteria controls are implemented:
CC6.1 — Logical and Physical Access Controls (authentication, authorization)
CC6.3 — Role-based access removal for terminated personnel
CC6.7 — Transmission and encryption controls
CC7.2 — System monitoring and anomaly detection
CC8.1 — Change management process controls
CC9.2 — Risk assessment and vendor management
For each control, generate:
1. The specific application behavior that demonstrates the control
2. A test to verify that behavior in staging
3. Evidence the auditor would need to see (screenshot, API response,
log entry, or configuration export)
4. How frequently this should be re-verified (one-time, per release,
per quarter)
CC6.3 specific: Our application supports SSO via Okta.
When a user is deprovisioned in Okta, their application access
should be revoked immediately. Generate a specific test for this
with exact steps.
```
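A test like the one that prompt requests might come back looking like this; a sketch that uses Okta's documented user lifecycle API but assumes the application endpoint, user IDs, and environment variable names:

```typescript
import { test, expect } from '@playwright/test';

// CC6.3 sketch: deactivate the user in Okta, then require the app to
// reject their existing token. App endpoint and env vars are assumptions;
// the Okta lifecycle route is the documented deactivation API.
test('Okta deprovisioning revokes application access', async ({ request }) => {
  const userToken = process.env.DEPROVISION_TEST_USER_TOKEN!;

  // Confirm the seeded user currently has access
  const before = await request.get('/api/account/profile', {
    headers: { Authorization: `Bearer ${userToken}` },
  });
  expect(before.status()).toBe(200);

  // Deactivate the user in Okta
  const okta = await request.post(
    `${process.env.OKTA_ORG_URL}/api/v1/users/${process.env.TEST_USER_ID}/lifecycle/deactivate`,
    { headers: { Authorization: `SSWS ${process.env.OKTA_API_TOKEN}` } },
  );
  expect(okta.ok()).toBeTruthy();

  // "Immediately": poll briefly, then require an auth failure (401/403)
  await expect
    .poll(async () => {
      const after = await request.get('/api/account/profile', {
        headers: { Authorization: `Bearer ${userToken}` },
      });
      return after.status();
    }, { timeout: 30_000 })
    .toBeGreaterThanOrEqual(401);
});
```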
Designing a compliance test tracking structure
AI can also help you design the tracking structure, not just the test cases:
```
Design a compliance test tracking structure for a QA team that
manages multiple regulatory standards simultaneously (WCAG, GDPR,
PCI-DSS, SOC 2).
The tracking structure should:
1. Link each test case to one or more regulatory requirements
(a single test can cover GDPR Art. 17 AND SOC 2 CC6.1)
2. Track test execution date, tester, environment, and result
3. Flag when a test hasn't been re-run after a significant code change
4. Generate a compliance coverage report showing % of
requirements tested, % passed, and last verified date
5. Support generating a "compliance evidence package" for auditors:
a document listing requirement → test → evidence → result
Output:
1. A proposed data structure (JSON or CSV schema) for tracking this
2. A template for the compliance evidence package document
3. Recommended re-testing frequency for each standard
4. A prioritization rule for when tests conflict with sprint capacity
```
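The kind of structure that prompt tends to produce looks like the following; a sketch with illustrative field names, not a prescribed schema:

```typescript
// One record per test execution, linking a test to the requirements it
// covers across standards. Field names are illustrative.
interface ComplianceTestRecord {
  testId: string;                  // e.g. "WCAG-4.1.3-001"
  requirements: string[];          // e.g. ["GDPR Art. 17", "SOC 2 CC6.1"]
  lastExecuted: string;            // ISO 8601 date
  tester: string;
  environment: 'staging' | 'production-readonly';
  result: 'pass' | 'fail' | 'not-tested';
  evidenceUri?: string;            // screenshot, log export, API capture
  staleAfterCodeChange: boolean;   // set by CI when covered code changes
}

// A requirement counts as covered if at least one fresh test passes
function isCovered(records: ComplianceTestRecord[], requirement: string): boolean {
  return records.some(
    (r) =>
      r.requirements.includes(requirement) &&
      r.result === 'pass' &&
      !r.staleAfterCodeChange,
  );
}
```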
Learning Tip: Compliance checklists generated by AI need a domain-specific review before you run them. AI knows GDPR, PCI-DSS, and WCAG in their published forms — but it does not know your organization's specific implementation decisions, scope limitations (like being SAQ A eligible for PCI), or which controls you've implemented at the infrastructure level vs. application level. After generating a compliance checklist with AI, always have it reviewed by your legal/compliance team or a compliance SME before treating it as authoritative. Use AI to generate the 80% — bring a human expert for the remaining 20% that requires interpretation.
How to Combine AI Analysis with Manual Verification in Accessibility Audits?
Automated accessibility scanning and AI-generated test cases are not substitutes for manual testing with assistive technologies. Screen readers, keyboard-only navigation, and zoom testing reveal failure modes that no automated tool or AI can simulate — because they require actual user experience to detect. The professional approach combines all three layers: automated tools for quick coverage, AI-generated test cases for structured manual coverage, and direct assistive technology testing for real-world verification.
The three-layer accessibility audit model
Layer 1: Automated scanning (axe-core, Lighthouse, Pa11y)
- Coverage: ~30-35% of WCAG criteria
- Time: Minutes per page
- Output: Definitive violations with code references
- Role: Catch unambiguous issues before manual testing
Layer 2: AI-generated structured manual testing
- Coverage: Additional 40-50% of WCAG criteria
- Time: 2-4 hours per page type
- Output: Structured pass/fail per criterion
- Role: Systematic coverage of criteria that require human judgment
Layer 3: Assistive technology testing (NVDA, JAWS, VoiceOver, TalkBack)
- Coverage: 15-25% of criteria that require actual user experience
- Time: 4-8 hours per user journey
- Output: Qualitative findings + specific failure descriptions
- Role: Verify real-world usability, not just technical compliance
Using AI to plan the assistive technology testing phase
```
I have completed automated accessibility scanning (axe-core) and
AI-generated structured manual testing for our checkout flow.
Here are the remaining gaps that require assistive technology testing:
Automated scan passed: 1.1.1, 1.4.3, 4.1.2 (basic ARIA)
Manual structured testing passed: 2.1.1, 2.4.3, 2.4.7, 3.3.1, 3.3.2
Still needs AT verification: 4.1.3, 2.4.11, 3.3.8, 1.3.1 (screen reader
reading order), 2.1.2 (no keyboard trap in payment iframe)
Plan a 4-hour assistive technology testing session using:
- NVDA + Chrome on Windows 10 (representing most screen reader users)
- VoiceOver + Safari on iOS 16 (mobile screen reader)
- Keyboard-only navigation on Chrome (no mouse)
For each remaining criterion:
1. Which AT combination to use for testing
2. Exact test steps using that AT (e.g., "Tab to the card number
field — NVDA should announce: 'Card number, required, edit text'")
3. What announcement or behavior to listen for
4. Common failure pattern to watch for
5. How to document the finding if it fails
(what screen recording + description captures the evidence)
Also generate a quick-reference card for the tester:
- NVDA keyboard shortcuts needed for this test
- VoiceOver gestures needed for this test
- How to navigate the checkout form steps using keyboard only
```
Integrating AI analysis of AT test findings
After AT testing, use AI to help prioritize and classify findings:
```
Here are the findings from our 4-hour assistive technology testing
session on the checkout flow. Classify and prioritize each finding.
Finding 1:
Screen reader (NVDA): When the "Remove item" button is activated in
the cart, NVDA announces nothing — the item disappears visually but
the screen reader user has no confirmation the action completed.
Criterion: 4.1.3 Status Messages
Observed: No ARIA live region announcement after cart item removal
Finding 2:
Screen reader (NVDA): The credit card number input field is announced
as "Edit text" with no label — the visible label "Card number *" is
not programmatically associated with the input.
Criterion: 1.3.1 Info and Relationships / 4.1.2 Name, Role, Value
Observed: Input has placeholder "1234 5678 9012 3456" but no aria-label
or associated <label> element
Finding 3:
Keyboard only: Tab order on the shipping form jumps from "First Name"
to "Country" (skipping Last Name, Address Line 1, Address Line 2, City,
State, Zip) before returning to those fields.
Criterion: 2.4.3 Focus Order
Observed: Logical reading order is broken by CSS grid layout
Finding 4:
Mobile (VoiceOver + Safari): The 3-step progress indicator at the top
of the checkout is not announced to VoiceOver at all — sighted users
see "Step 2 of 3: Shipping" but VoiceOver users have no equivalent.
Criterion: 1.3.1 Info and Relationships
Observed: Progress steps implemented as CSS-styled divs with no ARIA
For each finding:
1. WCAG 2.1/2.2 criterion(s) violated
2. Severity: Blocker/Critical/Major/Minor (with justification)
3. User impact description (who is affected, what task fails)
4. Specific code fix recommendation
5. Regression test to add to prevent reintroduction
6. Whether this finding was likely caught by automated tools
(and if so, why it wasn't flagged)
```
Generating accessibility regression tests from AT findings
After findings are fixed, AI helps create regression tests that can be run without AT:
```
Finding 2 from our AT test was fixed: The payment card number
now has a proper <label> element with for/id association.
Finding 1 was fixed: Cart item removal now announces via an
aria-live="polite" region.
Generate:
1. A Playwright accessibility regression test that verifies
Finding 2 is fixed (using axe-core integration or
explicit ARIA assertion)
2. A Playwright test that verifies Finding 1: after clicking
"Remove item", an ARIA live region with specific text
is present in the DOM
3. An axe-core custom rule configuration that would catch
Finding 2 (unlabeled card input) in future CI runs
These tests should run in CI and fail the build if the
accessibility regressions are reintroduced.
```
Example AI-generated Playwright accessibility regression test:
```typescript
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test.describe('Checkout form accessibility regressions', () => {
  test('card number input has accessible label (Finding 2 regression)', async ({ page }) => {
    await page.goto('/checkout/payment');

    // Verify a label element exists and is associated with the input
    const cardInput = page.locator('[data-testid="card-number-input"]');
    const inputId = await cardInput.getAttribute('id');
    const label = page.locator(`label[for="${inputId}"]`);
    await expect(label).toBeVisible();
    await expect(label).toHaveText(/card number/i);

    // Verify axe finds no violations on this element
    const axeResults = await new AxeBuilder({ page })
      .include('[data-testid="card-number-input"]')
      .analyze();
    expect(axeResults.violations).toHaveLength(0);
  });

  test('cart item removal announces to screen readers (Finding 1 regression)', async ({ page }) => {
    await page.goto('/checkout/cart');

    // Click the remove button for the first item
    await page.locator('[data-testid="remove-item-btn"]').first().click();

    // The ARIA live region must exist and contain the announcement
    const liveRegion = page.locator('[aria-live="polite"]');
    await expect(liveRegion).toBeAttached();
    await expect(liveRegion).not.toBeEmpty();
    await expect(liveRegion).toContainText(/removed/i);
  });
});
```
Learning Tip: The hardest part of accessibility audits is not finding violations — it is communicating them to developers in a way that results in correct fixes. "The screen reader doesn't announce the item removal" gets fixed with a visible toast notification that developers think is equivalent. "4.1.3 requires an aria-live region to announce status changes without moving focus — add `<div aria-live='polite' aria-atomic='true'>` and inject the confirmation message text into it via JavaScript after item removal" gets fixed correctly. Use AI to convert your AT findings into developer-actionable fix descriptions with exact code patterns. The specificity saves multiple review cycles.