What is persistent context and why does it matter for autonomous QA agents in CI/CD?
When a QA engineer opens a Claude Code session and starts an agentic task, they bring accumulated context: they know the project's test conventions, the domain model, the team's risk tolerance, and the history of past QA cycles. An autonomous CI/CD agent has none of this context unless it is explicitly provided through configuration files. Persistent context is the set of files and configurations that give a CI agent the same operating knowledge a human QA engineer would bring to a session.
Without persistent context, a CI/CD agent running on a new PR will:
- Generate tests in the wrong framework (it reads the spec but not the test stack)
- Use CSS selectors instead of data-testid (it invents its own convention)
- Re-discover domain rules that were already captured in previous sessions
- Produce analysis with generic risk labels instead of project-specific risk factors
- Miss the business rules that make certain test scenarios higher priority than others
With well-configured persistent context, the same agent will:
- Generate tests that are immediately runnable in the project's framework
- Follow the exact selector and assertion conventions in CLAUDE.md
- Apply domain-specific risk weights (e.g., "payment flow changes are always HIGH risk")
- Reference the project's traceability matrix and test plan history
- Produce coverage reports consistent with the team's reporting format
The Difference Between Session Context and Persistent Context
Session context is what you provide in an individual prompt — the feature spec, the code diff, the specific task. Persistent context is what is always available to the agent, regardless of the specific task. Persistent context is configured once and maintained as the project evolves.
| Context Type | Scope | Lifetime | Owner |
|---|---|---|---|
| Session context | One task | One session | QA engineer running the task |
| Persistent context | All tasks | Project lifetime | Team configuration |
Persistent context does not replace session context — it enriches it. The agent combines both to produce output that is both project-accurate (from persistent context) and task-specific (from session context).
Learning Tip: Audit your current CLAUDE.md by running a CI agent task without providing any session context at all — just the persistent context. If the agent produces usable output on a standard task (like a PR coverage check), your persistent context is good. If the output is generic, identify what domain knowledge the agent assumed incorrectly, and add that knowledge to your persistent context files. This blind test is the fastest way to find persistent context gaps.
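A minimal way to run this blind test is to feed the agent only the persistent context plus a recent diff and nothing else. The sketch below assumes the `claude --print` CLI used in the pipelines later in this guide and a saved diff in `pr-diff.txt` (the filename is illustrative):

```bash
# Blind test: persistent context only, no feature spec or task-specific briefing.
cat CLAUDE.md qa-context/*.md > /tmp/persistent-only.md

claude --print "
$(cat /tmp/persistent-only.md)
---
Perform a standard PR coverage check on the following diff.
CODE CHANGE:
$(cat pr-diff.txt)
" > blind-test-report.md

# If blind-test-report.md is generic or guesses wrong about conventions,
# the gap belongs in CLAUDE.md or qa-context/, not in the session prompt.
```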
How to set up CLAUDE.md, context files, and repo maps for CI agents?
A CI agent's persistent context consists of three layers: the CLAUDE.md file (project identity and conventions), supplementary context files (domain knowledge, test standards, historical artifacts), and a repo map (structure awareness). Each layer serves a different function.
Layer 1: The CI-Optimized CLAUDE.md
The CLAUDE.md used by CI agents has additional sections beyond the local development CLAUDE.md:
## Environment: CI/CD
This context is used by autonomous QA agents running in CI pipelines.
Agents should: analyze, generate drafts, and report.
Agents should NOT: push commits, merge PRs, or close tickets.
## Test Stack
- E2E framework: Playwright 1.40 (TypeScript)
- API testing: Jest + Supertest 6.x
- Unit tests: Vitest
- Test runner commands:
- E2E: `npx playwright test --reporter=json`
- API: `npx jest tests/api/ --json --outputFile=results.json`
- All: `npm run test:all`
## Test File Conventions
- E2E: `tests/e2e/{feature}.spec.ts` — one file per feature
- API: `tests/api/{resource}.test.ts` — one file per API resource
- Page Objects: `tests/e2e/pages/{Name}Page.ts` — PascalCase class names
- Factories: `tests/fixtures/factories.ts` — all test data creation here
- Constants: `tests/fixtures/constants.ts` — env URLs, test account credentials
## Selector Strategy
ALWAYS use data-testid: `page.getByTestId('submit-button')`
NEVER use: CSS classes, XPath, text selectors, element IDs
Exception: semantic roles for accessibility tests only
## Domain Model (QA-critical entities)
- User roles: OWNER > ADMIN > MEMBER > VIEWER (permission hierarchy)
- Subscription states: TRIAL (14d) > ACTIVE > PAST_DUE > CANCELLED
- Payment methods: CARD, BANK_TRANSFER, INVOICE (INVOICE: Enterprise only)
- Order states: PENDING > PROCESSING > FULFILLED > REFUNDED | CANCELLED
- Address types: DOMESTIC, INTERNATIONAL, PO_BOX (PO_BOX: restricted features)
## Risk Weights (for test planning)
- Payment flow changes: always HIGH risk regardless of change size
- Authentication changes: always HIGH risk
- Permission/role changes: always HIGH risk
- UI rendering changes: MEDIUM by default, LOW if no business logic
- Copy/content changes: LOW unless regulatory/legal content
## Historical QA Context
- Known flaky tests: see qa-context/known-flaky-tests.md
- Pre-existing bugs: see qa-context/known-issues.md (exclude from new bug reports)
- Test plan templates: see qa-artifacts/templates/
- Previous coverage reports: see qa-artifacts/coverage-reports/
## Output Conventions for CI Reports
- Use this header for all CI reports: `## AI QA Report — {feature} — {date}`
- Risk classifications: **HIGH** / **MEDIUM** / **LOW** (bold, exact case)
- Bug severity: CRITICAL / HIGH / MEDIUM / LOW (no bold)
- Always end reports with: `### Next Steps` section listing 3-5 specific actions
## What NOT to Do
- Do not generate test data with real email addresses or PII
- Do not add `await page.waitForTimeout()` or `setTimeout()` — use waitFor
- Do not mock the test database — use the test DB instance
- Do not reference CI environment variables in test files (use constants.ts)
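To make these conventions concrete, a test that satisfies them might look like the sketch below. The `createMemberUser` factory and the `data-testid` values are hypothetical stand-ins; the point is the shape the conventions enforce.

```typescript
// tests/e2e/login.spec.ts — one file per feature, per the file conventions above
import { test, expect } from '@playwright/test';
import { createMemberUser } from '../fixtures/factories'; // hypothetical factory

test('MEMBER user can sign in', async ({ page }) => {
  const user = createMemberUser(); // test data comes from factories.ts, never inline PII

  await page.goto('/login');
  await page.getByTestId('email-input').fill(user.email);      // data-testid selectors only
  await page.getByTestId('password-input').fill(user.password);
  await page.getByTestId('submit-button').click();

  // Wait on an observable condition instead of page.waitForTimeout()
  await expect(page.getByTestId('dashboard-header')).toBeVisible();
});
```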
Layer 2: Supplementary Context Files
Create a qa-context/ directory with domain-specific knowledge files:
qa-context/
├── domain-model.md # Complete entity relationships and business rules
├── api-contracts.md # Summary of all API endpoints, methods, schemas
├── test-conventions.md # Extended test writing standards with examples
├── known-issues.md # Pre-existing bugs to exclude from new reports
├── known-flaky-tests.md # Flaky tests and their root causes
├── risk-taxonomy.md # Project-specific risk categories and weights
└── page-object-catalog.md # List of all Page Objects, methods, and selectors
Example qa-context/page-object-catalog.md:
## CheckoutPage (`tests/e2e/pages/CheckoutPage.ts`)
**Methods**:
- `fillAddressForm(address: AddressFixture)` — fills all address fields
- `submitAddress()` — clicks "Continue to payment" button
- `getAddressError(field: 'zip' | 'city' | 'state' | 'street')` — returns error text
- `getAddressValidationState(field)` — returns 'valid' | 'invalid' | 'pending'
- `waitForValidation()` — waits for all validation to complete
**Fixtures** (from factories.ts):
- `createValidDomesticAddress()` — returns valid US address
- `createInvalidZipAddress()` — returns address with invalid zip
- `createPOBoxAddress()` — returns PO Box address
When the agent reads this catalog, it generates tests that call real methods on real Page Objects instead of inventing methods that do not exist.
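For instance, given the catalog entry above, a generated test for the invalid-zip path can be expected to look roughly like this (a sketch; it assumes the `CheckoutPage` constructor takes the Playwright `page` and that navigation to the checkout step happens in shared setup):

```typescript
// tests/e2e/checkout-address.spec.ts
import { test, expect } from '@playwright/test';
import { CheckoutPage } from './pages/CheckoutPage';
import { createInvalidZipAddress } from '../fixtures/factories';

test('invalid zip code blocks progression to payment', async ({ page }) => {
  const checkout = new CheckoutPage(page);    // navigation to checkout elided
  const address = createInvalidZipAddress();  // fixture listed in the catalog

  await checkout.fillAddressForm(address);    // real methods from the catalog,
  await checkout.waitForValidation();         // not invented ones
  await checkout.submitAddress();

  expect(await checkout.getAddressValidationState('zip')).toBe('invalid');
  expect(await checkout.getAddressError('zip')).toBeTruthy();
});
```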
Layer 3: The Repo Map
For large codebases, create a repo map that tells the agent which files are relevant for QA tasks:
## Source Files of Interest for QA
- src/components/checkout/ — Checkout UI components
- src/services/address/ — Address validation service
- src/api/routes/checkout.ts — Checkout API endpoints
- src/middleware/auth.ts — Authentication middleware (HIGH risk)
- src/models/ — Database models (always relevant for data integrity tests)
## Test Infrastructure
- tests/setup.ts — Global test setup (runs before each test suite)
- tests/teardown.ts — Global teardown
- tests/fixtures/ — All test data and factories
- tests/e2e/pages/ — All Page Object Models
## QA Artifacts Location
- qa-artifacts/test-plans/ — All test plans
- qa-artifacts/coverage-reports/ — Historical coverage reports
- qa-artifacts/manual-tests/ — Manual test case files
- qa-artifacts/bugs/ — Bug reports from exploratory and scripted testing
## CI/CD Configuration
- .github/workflows/ — All GitHub Actions workflows
- playwright.config.ts — Playwright configuration (base URL, retries, timeouts)
- jest.config.ts — Jest/API test configuration
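Repo maps go stale as the codebase evolves. A lightweight guard is a CI step that checks whether paths referenced in the map still exist. The sketch below assumes the map lives at `qa-context/repo-map.md`; that filename and the path prefixes are assumptions.

```bash
# Warn about repo map entries that point at paths which no longer exist.
grep -oE '(src|tests|qa-artifacts|\.github)[A-Za-z0-9_./-]*' qa-context/repo-map.md | sort -u |
while read -r path; do
  [ -e "$path" ] || echo "Stale repo map entry: $path"
done
```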
Configuring Context Auto-Loading in CI
The CI job configuration reads context files automatically:
- name: Load QA persistent context
run: |
# Concatenate all context files into a single context payload
cat CLAUDE.md > /tmp/qa-persistent-context.md
echo "\n\n" >> /tmp/qa-persistent-context.md
for file in qa-context/*.md; do
echo "## Context: $file\n" >> /tmp/qa-persistent-context.md
cat "$file" >> /tmp/qa-persistent-context.md
echo "\n\n" >> /tmp/qa-persistent-context.md
done
echo "Persistent context size: $(wc -c < /tmp/qa-persistent-context.md) bytes"
Learning Tip: Keep your qa-context/known-issues.md file updated. This is the single highest-value context file for CI agents — it prevents the agent from flagging pre-existing bugs as new findings and polluting your coverage reports. After every sprint, spend 10 minutes updating this file: close resolved issues, add new known issues. An outdated known-issues.md will cause the agent to re-report the same bugs every sprint, which erodes team trust in the automated reports.
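There is no required format for known-issues.md, but each entry should give the agent enough detail to match its own findings against. A hypothetical entry (the issue ID and wording are illustrative):

```markdown
## KNOWN-142: Zip validation error shows a raw error code
- **Area**: Checkout > address validation
- **Symptom**: the zip field error text is "ERR_ZIP_422" instead of a friendly message
- **Status**: open, accepted for this quarter (LOW severity)
- **Agent instruction**: do not report as a new finding; exclude from coverage gap counts
```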
How to trigger agentic QA runs automatically on pull requests and feature branches?
Agentic QA runs should trigger automatically on events that signal new testable work: PR creation, PR updates (new commits), and pushes to specific branch patterns for feature work. The trigger configuration defines which QA activities run without human intervention.
PR-Based Trigger Configuration
name: Agentic QA — PR Coverage Analysis
on:
pull_request:
types: [opened, synchronize, ready_for_review]
# Draft PRs are excluded by the job-level 'if' condition below;
# 'ready_for_review' catches the draft -> ready transition
jobs:
agentic-qa-analysis:
runs-on: ubuntu-latest
# Only run if not a draft PR
if: github.event.pull_request.draft == false
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Full history needed for accurate diff
- name: Setup Node
uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
- name: Install dependencies
run: npm ci
- name: Install Claude Code
run: npm install -g @anthropic-ai/claude-code
- name: Prepare PR context
run: |
# Capture the full diff
git diff origin/${{ github.base_ref }}...HEAD > pr-diff.txt
# Capture PR metadata
echo "PR_TITLE: ${{ github.event.pull_request.title }}" > pr-meta.txt
echo "PR_AUTHOR: ${{ github.event.pull_request.user.login }}" >> pr-meta.txt
echo "BASE_BRANCH: ${{ github.base_ref }}" >> pr-meta.txt
echo "CHANGED_FILES: $(git diff origin/${{ github.base_ref }}...HEAD --name-only | wc -l)" >> pr-meta.txt
- name: Run agentic QA analysis
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
run: |
# Load persistent context
cat CLAUDE.md qa-context/*.md > /tmp/persistent-context.md
claude --print "
$(cat /tmp/persistent-context.md)
---
You are performing an automated QA analysis for a new pull request.
PR METADATA:
$(cat pr-meta.txt)
CODE CHANGE:
$(cat pr-diff.txt)
EXISTING QA ARTIFACTS (for context):
$(ls qa-artifacts/ 2>/dev/null | head -20)
Perform the following analysis:
## 1. Change Impact Summary
Describe what this PR changes functionally (2-3 sentences).
## 2. Risk Assessment
List HIGH, MEDIUM, LOW risk areas based on the diff.
Apply the domain-specific risk weights from CLAUDE.md.
## 3. Test Coverage Check
- Which existing tests cover the changed code paths?
- Which changed paths have no existing test coverage?
- Is the existing coverage sufficient for this change?
## 4. Required Actions Before Merge
List specific QA tasks that must complete before this PR is safe to merge.
Distinguish: must-do (blocking) vs. should-do (non-blocking).
## 5. Automation Recommendation
Recommend 2-3 specific automated tests to add for this change.
Reference the project's test conventions from CLAUDE.md.
Output format: structured markdown suitable for a PR comment.
Use the standard report header from CLAUDE.md.
" > agentic-qa-report.md
- name: Post analysis as PR comment
uses: actions/github-script@v7
with:
script: |
const fs = require('fs');
const report = fs.readFileSync('agentic-qa-report.md', 'utf8');
// Find and update existing QA comment, or create new one
const comments = await github.rest.issues.listComments({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo
});
const existingComment = comments.data.find(c =>
c.body.includes('## AI QA Report') && c.user.type === 'Bot'
);
if (existingComment) {
await github.rest.issues.updateComment({
comment_id: existingComment.id,
owner: context.repo.owner,
repo: context.repo.repo,
body: report
});
} else {
await github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: report
});
}
Feature Branch Trigger for Longer QA Cycles
For branches following a naming convention (feature/*, qa/*), trigger more comprehensive QA runs:
name: Agentic QA — Feature Branch Full Cycle
on:
push:
branches:
- 'feature/**'
- 'qa/**'
paths:
- 'src/**' # Only when source changes, not docs/config
workflow_dispatch: # Manual trigger for on-demand runs
inputs:
target_branch:
description: 'Branch to analyze'
required: true
default: 'current'
jobs:
full-agentic-qa-cycle:
runs-on: ubuntu-latest
steps:
# [checkout, setup steps as above]
- name: Generate test plan
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
run: |
# Strip the feature/ or qa/ prefix and make the rest filename-safe
FEATURE_NAME=$(echo "${{ github.ref_name }}" | sed -E 's#^(feature|qa)/##' | tr '/' '-')
claude --print "
$(cat CLAUDE.md)
Generate a complete test plan for this feature branch.
CODE CHANGE:
$(git diff origin/main...HEAD)
FEATURE BRANCH: ${{ github.ref_name }}
Output a structured test plan following the format in CLAUDE.md.
" > "qa-artifacts/auto-test-plan-${FEATURE_NAME}-$(date +%Y%m%d).md"
- name: Run existing automated tests
run: |
mkdir -p test-results
# The JSON reporter writes to stdout by default; PLAYWRIGHT_JSON_OUTPUT_NAME redirects it to a file
PLAYWRIGHT_JSON_OUTPUT_NAME=test-results/e2e-results.json npx playwright test --reporter=json || true
npx jest --json --outputFile=api-results.json || true
- name: Generate coverage analysis
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
run: |
claude --print "
$(cat CLAUDE.md)
Analyze test execution results against the generated test plan.
TEST PLAN:
$(cat "$(ls -t qa-artifacts/auto-test-plan-*.md | head -1)")
E2E RESULTS:
$(cat test-results/*.json 2>/dev/null | head -200)
API RESULTS:
$(cat api-results.json 2>/dev/null | head -200)
Generate a coverage report and go/no-go recommendation.
" > qa-artifacts/coverage-report-$(date +%Y%m%d).md
Learning Tip: Start CI triggers conservatively: PR-only, read-only (analysis and commenting, no file creation). After two weeks of consistently useful output, add the feature branch trigger. After one more month of trust-building, add the automated test generation step. Rushing to full automation on day one causes a trust breakdown when the first bad output appears. The gradual trust-building approach means that when a bad output does appear, the team has enough positive history to treat it as an exception rather than evidence that the whole system is unreliable.
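One concrete way to enforce the read-only starting point is to scope the workflow's GitHub token so the job can read code and post PR comments but cannot push or merge. A sketch of a `permissions` block for the PR workflow above:

```yaml
permissions:
  contents: read        # checkout only; the token cannot push commits
  pull-requests: write  # needed to post and update the QA report comment
  issues: write         # the comment calls go through the issues API
```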
How to review and act on autonomous agent outputs surfaced in CI?
Autonomous agent outputs in CI are not decisions — they are recommendations. The review-and-act workflow defines how QA engineers engage with these recommendations without being overwhelmed by them.
The Three-Tier Output Review Model
Not all CI agent outputs require the same level of review attention:
Tier 1 — Automated pass-through (no review needed):
- CI coverage report shows all HIGH-risk scenarios passing
- Risk assessment is LOW across all changed files
- No new test coverage gaps identified
Action: Add a thumbs-up reaction to the PR comment and proceed to code review.
Tier 2 — Spot review required (5–10 minutes):
- MEDIUM-risk changes with coverage gaps identified
- 1–2 new scenarios flagged that don't have automated coverage
- No blocking issues, but some recommended additions
Action: Read the report, evaluate the recommendations, create tickets for suggested test additions.
Tier 3 — Full review required (20–30 minutes):
- HIGH-risk changes flagged by the agent
- New failure in an existing automated test
- Spec-implementation alignment gap detected
- Agent cannot confidently classify a finding (explicitly flagged as "uncertain")
Action: Full review of the report, follow-up with developer if needed, make explicit go/no-go decision.
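Tier routing can be partially automated so Tier 3 reports are hard to miss. A sketch of an extra workflow step that labels the PR based on the risk classifications in the report (the label names are arbitrary):

```yaml
- name: Label PR by review tier
  uses: actions/github-script@v7
  with:
    script: |
      const fs = require('fs');
      const report = fs.readFileSync('agentic-qa-report.md', 'utf8');

      // Crude heuristic: any HIGH risk means full review; MEDIUM means spot review.
      const tier = report.includes('**HIGH**') ? 'qa-review:tier-3'
                 : report.includes('**MEDIUM**') ? 'qa-review:tier-2'
                 : 'qa-review:tier-1';

      await github.rest.issues.addLabels({
        issue_number: context.issue.number,
        owner: context.repo.owner,
        repo: context.repo.repo,
        labels: [tier]
      });
```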
The Review Workflow for Tier 3 Outputs
1. Read the risk assessment section first — if HIGH risk is flagged, stop here
and understand exactly what risk is described before reading anything else.
2. Read the coverage gap section — are the gaps blocking (HIGH risk scenarios
with no coverage) or non-blocking (MEDIUM/LOW scenarios)?
3. Read the spec-implementation alignment section — if misalignments are found,
consult the developer before proceeding. These are the most important findings.
4. Read the required actions section — prioritize blocking actions.
5. Create tickets for non-blocking actions (don't let them fall through).
6. Make the go/no-go decision explicitly and record it as a PR comment response.
The Go/No-Go Response Template
## QA Review Response
**Reviewed by**: [QA engineer name]
**Review date**: [date]
**Decision**: GO / NO-GO / CONDITIONAL GO
**Rationale**:
[Brief explanation of the decision]
**Blocking issues** (NO-GO only):
- [Issue 1]
- [Issue 2]
**Conditions for CONDITIONAL GO**:
- [Condition 1 must be met before merge]
- [Condition 2 can be addressed in follow-up ticket]
**Follow-up tickets created**:
- [Ticket ID]: [description]
- [Ticket ID]: [description]
**AI report accuracy**: Accurate / Partially accurate / Inaccurate
[Note any AI misclassifications for prompt improvement]
The AI report accuracy field is a feedback mechanism. When QA engineers consistently mark reports as "partially accurate" on a specific type of finding, that is a signal that the persistent context needs updating to improve agent accuracy on that class of change.
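The fix for a recurring misclassification is usually a one-line addition to the persistent context. For example, if reviewers keep marking subscription-state findings as under-rated, the Risk Weights section of CLAUDE.md would gain a line such as:

```markdown
- Subscription state transition changes: always HIGH risk (added after repeated MEDIUM misclassifications)
```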
Learning Tip: Track your go/no-go decisions and correlate them with post-merge defect data. After six months of the agentic CI workflow, answer: how many times did the agent flag HIGH risk on a PR that later had a production defect? How many times did it miss a HIGH-risk change that produced a defect? This analysis tells you exactly how well-calibrated your persistent context is and gives you concrete data to use when advocating for investment in context engineering quality.
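A simple append-only log, reviewed quarterly, is enough for this calibration analysis. One possible format, following the bracketed-placeholder style of the response template above:

```markdown
| Date | PR | AI risk call | QA decision | Post-merge defect? | Notes |
|---|---|---|---|---|---|
| [date] | [PR link] | HIGH / MEDIUM / LOW | GO / NO-GO / CONDITIONAL | Yes / No | [misses, false alarms] |
```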