
Setting up an AI-assisted QA environment

How to Install and Configure Claude Code and Gemini for QA Workflows

Before you can use AI agents for real QA work, you need both tools installed, authenticated, and validated. This section walks you through getting Claude Code and Gemini CLI production-ready — not just installed, but configured for QA-specific use.

Prerequisites

Confirm these are installed before proceeding:

node --version    # Requires Node.js 18+
npm --version     # Comes with Node.js
git --version     # Needed for repo-aware agent work

If node is below 18, update via nodejs.org or use a version manager:

nvm install 20
nvm use 20

Installing Claude Code

npm install -g @anthropic-ai/claude-code

claude --version

Authentication: Claude Code requires an Anthropic API key. You can authenticate in two ways:

# Option 1: interactive setup on first launch
claude

# Option 2: environment variable
export ANTHROPIC_API_KEY="sk-ant-..."

On first run, claude will open a browser window for API key setup if no key is present. For team environments, use a shared service account API key stored in your secrets manager (AWS Secrets Manager, GitHub Secrets, 1Password).
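
For the secrets-manager route, a small helper can pull the key into the environment at shell startup. A sketch using the AWS CLI; the secret name qa/anthropic-api-key is a hypothetical placeholder:

```shell
# Sketch: export ANTHROPIC_API_KEY from AWS Secrets Manager.
# "qa/anthropic-api-key" is a placeholder secret name.
load_anthropic_key() {
  command -v aws >/dev/null 2>&1 || { echo "aws CLI not found" >&2; return 1; }
  ANTHROPIC_API_KEY="$(aws secretsmanager get-secret-value \
    --secret-id qa/anthropic-api-key \
    --query SecretString --output text)" || return 1
  export ANTHROPIC_API_KEY
}
```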

Verify Claude Code is working:

cd your-test-repository

claude --print "List the test files in this project and describe what each covers in one sentence"

Expected output: A concise summary of your test files. If you see this, the agent is reading your codebase correctly.

Installing Gemini CLI

npm install -g @google/gemini-cli

gemini --version

Authentication: Gemini CLI supports several authentication methods:

# Option 1: interactive Google sign-in
gemini auth login

# Option 2: Google Cloud service account credentials
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

# Option 3: Gemini API key
export GEMINI_API_KEY="..."

Verify Gemini is working:

cd your-test-repository

gemini --prompt "List the test files in this project and describe their purpose"

Configuring Model Defaults

Both tools let you specify the model. For QA workflows, prefer the most capable models:

claude --model claude-opus-4-6 "your task"

echo 'export ANTHROPIC_MODEL="claude-opus-4-6"' >> ~/.zshrc

gemini --model gemini-2.0-pro "your task"

For CI/CD workflows where cost matters, use faster/cheaper models (Claude Haiku, Gemini Flash) for lightweight tasks like selector validation or basic test generation.
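
One way to encode that split is a small helper that maps task type to model tier. The tier names here only reuse this section's examples ("claude-haiku" as the fast tier is an assumption; substitute the model IDs your account actually exposes):

```shell
# Sketch: choose a model tier by task type. Model names mirror the
# examples in this section; swap in the IDs available to your account.
pick_model() {
  case "$1" in
    selector-validation|basic-test-generation) echo "claude-haiku" ;;     # cheap, fast
    coverage-analysis|pr-review|bug-triage)    echo "claude-opus-4-6" ;;  # most capable
    *)                                         echo "claude-opus-4-6" ;;
  esac
}

# Usage: claude --model "$(pick_model pr-review)" --print "Review the current diff"
```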

Learning Tip: Install both tools even if you plan to primarily use one. Different models have different strengths on different tasks — you'll quickly find that Claude Code and Gemini produce different quality output on the same prompt, and having both available lets you cross-validate output on important tasks.


How to Connect AI Tools to Your Test Repositories and CI/CD Pipeline

Having agents installed locally is step one. Making them part of your team's workflow requires connecting them to your repositories and CI/CD infrastructure.

Connecting to Your Test Repository

The most important step is giving the agent proper access to your codebase. Both Claude Code and Gemini CLI work by reading files from your current directory. Always run them from the root of your project or test repository:

cd /path/to/your/project
claude "Review the test coverage for the PaymentService module"

Repository structure awareness: Agents are significantly more effective when your test files are clearly named and organized. If your project has a non-standard structure, describe it in your context file (covered in the next section).

Adding to GitHub Actions

Create a GitHub Actions workflow that runs Claude Code on pull requests:

name: AI QA Coverage Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  ai-coverage-analysis:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for diff analysis

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install Claude Code
        run: npm install -g @anthropic-ai/claude-code

      - name: Get PR Diff
        id: diff
        run: |
          git diff origin/${{ github.base_ref }}...HEAD > pr-diff.txt
          echo "diff_file=pr-diff.txt" >> $GITHUB_OUTPUT

      - name: Run AI Coverage Analysis
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          claude --print "
          You are a QA coverage analyst. 
          Review the following PR diff and:
          1. List the changed code files and what each change does
          2. Identify existing test files that cover changed code
          3. List specific test scenarios that should be added or updated
          4. Rate the current test coverage risk as HIGH/MEDIUM/LOW

          PR Diff:
          $(cat pr-diff.txt)
          " > ai-coverage-report.md

      - name: Comment on PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const report = fs.readFileSync('ai-coverage-report.md', 'utf8');
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## 🤖 AI QA Coverage Analysis\n\n${report}`
            });

What this does: On every PR, Claude analyzes the diff, identifies test coverage gaps, and posts a structured comment. Your team reviews the AI analysis alongside the code review.

Adding to GitLab CI

ai-qa-review:
  stage: test
  image: node:20
  before_script:
    - npm install -g @anthropic-ai/claude-code
  script:
    - git diff origin/${CI_MERGE_REQUEST_TARGET_BRANCH_NAME}...HEAD > pr-diff.txt
    - claude --print "Review this diff for test coverage gaps: $(cat pr-diff.txt)" > ai-report.md
  artifacts:
    paths:
      - ai-report.md
  only:
    - merge_requests
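
Note that GitLab has deprecated only: in favor of rules:; on current GitLab versions you can replace the only: block with an equivalent rules: fragment:

```yaml
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
```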

Managing API Keys in CI

Never hardcode API keys. Use your CI platform's secrets mechanism:

  • GitHub Actions: Settings → Secrets and variables → Actions
  • GitLab CI: Settings → CI/CD → Variables
  • Jenkins: Credentials Manager
  • CircleCI: Project Settings → Environment Variables

Reference in your workflow as ${{ secrets.ANTHROPIC_API_KEY }} (GitHub) or $ANTHROPIC_API_KEY (GitLab/Jenkins).

Learning Tip: Start with a read-only AI step in CI — analysis and commenting only, no auto-commits. Once your team trusts the output quality, you can graduate to AI steps that open draft PRs with suggested test additions. Going straight to auto-commit on day one creates trust problems if the AI produces a bad output.


How to Set Up Prompt Files, Context Files, and Workspace Configuration

The single most impactful configuration step is setting up your project context files. Without these, every AI session starts from zero. With them, every AI session starts with full knowledge of your project's structure, conventions, and testing standards.

The CLAUDE.md File

CLAUDE.md (in the root of your project) is Claude Code's persistent project context file. It's automatically read at the start of every Claude Code session in that directory. Think of it as your project's "briefing document" for the AI.

Minimal QA-focused CLAUDE.md:


## Test Stack
- Framework: Playwright (TypeScript)
- Unit tests: Jest + React Testing Library
- API tests: Supertest
- Test runner: `npm test` for unit, `npx playwright test` for E2E

## Test File Locations
- Unit tests: `src/__tests__/` (mirror of src structure)
- E2E tests: `tests/e2e/` 
- API tests: `tests/api/`
- Page objects: `tests/e2e/pages/`
- Fixtures: `tests/fixtures/`

## Test Conventions
- Test file naming: `[component].test.ts` for unit, `[feature].spec.ts` for E2E
- Selectors: use `data-testid` attributes (never CSS classes or IDs)
- Assertions: use Playwright's built-in `expect` — no custom assertion libraries
- Test data: use factory functions from `tests/fixtures/factories.ts`

## Domain Context
- This is a B2B SaaS project management tool
- Users have roles: OWNER, ADMIN, MEMBER, VIEWER
- Key entities: Workspace, Project, Task, Comment, Attachment
- Payment states: TRIAL, ACTIVE, PAST_DUE, CANCELLED
- Authentication: JWT tokens, 24h expiry, refresh tokens stored in httpOnly cookies

## What NOT to Generate
- Do not mock the database — use test database (see `tests/setup.ts`)
- Do not add `setTimeout` waits — use Playwright's `waitFor*` methods
- Do not generate test data with real email addresses or PII

For Gemini CLI, create GEMINI.md with the same content in the same location.

Slash Commands for Reusable QA Tasks

Claude Code supports custom slash commands — reusable prompt templates you can invoke by name. Store them in .claude/commands/:

mkdir -p .claude/commands

.claude/commands/coverage-gap.md — Coverage gap analysis:

Analyze the test coverage for the module I specify.

Steps:
1. Read the source file(s) for the specified module
2. Read all test files that reference this module
3. List all functions/methods in the source
4. Identify which are covered and which are not
5. For uncovered items, generate specific test case scenarios
6. Output a coverage report with: covered, uncovered, and suggested new tests

Format output as:
## Coverage Report: [module name]
### Currently Covered
[list]
### Not Covered
[list]  
### Suggested Test Cases
[numbered list of specific scenarios]

.claude/commands/pr-review.md — PR test review:

Review the current git diff (git diff origin/main...HEAD) and:

1. Identify all changed source files
2. Find existing tests that cover changed code
3. Identify test coverage gaps — specific scenarios not tested
4. Suggest which gaps are highest risk given the change
5. Generate draft test cases for the top 3 highest-risk gaps

Output format:
## Changed Files
[list with brief description of each change]

## Existing Test Coverage
[map of changed file → test file(s)]

## Coverage Gaps (Risk-Ordered)
[numbered list, highest risk first]

## Draft Test Cases for Top Gaps
[test cases in the project's test framework format]

.claude/commands/bug-report.md — Bug report generation:

I'll provide test failure output or a bug description. Generate a structured bug report.

Required input: paste the failure output or describe the bug below.

Output format:
## Summary
[one sentence]

## Environment
[ask me if not provided]

## Steps to Reproduce
[numbered steps]

## Expected Behavior
[what should happen]

## Actual Behavior  
[what actually happens]

## Evidence
[logs, screenshots, error messages]

## Root Cause Hypothesis
[analysis based on provided evidence]

## Suggested Fix Direction
[for developer context — optional based on evidence]

Using slash commands:

claude
> /coverage-gap PaymentService
> /pr-review
> /bug-report
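
Each command file is plain Markdown, so scaffolding one from the shell is a one-liner with a heredoc. A sketch (the body here is abbreviated; any Markdown file in .claude/commands/ becomes a slash command named after the file):

```shell
# Sketch: create a /coverage-gap command file.
mkdir -p .claude/commands
cat > .claude/commands/coverage-gap.md <<'EOF'
Analyze the test coverage for the module I specify.
Output a coverage report: covered, uncovered, and suggested new tests.
EOF
```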

Context Files for Specific Tasks

For tasks that need additional context beyond CLAUDE.md, create task-specific context files:

qa-context/
├── api-schema.md       # Summarized API endpoints and response schemas
├── domain-model.md     # Key entities, states, and business rules
├── test-conventions.md # Extended test writing standards
└── known-issues.md     # Current known bugs to exclude from reports

Reference these in your prompts:

claude "Review the checkout flow for test coverage. 
Additional context: $(cat qa-context/domain-model.md)"
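
A small helper can bundle several of these files into one prompt and quietly skip any that are missing. A sketch:

```shell
# Sketch: concatenate whichever context files exist, labelling each section.
build_context() {
  for f in "$@"; do
    if [ -f "$f" ]; then
      printf '\n--- %s ---\n' "$f"
      cat "$f"
    fi
  done
}

# Usage (paths follow the qa-context/ layout above):
# claude "Review the checkout flow. Context: $(build_context qa-context/domain-model.md qa-context/known-issues.md)"
```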

Learning Tip: Your CLAUDE.md is never finished — it should grow as you discover gaps. Every time you get a bad AI output caused by missing context (wrong assertion format, wrong selector strategy, unknown domain concept), add that information to CLAUDE.md immediately. After two weeks of active use, your CLAUDE.md will be a comprehensive project brief that makes every AI session significantly more accurate.


How to Verify Your AI QA Environment with a First Real Task

Don't consider your setup complete until you've validated it on a real task from your actual project. Validation means: the AI produces output accurate enough to be useful with review, not perfect but not garbage either.

Validation Task 1: Coverage Gap Analysis

Run this against a real module in your project:

cd your-test-project
claude "Using the context in CLAUDE.md, analyze the test coverage for [YourServiceName]. 
Read the source file and its existing tests. 
List what is covered and what critical paths are not tested.
Format the output as a coverage gap report."

What good output looks like: The AI correctly identifies your source file, finds the right test files, lists real method names, and identifies genuine gaps — not generic gaps it would list for any service.

What bad output looks like: Generic suggestions that don't reference actual method names or real acceptance criteria. If you see this, your CLAUDE.md needs more specific context.

Validation Task 2: Test Case Generation from a Real User Story

Take an actual user story from your current sprint and run:

claude "Generate test cases for this user story using our Playwright test conventions.

User Story:
[paste your actual user story here]

Acceptance Criteria:
[paste actual acceptance criteria]

Generate:
- Positive path test cases
- Negative path test cases  
- Edge cases and boundary conditions
Format them as Playwright test blocks following the conventions in CLAUDE.md."

What good output looks like: Test cases that use data-testid selectors (per your CLAUDE.md), use your project's factory functions for test data, and reference the real entities in your domain model.

What bad output looks like: Tests that use CSS class selectors, hardcode test data, or reference a completely generic e-commerce checkout flow instead of your actual application.

Validation Task 3: CI Integration Check

If you've set up the GitHub Actions workflow, create a test PR that modifies one source file without updating its tests. Confirm:

  1. The CI job runs successfully (no authentication errors)
  2. A PR comment is generated with relevant content
  3. The comment correctly identifies that the modified file lacks test updates
  4. The suggested test additions make sense for the actual change
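
The steps above can be scripted with the GitHub CLI. A sketch, wrapped in a function so nothing runs until you call it; src/example.ts is a placeholder for any real source file in your project:

```shell
# Sketch: open a draft PR that changes source without touching tests,
# to exercise the AI coverage workflow. Requires the GitHub CLI (gh).
create_smoke_test_pr() {
  git checkout -b ai-qa-smoke-test
  printf '\n// no-op change for AI QA smoke test\n' >> src/example.ts  # placeholder file
  git commit -am "chore: AI QA pipeline smoke test"
  git push -u origin ai-qa-smoke-test
  gh pr create --draft --title "AI QA smoke test" \
    --body "Verifying the AI coverage workflow posts a useful comment."
}
```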

Troubleshooting Common Setup Issues

Issue | Likely cause | Fix
claude: command not found | Global npm bin directory not in PATH | Add $(npm config get prefix)/bin to your PATH
Authentication failed | Invalid or missing API key | Re-run claude to re-authenticate, or check ANTHROPIC_API_KEY
AI output is generic, not project-specific | CLAUDE.md not being read | Confirm CLAUDE.md is in the directory where you run claude
AI references wrong test framework | CLAUDE.md missing tech stack info | Add explicit framework versions and conventions to CLAUDE.md
CI job failing with quota errors | API rate limits | Add the --max-turns 5 flag to limit agent loop iterations in CI
AI output has wrong selector strategy | CLAUDE.md lacks a selector convention | Add "Use only data-testid selectors" to CLAUDE.md

Your Environment Is Ready When:

  • claude --version and gemini --version print valid version numbers
  • Running a coverage gap analysis produces output that references real file names and real method names from your project
  • Your CLAUDE.md contains: tech stack, test file locations, test conventions, domain model essentials
  • At least one slash command (e.g., /coverage-gap) is set up and working
  • (Optional) CI integration is posting draft comments on PRs

Learning Tip: Run the three validation tasks with two different prompts each — one detailed prompt with full context and one minimal prompt with only the task description. Comparing the outputs shows you exactly how much value your CLAUDE.md context is adding. The delta between "minimal context" and "full context" output quality is the most motivating evidence you can show your team for why context engineering matters.