How to Install and Configure Claude Code and Gemini for QA Workflows
Before you can use AI agents for real QA work, you need both tools installed, authenticated, and validated. This section walks you through getting Claude Code and Gemini CLI production-ready — not just installed, but configured for QA-specific use.
Prerequisites
Confirm these are installed before proceeding:
node --version # Requires Node.js 18+
npm --version # Comes with Node.js
git --version # Needed for repo-aware agent work
If node is below 18, update via nodejs.org or use a version manager:
nvm install 20
nvm use 20
Installing Claude Code
npm install -g @anthropic-ai/claude-code
claude --version
Authentication: Claude Code accepts either an interactive login or an Anthropic API key. You can authenticate in two ways:
claude # first run launches a browser-based login if no credentials are present
export ANTHROPIC_API_KEY="sk-ant-..." # or set an API key directly in your shell
On first run with no key present, claude opens a browser window to complete authentication. For team and CI environments, use a dedicated service-account API key stored in your secrets manager (AWS Secrets Manager, GitHub Secrets, 1Password) rather than individual developer keys.
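For example, rather than pasting the key into a dotfile, pull it at runtime. A sketch assuming a secret named qa/anthropic-api-key already exists in AWS Secrets Manager:
# Sketch: fetch a shared team key at runtime (assumes a secret named
# "qa/anthropic-api-key" exists in AWS Secrets Manager)
export ANTHROPIC_API_KEY="$(aws secretsmanager get-secret-value \
  --secret-id qa/anthropic-api-key \
  --query SecretString --output text)"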
Verify Claude Code is working:
cd your-test-repository
claude --print "List the test files in this project and describe what each covers in one sentence"
Expected output: A concise summary of your test files. If you see this, the agent is reading your codebase correctly.
Installing Gemini CLI
npm install -g @google/gemini-cli
gemini --version
Authentication: Gemini CLI supports several credential paths: a Google account sign-in, a Gemini API key, or Google Cloud service-account credentials:
gemini # first run walks you through signing in with your Google account
export GEMINI_API_KEY="..." # or use an API key from Google AI Studio
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json" # or service-account credentials (Vertex AI)
Verify Gemini is working:
cd your-test-repository
gemini --prompt "List the test files in this project and describe their purpose"
Configuring Model Defaults
Both tools let you specify the model. For QA workflows, prefer the most capable models:
claude --model claude-opus-4-1 "your task" # substitute the current Opus model ID for your account
echo 'export ANTHROPIC_MODEL="claude-opus-4-1"' >> ~/.zshrc # Claude Code reads ANTHROPIC_MODEL for its default
gemini --model gemini-2.5-pro --prompt "your task"
For CI/CD workflows where cost matters, use faster/cheaper models (Claude Haiku, Gemini Flash) for lightweight tasks like selector validation or basic test generation.
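One way to apply this is to parameterize the model per task in your scripts. A sketch; the model IDs below are placeholders, so substitute whatever your account currently exposes:
# Placeholder model IDs -- check your provider's model list for current values
LIGHT_MODEL="claude-3-5-haiku-latest"
HEAVY_MODEL="claude-opus-4-1"

# Cheap, fast model for mechanical checks
claude --model "$LIGHT_MODEL" --print "Check that every selector in tests/e2e uses data-testid"

# Capable model for reasoning-heavy work
claude --model "$HEAVY_MODEL" --print "Design edge-case tests for the refund flow in PaymentService"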
Learning Tip: Install both tools even if you plan to use one primarily. Models have different strengths across tasks; you'll quickly find that Claude Code and Gemini produce output of different quality on the same prompt, and having both available lets you cross-validate output on important tasks.
How to Connect AI Tools to Your Test Repositories and CI/CD Pipeline
Having agents installed locally is step one. Making them part of your team's workflow requires connecting them to your repositories and CI/CD infrastructure.
Connecting to Your Test Repository
The most important step is giving the agent proper access to your codebase. Both Claude Code and Gemini CLI work by reading files from your current directory. Always run them from the root of your project or test repository:
cd /path/to/your/project
claude "Review the test coverage for the PaymentService module"
Repository structure awareness: Agents are significantly more effective when your test files are clearly named and organized. If your project has a non-standard structure, describe it in your context file (covered in the next section).
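For example, a CLAUDE.md entry for a hypothetical project with a legacy split might look like this (the paths are illustrative):
## Repository Layout (non-standard)
- Active E2E tests: `tests/e2e/` (Playwright); extend these
- Legacy suite: `qa/legacy/` (Protractor); read-only, never add tests here
- Shared test utilities live in `packages/test-utils/`, not alongside the tests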
Adding to GitHub Actions
Create a GitHub Actions workflow that runs Claude Code on pull requests:
name: AI QA Coverage Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  ai-coverage-analysis:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Full history for diff analysis

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install Claude Code
        run: npm install -g @anthropic-ai/claude-code

      - name: Get PR Diff
        id: diff
        run: |
          git diff origin/${{ github.base_ref }}...HEAD > pr-diff.txt
          echo "diff_file=pr-diff.txt" >> $GITHUB_OUTPUT

      - name: Run AI Coverage Analysis
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          claude --print "
          You are a QA coverage analyst.
          Review the following PR diff and:
          1. List the changed code files and what each change does
          2. Identify existing test files that cover changed code
          3. List specific test scenarios that should be added or updated
          4. Rate the current test coverage risk as HIGH/MEDIUM/LOW
          PR Diff:
          $(cat pr-diff.txt)
          " > ai-coverage-report.md

      - name: Comment on PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const report = fs.readFileSync('ai-coverage-report.md', 'utf8');
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## 🤖 AI QA Coverage Analysis\n\n${report}`
            });
What this does: On every PR, Claude analyzes the diff, identifies test coverage gaps, and posts a structured comment. Your team reviews the AI analysis alongside the code review.
Adding to GitLab CI
ai-qa-review:
  stage: test
  image: node:20
  before_script:
    - npm install -g @anthropic-ai/claude-code
  script:
    - git diff origin/${CI_MERGE_REQUEST_TARGET_BRANCH_NAME}...HEAD > pr-diff.txt
    - claude --print "Review this diff for test coverage gaps: $(cat pr-diff.txt)" > ai-report.md
  artifacts:
    paths:
      - ai-report.md
  only:
    - merge_requests
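One caveat: a large merge request can produce a diff bigger than the model's context window. A defensive variant of the script section (the 200 KB cap is arbitrary; tune it to your model's limits):
script:
  - git diff origin/${CI_MERGE_REQUEST_TARGET_BRANCH_NAME}...HEAD > pr-diff.txt
  - head -c 200000 pr-diff.txt > pr-diff-trimmed.txt # keep the prompt under the context limit
  - claude --print "Review this diff for test coverage gaps: $(cat pr-diff-trimmed.txt)" > ai-report.md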
Managing API Keys in CI
Never hardcode API keys. Use your CI platform's secrets mechanism:
| Platform | Secret location |
|---|---|
| GitHub Actions | Settings → Secrets and variables → Actions |
| GitLab CI | Settings → CI/CD → Variables |
| Jenkins | Credentials Manager |
| CircleCI | Project Settings → Environment Variables |
Reference in your workflow as ${{ secrets.ANTHROPIC_API_KEY }} (GitHub) or $ANTHROPIC_API_KEY (GitLab/Jenkins).
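It also pays to fail fast when a secret is not wired up; otherwise the AI step dies with a confusing model error. A small guard at the top of any CI script:
# Abort early with a clear message if the CI secret is missing
if [ -z "$ANTHROPIC_API_KEY" ]; then
  echo "ANTHROPIC_API_KEY is not set: check your CI secrets configuration" >&2
  exit 1
fi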
Learning Tip: Start with a read-only AI step in CI — analysis and commenting only, no auto-commits. Once your team trusts the output quality, you can graduate to AI steps that open draft PRs with suggested test additions. Going straight to auto-commit on day one creates trust problems if the AI produces a bad output.
How to Set Up Prompt Files, Context Files, and Workspace Configuration
The single most impactful configuration step is setting up your project context files. Without these, every AI session starts from zero. With them, every AI session starts with full knowledge of your project's structure, conventions, and testing standards.
The CLAUDE.md File
CLAUDE.md (in the root of your project) is Claude Code's persistent project context file. It's automatically read at the start of every Claude Code session in that directory. Think of it as your project's "briefing document" for the AI.
Minimal QA-focused CLAUDE.md:
## Test Stack
- Framework: Playwright (TypeScript)
- Unit tests: Jest + React Testing Library
- API tests: Supertest
- Test runner: `npm test` for unit, `npx playwright test` for E2E
## Test File Locations
- Unit tests: `src/__tests__/` (mirror of src structure)
- E2E tests: `tests/e2e/`
- API tests: `tests/api/`
- Page objects: `tests/e2e/pages/`
- Fixtures: `tests/fixtures/`
## Test Conventions
- Test file naming: `[component].test.ts` for unit, `[feature].spec.ts` for E2E
- Selectors: use `data-testid` attributes (never CSS classes or IDs)
- Assertions: use Playwright's built-in `expect` — no custom assertion libraries
- Test data: use factory functions from `tests/fixtures/factories.ts`
## Domain Context
- This is a B2B SaaS project management tool
- Users have roles: OWNER, ADMIN, MEMBER, VIEWER
- Key entities: Workspace, Project, Task, Comment, Attachment
- Payment states: TRIAL, ACTIVE, PAST_DUE, CANCELLED
- Authentication: JWT tokens, 24h expiry, refresh tokens stored in httpOnly cookies
## What NOT to Generate
- Do not mock the database — use test database (see `tests/setup.ts`)
- Do not add `setTimeout` waits — use Playwright's `waitFor*` methods
- Do not generate test data with real email addresses or PII
For Gemini CLI, create GEMINI.md with the same content in the same location.
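Rather than maintaining two copies by hand, generate one from the other (or symlink them, if your tooling follows symlinks):
cp CLAUDE.md GEMINI.md # simple copy; re-copy after every edit
# or: ln -s CLAUDE.md GEMINI.md # single source of truth via symlink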
Slash Commands for Reusable QA Tasks
Claude Code supports custom slash commands — reusable prompt templates you can invoke by name. Store them in .claude/commands/:
mkdir -p .claude/commands
.claude/commands/coverage-gap.md — Coverage gap analysis:
Analyze the test coverage for the module I specify.
Steps:
1. Read the source file(s) for the specified module
2. Read all test files that reference this module
3. List all functions/methods in the source
4. Identify which are covered and which are not
5. For uncovered items, generate specific test case scenarios
6. Output a coverage report with: covered, uncovered, and suggested new tests
Format output as:
## Coverage Report: [module name]
### Currently Covered
[list]
### Not Covered
[list]
### Suggested Test Cases
[numbered list of specific scenarios]
.claude/commands/pr-review.md — PR test review:
Review the current git diff (git diff origin/main...HEAD) and:
1. Identify all changed source files
2. Find existing tests that cover changed code
3. Identify test coverage gaps — specific scenarios not tested
4. Suggest which gaps are highest risk given the change
5. Generate draft test cases for the top 3 highest-risk gaps
Output format:
## Changed Files
[list with brief description of each change]
## Existing Test Coverage
[map of changed file → test file(s)]
## Coverage Gaps (Risk-Ordered)
[numbered list, highest risk first]
## Draft Test Cases for Top Gaps
[test cases in the project's test framework format]
.claude/commands/bug-report.md — Bug report generation:
I'll provide test failure output or a bug description. Generate a structured bug report.
Required input: paste the failure output or describe the bug below.
Output format:
## Summary
[one sentence]
## Environment
[ask me if not provided]
## Steps to Reproduce
[numbered steps]
## Expected Behavior
[what should happen]
## Actual Behavior
[what actually happens]
## Evidence
[logs, screenshots, error messages]
## Root Cause Hypothesis
[analysis based on provided evidence]
## Suggested Fix Direction
[for developer context — optional based on evidence]
Using slash commands:
claude
> /coverage-gap PaymentService
> /pr-review
> /bug-report
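Anything typed after the command name (PaymentService above) is exposed inside the command file through Claude Code's $ARGUMENTS placeholder, so you can make a template's parameter explicit. For example, the first line of coverage-gap.md could read:
Analyze the test coverage for the module named: $ARGUMENTS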
Context Files for Specific Tasks
For tasks that need additional context beyond CLAUDE.md, create task-specific context files:
qa-context/
├── api-schema.md # Summarized API endpoints and response schemas
├── domain-model.md # Key entities, states, and business rules
├── test-conventions.md # Extended test writing standards
└── known-issues.md # Current known bugs to exclude from reports
Reference these in your prompts:
claude "Review the checkout flow for test coverage.
Additional context: $(cat qa-context/domain-model.md)"
Learning Tip: Your CLAUDE.md is never finished — it should grow as you discover gaps. Every time you get a bad AI output caused by missing context (wrong assertion format, wrong selector strategy, unknown domain concept), add that information to CLAUDE.md immediately. After two weeks of active use, your CLAUDE.md will be a comprehensive project brief that makes every AI session significantly more accurate.
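A concrete habit that makes this stick: append the missing fact the moment a bad output reveals it. The convention below is only an example:
# Capture a newly discovered convention immediately (example content)
cat >> CLAUDE.md <<'EOF'

## Test Conventions (addendum)
- API error responses use problem+json; assert on the `type` field, not `message`
EOF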
How to Verify Your AI QA Environment with a First Real Task
Don't consider your setup complete until you've validated it on a real task from your actual project. Validation means: the AI produces output accurate enough to be useful with review, not perfect but not garbage either.
Validation Task 1: Coverage Gap Analysis
Run this against a real module in your project:
cd your-test-project
claude "Using the context in CLAUDE.md, analyze the test coverage for [YourServiceName].
Read the source file and its existing tests.
List what is covered and what critical paths are not tested.
Format the output as a coverage gap report."
What good output looks like: The AI correctly identifies your source file, finds the right test files, lists real method names, and identifies genuine gaps — not generic gaps it would list for any service.
What bad output looks like: Generic suggestions that don't reference actual method names or real acceptance criteria. If you see this, your CLAUDE.md needs more specific context.
Validation Task 2: Test Case Generation from a Real User Story
Take an actual user story from your current sprint and run:
claude "Generate test cases for this user story using our Playwright test conventions.
User Story:
[paste your actual user story here]
Acceptance Criteria:
[paste actual acceptance criteria]
Generate:
- Positive path test cases
- Negative path test cases
- Edge cases and boundary conditions
Format them as Playwright test blocks following the conventions in CLAUDE.md."
What good output looks like: Test cases that use data-testid selectors (per your CLAUDE.md), use your project's factory functions for test data, and reference the real entities in your domain model.
What bad output looks like: Tests that use CSS class selectors, hardcode test data, or reference a completely generic e-commerce checkout flow instead of your actual application.
Validation Task 3: CI Integration Check
If you've set up the GitHub Actions workflow, create a test PR that modifies one source file without updating its tests. Confirm:
- The CI job runs successfully (no authentication errors)
- A PR comment is generated with relevant content
- The comment correctly identifies that the modified file lacks test updates
- The suggested test additions make sense for the actual change
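A minimal way to produce that test PR from the command line. A sketch assuming the GitHub CLI (gh) is installed and that src/services/PaymentService.ts stands in for a real file in your repo:
git checkout -b ai-review-smoke-test
echo "// smoke-test change for AI coverage review" >> src/services/PaymentService.ts # hypothetical path
git commit -am "test: trigger AI coverage review"
git push -u origin ai-review-smoke-test
gh pr create --title "Smoke test: AI QA coverage review" \
  --body "Source-only change with no test updates. The AI comment should flag this."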
Troubleshooting Common Setup Issues
| Issue | Likely cause | Fix |
|---|---|---|
| `claude: command not found` | Global npm bin not in PATH | Add `$(npm config get prefix)/bin` to your PATH |
| Authentication failed | Invalid or missing API key | Re-run `claude` to re-authenticate, or check `ANTHROPIC_API_KEY` |
| AI output is generic, not project-specific | CLAUDE.md not being read | Confirm CLAUDE.md is in the directory where you run `claude` |
| AI references wrong test framework | CLAUDE.md missing tech stack info | Add explicit framework versions and conventions to CLAUDE.md |
| CI job failing with quota errors | API rate limits | Add the `--max-turns 5` flag to limit agent iterations in CI |
| AI output has wrong selector strategy | CLAUDE.md lacks selector convention | Add "Use only data-testid selectors" to your CLAUDE.md |
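When several of these symptoms appear at once, a quick diagnostic pass narrows things down:
# Quick diagnostics for the most common failures
which claude || echo "claude is not on PATH"
echo "${ANTHROPIC_API_KEY:+ANTHROPIC_API_KEY is set}" # confirms the key exists without printing it
[ -f CLAUDE.md ] && echo "CLAUDE.md found in $(pwd)" || echo "no CLAUDE.md here"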
Your Environment Is Ready When:
- `claude --version` and `gemini --version` print valid version numbers
- Running a coverage gap analysis produces output that references real file names and real method names from your project
- Your CLAUDE.md contains: tech stack, test file locations, test conventions, domain model essentials
- At least one slash command (e.g., `/coverage-gap`) is set up and working
- (Optional) CI integration is posting draft comments on PRs
Learning Tip: Run the three validation tasks with two different prompts each — one detailed prompt with full context and one minimal prompt with only the task description. Comparing the outputs shows you exactly how much value your CLAUDE.md context is adding. The delta between "minimal context" and "full context" output quality is the most motivating evidence you can show your team for why context engineering matters.