
Hands-on: Build a reusable context toolkit

How to Audit Your QA Tasks and Identify Their Context Needs?

A context toolkit is only as useful as its alignment with the real tasks you perform. Before building anything, you need an accurate inventory of what you actually do as a QA engineer and what context each task requires to produce AI-assisted output at high quality. This audit is the foundation of everything that follows.

Step 1: List Your Recurring QA Tasks

Spend 10 minutes writing down every QA task you perform on a recurring basis — not idealized tasks from a job description, but the actual work that fills your days and sprints.

A typical mid/senior QA engineer's list looks something like this:

Per-sprint tasks:
- Review new user stories and acceptance criteria for testability issues
- Write test cases for new features (API, UI, or both)
- Scope regression coverage for PRs and feature branches
- Execute exploratory testing sessions
- Write or update test scripts for automated suites
- Triage CI failures and classify test failures
- Draft or update bug reports for defects found

Less frequent but regular tasks:
- Test plan creation for new features or releases
- Coverage gap analysis for existing test suites
- Risk assessment for major releases
- Updating test cases for changed features
- Test environment setup and validation
- Onboarding documentation for new team members

Ad-hoc tasks:
- Root cause analysis for production incidents
- Test data preparation for specific scenarios
- Automation framework research and evaluation

Step 2: Rate Each Task's AI Potential

For each task on your list, rate it on two dimensions:
- Frequency: How often does this task occur? (Daily / Weekly / Monthly)
- AI benefit: How much time would good AI assistance save? (High / Medium / Low)

Plot these on a simple matrix:

                HIGH AI BENEFIT
                     |
   Write test cases  |  Scope regression
   for new features  |  for PRs
   (weekly)          |  (weekly)
                     |
HIGH FREQUENCY ───────────────────── LOW FREQUENCY
                     |
   Review stories    |  Research new tools
   for testability   |  (monthly)
   (daily)           |
                     |
                LOW AI BENEFIT

Focus your toolkit-building effort on the quadrant that combines high frequency with high AI benefit. These are the tasks where a well-built toolkit pays back its investment fastest.
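
If you want to keep the audit somewhere more durable than a whiteboard photo, the sorting logic is trivial to script. Below is a minimal TypeScript sketch; the TaskRating shape, quadrant labels, and example data are illustrative assumptions, not part of any existing tool.

```typescript
// Hypothetical audit helper: sort rated tasks into the priority-matrix quadrants.
type Frequency = "daily" | "weekly" | "monthly";
type Benefit = "high" | "medium" | "low";

interface TaskRating {
  name: string;
  frequency: Frequency;
  aiBenefit: Benefit;
}

function quadrant(task: TaskRating): string {
  const highFrequency = task.frequency !== "monthly"; // daily or weekly
  const highBenefit = task.aiBenefit === "high";
  if (highFrequency && highBenefit) return "build a template first";
  if (highFrequency) return "high frequency, low AI benefit: keep manual for now";
  if (highBenefit) return "high AI benefit, low frequency: template later";
  return "skip";
}

const audit: TaskRating[] = [
  { name: "Write test cases for new features", frequency: "weekly", aiBenefit: "high" },
  { name: "Review stories for testability", frequency: "daily", aiBenefit: "low" },
  { name: "Research new tools", frequency: "monthly", aiBenefit: "low" },
];

for (const task of audit) {
  console.log(`${task.name}: ${quadrant(task)}`);
}
```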

Step 3: For Each Priority Task, Identify Context Needs

For each task in your priority quadrant, write down the answers to these questions:

  1. What artifact(s) define correct behavior for this task? (Spec? AC? OpenAPI schema?)
  2. What artifact(s) describe the system state? (Code? Diff? Test file?)
  3. What constraints must AI output respect? (Framework? Format? Scope limits?)
  4. What's the desired output shape? (Table? JSON? Code? Prose?)
  5. What context do I currently gather manually every time I do this task?

The answers become the blueprint for your context template.

Example audit for "Write API test cases for a new endpoint":

| Question | Answer |
|---|---|
| Correct behavior artifact | OpenAPI spec section + acceptance criteria |
| System state artifact | Handler function code + data model types |
| Constraints | Max 25 scenarios, Jest + Supertest, scenarios only (no code) |
| Output shape | Markdown table: ID / Scenario / Input / Status / Response / AC-ID / Risk |
| Currently gathered manually | AC from Jira, handler from VSCode, types from the same file |

This audit entry directly maps to a context template design.
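
If it helps to see that mapping explicitly, you can model an audit entry as a typed record whose fields correspond one-to-one to template sections. This is a minimal sketch; the interface and field names are assumptions for illustration, not a prescribed schema.

```typescript
// Hypothetical shape of one audit entry; each field becomes a section of the template.
interface AuditEntry {
  task: string;
  correctBehaviorArtifacts: string[]; // feeds the SPEC CONTEXT placeholders
  systemStateArtifacts: string[];     // feeds the SYSTEM CONTEXT placeholders
  constraints: string[];              // feeds the CONSTRAINTS section
  outputShape: string;                // feeds the FORMAT section
  gatheredManuallyToday: string[];    // becomes the required-context checklist
}

const apiTestCaseAudit: AuditEntry = {
  task: "Write API test cases for a new endpoint",
  correctBehaviorArtifacts: ["OpenAPI spec section", "acceptance criteria"],
  systemStateArtifacts: ["handler function code", "data model types"],
  constraints: ["max 25 scenarios", "Jest + Supertest", "scenarios only (no code)"],
  outputShape: "Markdown table: ID / Scenario / Input / Status / Response / AC-ID / Risk",
  gatheredManuallyToday: ["AC from Jira", "handler from VSCode", "types from the same file"],
};
```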

Step 4: Identify Reuse Patterns

Look for patterns across your task list:
- Does the same role frame apply to multiple tasks? (Build one and reuse it)
- Do multiple tasks need the same output format? (Build one format template)
- Do multiple tasks use the same spec artifact? (Build one spec extraction guide)

Identifying these patterns before building templates prevents duplication and makes your toolkit easier to maintain.

Learning Tip: Do this audit collaboratively with 1–2 teammates. Different people have different pain points and different task mixes. A 30-minute shared audit session will surface tasks and context patterns that a solo audit would miss — and produces a toolkit that serves the whole team, not just one engineer.


How to Build Context Templates for Test Planning, Generation, and Bug Analysis?

With your audit complete, you're ready to build templates. A context template is a prompt structure with labeled placeholders — it captures the architecture of a high-quality prompt while leaving the task-specific content for you to fill in.

Template Design Principles

1. Make the placeholders explicit and labeled
Every variable part of the template should be a clearly labeled placeholder: [PASTE: acceptance criteria], [SPECIFY: test framework], [PASTE: diff — git diff main...branch output]. This removes guessing from the preparation step.

2. Include a required context checklist
Before the prompt body, list everything you need to gather. This makes the template self-contained — you don't need to remember what to prepare.

3. Document the constraints explicitly
Don't carry constraints in your head. Every constraint the model needs should be written into the template as a literal instruction.

4. Make outputs immediately usable
The format specification should produce output that can be copied directly into your workflow tool — Jira, TestRail, Notion, your test file — without reformatting.
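
Principle 1 also makes templates easy to lint: if every variable part is a bracketed placeholder, a few lines of code can warn you when a filled-in prompt still contains an empty slot. This is a minimal sketch, assuming you save a filled-in copy of a template before sending it; the file path and function name are hypothetical.

```typescript
import { readFileSync } from "node:fs";

// Find any [PASTE: ...] or [SPECIFY: ...] placeholders still left in a filled-in prompt.
function findUnfilledPlaceholders(prompt: string): string[] {
  return prompt.match(/\[(PASTE|SPECIFY):[^\]]*\]/g) ?? [];
}

const filledPrompt = readFileSync(
  "prompt-library/test-generation/api-test-generation.filled.md",
  "utf8",
);
const leftovers = findUnfilledPlaceholders(filledPrompt);
if (leftovers.length > 0) {
  console.warn(`Unfilled placeholders (${leftovers.length}):`, leftovers);
}
```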

Template 1: Sprint Test Planning

Use when: You have a set of stories for a sprint and need a prioritized test plan.

## Template: Sprint Test Planning

### Required context to gather before using this template:
- [ ] All sprint user stories with acceptance criteria
- [ ] Test coverage summary from last sprint (or notes on what's already automated)
- [ ] Any known high-risk areas flagged by the team
- [ ] Release deadline (for time-boxing recommendations)

---

**Prompt:**
You are a senior QA engineer creating a risk-based test plan for a sprint. You are
thorough, practical, and you prioritize testing effort based on risk to users and
business outcomes.

SPRINT STORIES:
[PASTE: all user stories with acceptance criteria — use structured AC format if possible]

EXISTING COVERAGE:
[PASTE: brief summary of what is already automated — module name, rough coverage %,
or list of features covered]

KNOWN RISK AREAS:
[PASTE: any risks flagged in planning — or write "none identified" if not applicable]

CREATE A SPRINT TEST PLAN with the following sections:

1. RISK ASSESSMENT: For each story, rate testing risk: High / Medium / Low with justification.

2. TEST SCOPE: For each High and Medium risk story, list the test categories to cover
   (functional, boundary, negative, integration, security). For Low risk, note "smoke test only."

3. AUTOMATION CANDIDATES: Identify which scenarios are strong candidates for automation
   vs. manual-only.

4. EXECUTION ORDER: Recommend testing sequence based on dependencies and risk.

5. TIME ESTIMATE: Rough estimate of QA effort in hours for manual testing.

FORMAT: Structured markdown with H3 for each section. Use tables for the Risk Assessment
and Test Scope sections.

Template 2: API Test Case Generation

Use when: Generating test scenarios for a REST API endpoint.

## Template: API Test Case Generation

### Required context to gather:
- [ ] Acceptance criteria or OpenAPI spec for the endpoint
- [ ] Handler function code (30–80 lines)
- [ ] Data model / type definitions for request and response
- [ ] One example test file from your project (for style reference)

---

**Prompt:**
You are a senior backend QA engineer. You apply boundary value analysis, equivalence
partitioning, and security testing principles to all API test work.

SPEC CONTEXT:
[PASTE: acceptance criteria OR OpenAPI spec YAML for this endpoint]

SYSTEM CONTEXT:
[PASTE: handler function code]
[PASTE: request/response type definitions]

STYLE REFERENCE:
[PASTE: one existing test file — 50–100 lines — as style guide]
Do NOT duplicate the test cases in this example. Use it only for style guidance.

GENERATE:
A complete test scenario table. Cover:
- 2+ happy path scenarios
- All documented error responses
- Boundary conditions for numeric and string inputs
- Authentication/authorization scenarios
- At least 1 idempotency or concurrency scenario if applicable

CONSTRAINTS:
- Maximum 25 scenarios
- Map each to an AC-ID from the spec (or "inferred" if not in spec)
- Test scenarios only — no test code
- Framework: [SPECIFY: Jest / Pytest / REST-assured / other]

FORMAT:
Markdown table: ID | Scenario | Input | Expected Status | Expected Response | AC-ID | Risk
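
The template deliberately stops at scenarios rather than code, so the table can be reviewed before any implementation effort is spent. Once a row is approved, turning it into a test is mechanical. Here is a rough sketch of what one row might become in Jest + Supertest; the endpoint, payload, app import path, and AC numbering are invented for illustration.

```typescript
import request from "supertest";
import app from "../src/app"; // hypothetical Express app export

// Illustrative scenario TC-07: quantity below the minimum boundary is rejected (maps to AC-3).
describe("POST /orders", () => {
  it("rejects quantity below the minimum boundary (TC-07, AC-3)", async () => {
    const response = await request(app)
      .post("/orders")
      .send({ productId: "p-123", quantity: 0 }); // boundary: documented minimum is 1

    expect(response.status).toBe(400);
    expect(response.body.error).toMatch(/quantity/i);
  });
});
```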

Template 3: Bug Root Cause Analysis

Use when: Analyzing a test failure or production incident.

## Template: Bug Root Cause Analysis

### Required context to gather:
- [ ] Exact failure output / stack trace (not a summary — exact text)
- [ ] The test code that produced the failure
- [ ] The source function/method implicated in the failure (if known)
- [ ] System expected behavior from spec (what should have happened)
- [ ] Any relevant log output around the failure time

---

**Prompt:**
You are a senior QA engineer and debugging specialist. You distinguish between confirmed
facts, likely hypotheses, and speculative possibilities. Your analysis is always
evidence-grounded.

EXPECTED BEHAVIOR:
[PASTE: spec statement or description of what should have happened]

FAILURE EVIDENCE:
[PASTE: exact test failure output — assertion error, exception message, exit code]

STACK TRACE:
[PASTE: full stack trace if available]

TEST CODE:
[PASTE: the test that failed — specifically the assertion and setup code]

SOURCE CODE (if available):
[PASTE: the function/method under test]

APPLICATION LOGS (if available):
[PASTE: relevant log lines around failure time — 20–50 lines]

ANALYZE with this structure:

1. ROOT CAUSE (what the evidence directly confirms)
2. CONTRIBUTING CONDITIONS (what made this root cause trigger)
3. CONFIDENCE LEVEL: High / Medium / Low (how certain are you, and why)
4. INFORMATION GAPS: what additional evidence would increase confidence?
5. RECOMMENDED FIX: minimum change to resolve the root cause
6. REGRESSION PREVENTION: what test(s) would catch this if it recurs?

Template 4: Regression Scope Analysis

Use when: Deciding which tests to run for a PR or feature branch.

## Template: Regression Scope Analysis

### Required context to gather:
- [ ] PR diff (git diff main...branch — or the key changed functions listed)
- [ ] Test suite summary (file names, what each file tests — or relevant subset)
- [ ] Any spec context for the changed areas (AC, behavioral contracts)

---

**Prompt:**
You are a senior QA engineer conducting a risk-based regression analysis for a PR.

PR CHANGE SUMMARY:
[PASTE: git diff --stat output + key changed functions or a brief written summary]

SPEC CONTEXT (for changed areas):
[PASTE: relevant AC or spec for the feature that changed]

TEST SUITE SUMMARY:
[PASTE: list of test files with one-line descriptions of what each tests]
OR
[PASTE: 2–3 most relevant test files in full if suite is small]

PERFORM THIS ANALYSIS:

STEP 1: List every function/method/endpoint that was modified (not added, not deleted — modified).

STEP 2: For each modified element, identify its callers and dependents from the test suite.

STEP 3: Risk-score each modified element: High / Medium / Low — with one sentence justification.

STEP 4: Recommend the minimum regression test set: the tests that MUST run to validate
this PR with high confidence. By test file name or test description.

STEP 5: Identify any changed behavior with NO test coverage (these are gaps to fill).

FORMAT: Five clearly labeled sections with STEP headers.
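
Most of the required context for this template can be collected by a script rather than by hand, which keeps preparation under a minute per PR. This is a minimal sketch, assuming a Node project with tracked tests under tests/; the base branch name, paths, and file-name convention are assumptions.

```typescript
import { execSync } from "node:child_process";

// Hypothetical context gatherer for the Regression Scope Analysis template.
function gatherRegressionContext(baseBranch = "main"): string {
  // PR CHANGE SUMMARY: file-level stats for the current branch against the base branch.
  const diffStat = execSync(`git diff --stat ${baseBranch}...HEAD`, { encoding: "utf8" });

  // TEST SUITE SUMMARY: file names only; add the one-line descriptions by hand.
  const testFiles = execSync("git ls-files tests", { encoding: "utf8" })
    .trim()
    .split("\n")
    .filter((file) => /\.(test|spec)\.(ts|js)$/.test(file))
    .map((file) => `- ${file}`)
    .join("\n");

  return `PR CHANGE SUMMARY:\n${diffStat}\nTEST SUITE SUMMARY:\n${testFiles}\n`;
}

console.log(gatherRegressionContext());
```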

Learning Tip: Build these four templates this week — you now have the complete set for the four highest-frequency, highest-impact QA tasks. Create them as markdown files in a prompt-library/ directory in your project repository or personal notes. Even if you only use one template once per sprint, you'll recover the time you spent building it within two sprints.


How to Test and Validate Your Context Templates Against Real Tasks?

A template you haven't validated against a real task is a hypothesis, not a tool. Validation is the step that converts a reasonable-seeming template into a reliable, battle-tested asset.

The Validation Protocol

Phase 1 — Run the template on a known task

Find a QA task you've completed recently and for which you have the output (a test plan, a set of test cases, a bug analysis). Run your new template on the same task.

Compare the AI output to your manual output:
- Does the AI output cover the same scenarios?
- Does it catch anything you missed?
- Is anything missing that you included manually?
- Is the format immediately usable, or does it need editing?

Phase 2 — Run on a new real task

Apply the template to a new, current task — something from this sprint. Evaluate the output against the evaluation checklist from Topic 6 (specificity, completeness, accuracy, scope, usability).

Rate the output: 1 (unusable) → 5 (copy-paste ready, no changes needed).

A template that consistently produces 3–4 ratings is a working template that needs refinement. A template that consistently produces 4–5 ratings is a production-ready asset.

Phase 3 — Identify the failure mode

For every low-rated output, identify the failure mode (vague / incomplete / off-target) and trace it to the template element that caused it. Fix the template.

Common template validation failures and fixes:

| Failure | Likely cause | Fix |
|---|---|---|
| Generic output not specific to your system | Role frame missing tech stack specifics | Add framework names, data model names to role frame |
| Wrong output format | Format specification ambiguous | Add a concrete template example with column names |
| Scope too broad or too narrow | Constraint section missing or too vague | Add explicit maximum scenario count, explicit exclusion list |
| Hallucinated field names or status codes | System context placeholder too vague | Add specific instruction: "paste the handler function, not just a description" |
| Output omits error paths | Generation instruction incomplete | Add explicit requirement: "cover all documented error responses" |

Phase 4 — Run with multiple team members

Have a colleague use the template independently on a similar task. If their output quality is similar to yours, the template is portable. If their output is much worse, the template has hidden assumptions that only work because you know the context implicitly.

Ask your colleague: "What was unclear? What did you have to guess about?" Those are the places where placeholders need better documentation.

Iteration Cadence for Templates

  • After validation run: fix any Phase 3 failures immediately
  • After first 5 real uses: review for patterns in what you're always editing manually
  • After framework or process changes: update the tech stack specifics
  • Every quarter: review all templates and retire any that no longer match your workflow

Keeping a Template Quality Log

For each template, track:

Template: API Test Case Generation
Validation date: [date]
Tasks validated against: 3 (user auth, order creation, product search)
Average output rating: 4.2/5
Known limitations: - Doesn't work well for streaming/WebSocket endpoints
                   - Rate limiting scenarios need explicit AC or model ignores them
Recent updates: Added explicit constraint for rate limiting scenarios (2024-01-20)

This log prevents regressions when you update a template (you can see if the update improved or worsened the average rating) and documents the template's known limitations so you're not surprised by them.
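
If the quality log lives next to the templates as data rather than prose, the before/after comparison becomes a one-liner. This is a minimal sketch with an invented record shape; nothing here is a required format.

```typescript
// Hypothetical machine-readable quality log for one template.
interface ValidationRun {
  date: string;          // ISO date of the validation run
  task: string;          // real task the template was run against
  rating: 1 | 2 | 3 | 4 | 5;
  failureMode?: "vague" | "incomplete" | "off-target";
}

interface TemplateQualityLog {
  template: string;
  runs: ValidationRun[];
  knownLimitations: string[];
}

// Compare this number before and after a template update to spot regressions.
function averageRating(log: TemplateQualityLog): number {
  const sum = log.runs.reduce((total, run) => total + run.rating, 0);
  return log.runs.length > 0 ? sum / log.runs.length : 0;
}
```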

Learning Tip: Do a validation run with real tasks before sharing any template with your team. A template that works great for you but was never tested by someone else is likely to frustrate your teammates when they find the hidden assumptions. One 30-minute validation session — run the template, rate the output, fix two issues — is the minimum quality gate before a template goes to the team library.


How to Organize and Share Your Context Toolkit with Your QA Team?

A personal prompt library multiplies your individual productivity. A shared team toolkit multiplies the entire team's productivity. The organizational step transforms a personal asset into a team infrastructure.

Choosing the Right Home for Your Toolkit

| Option | Best for | Trade-offs |
|---|---|---|
| Git repository (/docs/prompt-library/) | Teams with a strong engineering culture that want version control | Requires pull requests to update |
| Notion / Confluence page | Teams that already use these for documentation | Easy to edit; no version control; can get messy |
| Team chat channel (Slack/Teams pin) | Quick sharing of individual prompts | No organization, no versioning, gets buried |
| Shared Google Drive folder | Non-technical QA teams | Easy access, no version control |
| QA team wiki | Teams with existing documentation systems | Consistent with other documentation |

The best choice is wherever your team already goes for QA reference material. A brilliant prompt library that no one finds is wasted effort.

Recommended approach for engineering teams: Keep templates as markdown files in a prompt-library/ directory in your test repository:

tests/
  prompt-library/
    README.md                         # Index of all templates + usage guide
    test-generation/
      api-test-generation.md
      ui-e2e-generation.md
      unit-test-generation.md
    analysis/
      bug-root-cause-analysis.md
      ci-failure-classification.md
      regression-scope-analysis.md
    planning/
      sprint-test-planning.md
      exploratory-charter-generation.md
    role-frames/
      backend-api-qa.md
      frontend-ui-qa.md
    _ai-context/
      example-api-test.ts             # Style reference files
      example-e2e-test.spec.ts

This structure makes templates accessible from the same place as your test code, version-controlled alongside the tests they generate, and discoverable through your code editor.
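
The files under _ai-context/ are small, real-looking tests whose only job is to show the model your conventions: naming, setup, and assertion style. Here is a sketch of what example-api-test.ts might contain; the endpoint, app import path, and response body are placeholders, not your actual code.

```typescript
// _ai-context/example-api-test.ts
// Style reference only: demonstrates naming, setup, and assertion conventions for API tests.
import request from "supertest";
import app from "../../src/app"; // placeholder path to the application under test

describe("GET /health", () => {
  it("returns 200 with a status payload", async () => {
    const response = await request(app).get("/health");

    expect(response.status).toBe(200);
    expect(response.body).toEqual({ status: "ok" });
  });
});
```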

Creating a Toolkit README

The README is what makes the toolkit usable by teammates who didn't build it. Include:


## What Is This?
A collection of reusable AI prompt templates for common QA tasks.

## How to Use a Template
1. Open the relevant template file
2. Gather the required context listed at the top of the template
3. Copy the prompt, replace all [PASTE: ...] and [SPECIFY: ...] placeholders
4. Run in Claude Code, Claude.ai, or your preferred AI tool
5. Evaluate output using the quality checklist (see Quality Standards section)

## Template Index

| Template | Use Case | Location |
|---|---|---|
| Sprint Test Planning | Create a risk-based test plan for a sprint | planning/sprint-test-planning.md |
| API Test Case Generation | Generate scenarios for REST endpoints | test-generation/api-test-generation.md |
| Bug Root Cause Analysis | Analyze test failures and CI issues | analysis/bug-root-cause-analysis.md |
| Regression Scope Analysis | Decide what tests to run for a PR | analysis/regression-scope-analysis.md |

## Contributing a New Template
- Use the template-format.md as your template structure
- Run validation against at least 2 real tasks before submitting
- Include known limitations in the template metadata
- Submit as a PR for team review

## Quality Standards
[Link to the output evaluation checklist from Topic 6]

Onboarding Teammates to the Toolkit

A toolkit your teammates don't know how to use is unused. Onboarding takes 30–45 minutes and pays back immediately:

Session structure:
1. Context (5 min): Why we have a prompt library, what it does and doesn't do
2. Live demo (15 min): Run one template on a real current task, from context gathering through output evaluation and refinement
3. Hands-on (15 min): Each attendee runs the same template on their own task
4. Q&A and feedback (10 min): What was confusing, what needs updating

Follow up with a recorded version of the live demo for async reference. New team members should go through the recording as part of QA onboarding.

Keeping the Toolkit Alive

A toolkit that isn't maintained becomes stale, and trust in it erodes. Maintenance commitments:

  • Template owner: Each template has a named owner responsible for keeping it current
  • Update triggers: Framework upgrades, process changes, team conventions changes
  • Quarterly review: All templates reviewed for accuracy and effectiveness
  • Deprecation process: Mark outdated templates as [DEPRECATED] rather than deleting them (someone might still need the old version)

Measuring Toolkit Impact

Track these metrics to demonstrate toolkit value and identify areas for improvement:

  • Adoption rate: What % of QA team members use at least one template per sprint?
  • Time savings: Before/after comparison on tasks where templates were adopted
  • Output quality: Average output rating before and after toolkit adoption
  • Template usage frequency: Which templates are used most? (These should be maintained first)
  • Contribution rate: How many team members have contributed or updated a template?

Share these metrics with your team lead and QA manager. The toolkit is not just a productivity tool — it's evidence of QA engineering maturity and systematic process improvement.

The Compound Value of a Shared Toolkit

When a new QA engineer joins the team, they inherit months of prompt engineering work immediately. When a template is improved, the improvement benefits every team member who uses it. When the team discovers a new QA task that benefits from AI assistance, there's an existing process to build and share the template.

The toolkit is the infrastructure that makes AI-assisted QA a consistent team practice rather than an inconsistent individual skill.

Individual using templates → 2–3× personal productivity on covered tasks
5-person team with shared toolkit → 2–3× team productivity on covered tasks + cross-team consistency
10-person team → same multiplier, larger impact, faster onboarding, institutional memory

The investment in building and sharing a context toolkit is one of the highest-return activities available to a mid/senior QA engineer today.

Learning Tip: The hardest part of building a team toolkit is the first commit. Don't wait until you have a "complete" library — start with your best two templates in a shared location today. Announce it to your team as "a starting point." Imperfect and shared beats perfect and private. Once teammates start using it and see the value, contributions follow naturally. Your role as the person who started it is to maintain quality standards and keep the README updated — not to build everything yourself.