
Hands-on: AI-assisted performance and security audit

How to Scope a Performance and Security Audit Using Architecture Docs and Code Changes?

A performance and security audit without a clear scope is not an audit — it is a random collection of tests that may or may not cover the areas that matter. Scoping is the most important step in any audit engagement, and it is where AI provides enormous leverage because scoping requires synthesizing information from multiple sources: architecture docs, recent code changes, traffic data, compliance requirements, and organizational risk priorities.

This hands-on topic walks through a complete audit workflow from scope to findings report, using a realistic scenario: a QA team auditing an e-commerce checkout service before a Black Friday release.

The audit scenario

You are the QA lead for an e-commerce platform. Your team has two weeks before the Black Friday release. The engineering team has made significant changes in the past three weeks:
- Migrated the order service from a monolith to a microservices architecture
- Added a new real-time inventory reservation system
- Upgraded the authentication service to support social login (OAuth 2.0)
- Implemented a new recommendation engine using ML model inference
- Added a customer data export feature (GDPR right of portability)

Your audit must cover performance risk (will it hold up at 3× normal load?) and security risk (does the new surface area introduce vulnerabilities?). You have two engineers, two weeks, and staging environment access.

Step 1: Feed architecture documents to AI for scope extraction

The first audit session is pure input synthesis. You feed AI all available documentation and ask it to produce a structured audit scope:

I am scoping a performance and security audit for a Black Friday release. 
The following changes have been made in the past 3 weeks. 

Please analyze these changes and produce a structured audit scope document.

## Architecture Changes (summaries from design docs)

### Microservices Migration
- order-service extracted from monolith to standalone Node.js service
- Communicates with inventory-service via gRPC (internal)
- Communicates with payment-service via REST (internal)
- New service-to-service auth using internal JWT with 5-minute TTL
- New failure modes: network latency between services, partial failure scenarios

### Real-Time Inventory Reservation
- New inventory-service: manages stock counts in PostgreSQL
- Reservation approach: SELECT FOR UPDATE with row-level locking
- Race condition mitigations: per-product locks, 30-second reservation TTL
- Known concern from architect: high concurrency on popular products

### OAuth 2.0 Social Login
- Added Google and Facebook OAuth 2.0 flows
- OAuth tokens stored in DB, mapped to internal user accounts
- New attack surface: OAuth state parameter, redirect URI validation, 
  token exchange

### ML Recommendation Engine  
- New recommendation-service: Python Flask, calls external ML API
- Results cached in Redis (5-minute TTL)
- Added GET /api/recommendations/{user_id} endpoint
- Concern: recommendation service adds ~200ms to product page load

### GDPR Data Export
- New endpoint: POST /api/users/{id}/data-export/request
- Async job: exports all user data to S3 pre-signed URL, emails link
- Data export includes: orders, addresses, browsing history, preferences
- Concern: export must not include other users' data

## Traffic Baseline
- Normal peak: 200 concurrent users
- Black Friday target: 600 concurrent users (3× normal)
- Highest-traffic endpoints: product listing, product detail, cart, checkout

Produce a structured audit scope covering:

1. Performance audit scope:
   - Which components and endpoints are in scope for load testing
   - What load levels to test (normal, expected peak, break-point)
   - What failure modes introduced by the migration need specific testing
   - What infrastructure metrics to monitor during tests

2. Security audit scope:
   - Which new attack surfaces require security test cases
   - What OWASP categories apply to each new component
   - Priority order for security testing given the 2-week window

3. Out-of-scope items with rationale

4. Risks not covered by this audit and recommended mitigations

Format as a structured audit scope document I can share with the team 
and use as the basis for our test plan.
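
One line in that prompt deserves attention before AI returns its scope: the inventory design's "SELECT FOR UPDATE with row-level locking." To make the pattern concrete, here is a minimal sketch in Node.js with the pg driver (the real inventory-service is Go; table and column names are assumptions):

// Illustrative reservation transaction: concurrent requests for the same
// product serialize on the row lock taken by SELECT ... FOR UPDATE.
const { Pool } = require('pg');
const pool = new Pool();

async function reserve(productId, qty) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    const { rows } = await client.query(
      'SELECT quantity FROM inventory WHERE product_id = $1 FOR UPDATE',
      [productId]
    );
    if (rows.length === 0 || rows[0].quantity < qty) {
      await client.query('ROLLBACK');
      return { ok: false, reason: 'out_of_stock' }; // surfaces as 409 at the API
    }
    await client.query(
      'UPDATE inventory SET quantity = quantity - $2, reserved = reserved + $2 WHERE product_id = $1',
      [productId, qty]
    );
    await client.query('COMMIT');
    return { ok: true };
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}

The pattern is only safe if every stock-mutating code path takes the same lock; a single path that skips FOR UPDATE makes the architect's oversell concern reachable, which is exactly what the concurrency test later in this topic probes.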

Step 2: Prioritizing the scope given team constraints

After AI produces the initial scope, refine it against your constraints:

The audit scope produced above is comprehensive but exceeds our 
2-week capacity with 2 engineers. 

Given these constraints:
- QA Engineer 1 (performance specialist): 8 days of testing capacity
- QA Engineer 2 (security generalist): 8 days of testing capacity
- Staging environment available for load testing: 5 days total 
  (shared booking)

Produce a prioritized execution plan that:
1. Ranks audit items by (risk severity × likelihood) / testing effort
2. Allocates items to Engineer 1 vs. Engineer 2 based on specialization
3. Identifies the 3 items that are non-negotiable before release 
   (blocking risks)
4. Identifies items to defer to post-release with a monitoring mitigation
5. Produces a day-by-day schedule for the 10-day audit window

Highlight: 
- The inventory reservation system under concurrency should be 
  classified as blocking — it is a direct path to oversell on 
  Black Friday. Force this to the top of the priority list.
- The OAuth state parameter validation is a HIGH security risk 
  (CSRF in OAuth flow) — flag as blocking for the security stream.

Learning Tip: Audit scoping sessions work best when you give AI a constraint explicitly: "this scope must fit within X person-days of testing capacity." Without that constraint, AI produces comprehensive but unrealistic scopes. With it, AI produces a prioritized plan that accounts for actual team capacity — and crucially, it tells you what you're not covering and why, which you can document as accepted risk. That "accepted risk" documentation protects the QA team if something slips through that was explicitly deferred.


How to Generate Load and Security Test Cases in One AI Session?

After scoping, the next phase is generation: producing the actual test cases and scripts that will be executed. Running this in a single AI session — maintaining context across performance and security generation — is more efficient than running separate sessions, and it produces better output because AI can identify where performance and security tests overlap (e.g., the same endpoint needs both a load test and an IDOR security test).

Session setup: providing shared context

Begin the generation session with a context document that applies to all tests:

I am generating test cases for a Black Friday performance and security 
audit. Here is the shared context for all tests in this session.

## Application Under Test
- Staging URL: https://api.staging.shop.example.com
- Authentication: JWT Bearer tokens
- Test accounts available:
  - [email protected] / TestAdmin2024! (Admin role)
  - [email protected] / TestUser2024! (Customer, has existing orders)
  - [email protected] / TestUser2024! (Customer, different tenant — 
    for cross-account tests)
  - [email protected] / TestRead2024! (Viewer role)

## Technology Stack
- order-service: Node.js 20, TypeORM, PostgreSQL 15
- inventory-service: Go 1.21, direct SQL, PostgreSQL 15
- auth-service: Node.js, Redis for session state
- recommendation-service: Python Flask, Redis cache

## Test Data
- Product IDs in staging: 1001–1050 (all in stock, quantity: 100)
- Product ID 1001: "Popular Item" — will be used for concurrency tests
- Order IDs for user_a: ORD-5001, ORD-5002, ORD-5003
- Order IDs for user_b: ORD-6001, ORD-6002

## Non-Destructive Testing Agreement
- All tests run against staging only
- No tests that permanently delete data (use soft-delete test accounts)
- Load tests capped at 600 VU max to protect shared staging
- Security tests: no DoS attempts, no payloads that write data 
  to production databases

I will now ask for test cases across performance and security domains. 
Maintain this context for all subsequent requests in this session.

Generating performance test cases for the inventory reservation system

Using the shared context above, generate a k6 load test for the 
inventory reservation system that specifically tests concurrent 
access on a single product.

The test scenario:
- 200 virtual users simultaneously attempt to reserve product 1001
- Product 1001 has quantity: 100 in stock at test start
- Expected behavior: exactly 100 reservations succeed, rest receive 
  409 Conflict or equivalent "out of stock" response
- This tests for race conditions and oversell under high concurrency

Test requirements:
1. All 200 VUs should fire their reservation request simultaneously 
   (use k6 start conditions to synchronize)
2. Track: 
   - Count of successful reservations (201 or 200 status)
   - Count of "out of stock" responses (409)
   - Count of unexpected errors (5xx)
   - Count of no responses (timeout)
3. After the test: the sum of successful reservations should equal exactly 
   the pre-test inventory quantity (no more, no less)
4. Assert this invariant explicitly in the test
5. Include a pre-test step to reset product 1001 quantity to 100 via admin API
6. Include a post-test step to verify final inventory count via admin API

The test should fail if: successful reservations > 100 (oversell detected)

Example generated k6 script for inventory concurrency:

import http from 'k6/http';
import { check } from 'k6';
import { Counter } from 'k6/metrics';

const BASE_URL = __ENV.BASE_URL || 'https://api.staging.shop.example.com';
const ADMIN_TOKEN = __ENV.ADMIN_TOKEN;
const PRODUCT_ID = '1001';
const INITIAL_QUANTITY = 100;

const successfulReservations = new Counter('successful_reservations');
const outOfStockResponses = new Counter('out_of_stock_responses');
const unexpectedErrors = new Counter('unexpected_errors');

export function setup() {
  // Reset inventory to known state before test
  const resetRes = http.put(
    `${BASE_URL}/api/admin/products/${PRODUCT_ID}/inventory`,
    JSON.stringify({ quantity: INITIAL_QUANTITY }),
    {
      headers: {
        Authorization: `Bearer ${ADMIN_TOKEN}`,
        'Content-Type': 'application/json',
      },
    }
  );

  check(resetRes, { 'inventory reset successful': (r) => r.status === 200 });
  console.log(`Pre-test inventory reset to ${INITIAL_QUANTITY}`);
  return { adminToken: ADMIN_TOKEN };
}

export const options = {
  scenarios: {
    // All 200 VUs start together and each runs exactly one iteration:
    // a synchronized burst, not a ramp. (The top-level vus/duration/iterations
    // shortcuts cannot express this and conflict when combined.)
    burst: {
      executor: 'per-vu-iterations',
      vus: 200,
      iterations: 1,
      maxDuration: '30s',
    },
  },
  thresholds: {
    // Test FAILS if we see more successful reservations than inventory
    successful_reservations: [`count<=${INITIAL_QUANTITY}`],
    unexpected_errors: ['count<10'],
  },
};

export default function (data) {
  // Authenticate
  const loginRes = http.post(
    `${BASE_URL}/auth/login`,
    JSON.stringify({ 
      email: `loadtest_user_${__VU}@test.example.com`, 
      password: 'TestLoad2024!' 
    }),
    { headers: { 'Content-Type': 'application/json' } }
  );

  const token = loginRes.json('access_token');
  if (!token) return;

  // Attempt reservation — all VUs hit this simultaneously
  const reserveRes = http.post(
    `${BASE_URL}/api/inventory/${PRODUCT_ID}/reserve`,
    JSON.stringify({ quantity: 1 }),
    {
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
    }
  );

  if (reserveRes.status === 200 || reserveRes.status === 201) {
    successfulReservations.add(1);
  } else if (reserveRes.status === 409) {
    outOfStockResponses.add(1);
  } else {
    unexpectedErrors.add(1);
    console.log(`Unexpected status ${reserveRes.status}: ${reserveRes.body}`);
  }
}

export function teardown(data) {
  // Verify final inventory count
  const inventoryRes = http.get(
    `${BASE_URL}/api/admin/products/${PRODUCT_ID}/inventory`,
    { headers: { Authorization: `Bearer ${data.adminToken}` } }
  );

  const finalQuantity = inventoryRes.json('quantity');
  const finalReserved = inventoryRes.json('reserved');

  console.log(`Final inventory - Available: ${finalQuantity}, Reserved: ${finalReserved}`);

  // Critical checks: oversell, plus the accounting invariant
  // reserved + available === initial
  if (finalReserved > INITIAL_QUANTITY) {
    console.error(`OVERSELL DETECTED: ${finalReserved} reservations for ${INITIAL_QUANTITY} items`);
  }
  if (finalReserved + finalQuantity !== INITIAL_QUANTITY) {
    console.error(`INVENTORY MISMATCH: reserved (${finalReserved}) + available (${finalQuantity}) != ${INITIAL_QUANTITY}`);
  }
}
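
To execute, supply the environment variables the script reads: k6 run -e BASE_URL=https://api.staging.shop.example.com -e ADMIN_TOKEN=<staging-admin-token> concurrency_test.js. The per-VU-iterations executor is the important design choice here: every VU starts together and fires exactly one reservation attempt, the access pattern most likely to expose a race in the reservation path.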

Generating security test cases for OAuth flow in the same session

Still using the shared context, now generate security test cases for 
the new OAuth 2.0 social login implementation.

Focus on: OAuth state parameter CSRF protection and redirect URI validation.

For the OAuth CSRF test (state parameter):
1. Describe the attack: what happens if state is absent or predictable?
2. Generate a test sequence that verifies state validation:
   - Initiate OAuth flow, capture the state parameter in the redirect
   - Attempt to complete OAuth callback with a modified/absent state
   - Expected: 400 or 403 response indicating invalid state
3. Generate a test that checks state entropy:
   - Initiate 5 separate OAuth flows
   - Collect all 5 state parameter values
   - Assert they are unique and have sufficient entropy (>= 32 random characters)

For redirect URI validation:
1. Attempt OAuth callback with a redirect URI not in the whitelist:
   - Test with: https://evil.attacker.com/callback
   - Test with: https://our-app.com.evil.com/callback (subdomain attack)
   - Test with: https://our-app.com/callback?extra=../../../evil
   - Expected: all rejected with 400 (invalid redirect_uri)
2. Verify the whitelist exists and is strict (not prefix-based)

Format as executable test cases with exact HTTP requests and 
expected responses. These are authorized tests against our own staging.
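
The generated output might resemble the following k6 sketch for the state-tampering cases. The /auth/oauth/google/start and /callback routes are assumptions; substitute the real paths from the auth-service:

import http from 'k6/http';
import { check } from 'k6';

const BASE_URL = __ENV.BASE_URL || 'https://api.staging.shop.example.com';

export default function () {
  // Step 1: initiate the OAuth flow without following redirects, so the
  // state parameter the server generates is visible in the Location header.
  const initRes = http.get(`${BASE_URL}/auth/oauth/google/start`, { redirects: 0 });
  const location = initRes.headers['Location'] || '';
  const state = (location.match(/[?&]state=([^&]+)/) || [])[1];

  check(initRes, {
    'initiation redirects to provider': (r) => r.status === 302,
    'state parameter is present': () => Boolean(state),
  });

  // Step 2: hit the callback with a tampered state. A correct implementation
  // rejects it before attempting any token exchange.
  const tampered = http.get(
    `${BASE_URL}/auth/oauth/google/callback?code=fake-code&state=tampered-${Date.now()}`,
    { redirects: 0 }
  );
  check(tampered, {
    'tampered state rejected': (r) => r.status === 400 || r.status === 403,
  });

  // Step 3: hit the callback with no state parameter at all.
  const missing = http.get(
    `${BASE_URL}/auth/oauth/google/callback?code=fake-code`,
    { redirects: 0 }
  );
  check(missing, {
    'missing state rejected': (r) => r.status === 400 || r.status === 403,
  });
}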

Learning Tip: Running performance and security test case generation in a single AI session has a practical benefit beyond efficiency: AI can flag endpoints that appear in both workstreams and suggest combined coverage. After generating your performance and security test cases, add one final prompt: "Review all test cases generated in this session and identify any endpoints tested for BOTH performance and security. For those endpoints, suggest whether the security test can be embedded in the load test scenario, or whether they must remain separate." A load test that simultaneously checks for IDOR responses at scale is more efficient — and catches concurrency-dependent vulnerabilities that single-user security tests miss.
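
As a sketch of that combined pattern: a load scenario that applies normal traffic while every iteration also probes a cross-user order read, using the user_a/user_b data from the shared context. The /api/orders/{id} route and the pre-minted USER_A_TOKEN are assumptions:

import http from 'k6/http';
import { check } from 'k6';
import { Rate } from 'k6/metrics';

const BASE_URL = __ENV.BASE_URL || 'https://api.staging.shop.example.com';
const USER_A_TOKEN = __ENV.USER_A_TOKEN; // assumed: minted for user_a before the run
const idorLeaks = new Rate('idor_leaks');

export const options = {
  vus: 50,
  duration: '2m',
  thresholds: {
    idor_leaks: ['rate==0'],           // any cross-user leak fails the run
    http_req_duration: ['p(95)<2000'], // the performance SLO, checked in the same run
  },
};

export default function () {
  const auth = { headers: { Authorization: `Bearer ${USER_A_TOKEN}` } };

  // Normal load-test traffic: user_a reads their own order.
  http.get(`${BASE_URL}/api/orders/ORD-5001`, auth);

  // Embedded security probe: user_a requests user_b's order. Correct behavior
  // is 403/404 even while the service is saturated.
  const probe = http.get(`${BASE_URL}/api/orders/ORD-6001`, auth);
  idorLeaks.add(probe.status === 200);
  check(probe, {
    'cross-user order denied': (r) => r.status === 403 || r.status === 404,
  });
}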


How to Interpret Audit Results and Generate a Findings Report with AI?

Audit execution generates a large volume of raw data: k6 JSON summaries, security test execution logs, assistive technology (AT) testing notes, accessibility scan exports. Converting this raw data into a structured findings report that clearly communicates risk, evidence, and recommended action is a synthesis task that AI handles well — provided you give it complete, structured input.

Collecting and structuring results for AI analysis

Before generating the report, organize your findings data into a consistent structure:

Performance findings format:

PERF-001
Component: inventory-service
Endpoint: POST /api/inventory/{id}/reserve
Test: Concurrency test, 200 simultaneous VUs
Observed: 112 successful reservations for 100 available units
Expected: Maximum 100 successful reservations
Severity: Critical (oversell risk = revenue loss + customer trust)
Evidence: k6 teardown log shows reserved=112, available=-12
Reproduction: Run concurrency_test.js with 200 VUs in staging

Security findings format:

SEC-001
Component: auth-service
Vulnerability class: OAuth CSRF (OWASP A01 — Broken Access Control)
Test: OAuth callback with missing state parameter
Observed: 200 response, account linked successfully (state not validated)
Expected: 400 Bad Request — invalid or missing state parameter
Severity: High (account takeover vector if user can be phished 
  to initiate OAuth flow)
Evidence: curl output showing 200 on stateless callback
Reproduction: oauth_csrf_test.sh in /tests/security/auth/
CVSS: 7.1 (AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:L/A:N)

Generating the findings report with AI

Generate a formal audit findings report from the results of our 
Black Friday pre-release performance and security audit.

## Audit Context
- Application: ShopCo e-commerce platform
- Scope: Order service, inventory service, auth service (OAuth addition)
- Audit period: 10 days
- Testers: 2 QA engineers
- Environment: Staging (production-equivalent configuration)

## Performance Findings

PERF-001: [paste structured finding]
PERF-002: Checkout p95 at 600 VU = 4.2 seconds (SLO: 2 seconds)
  - Component: order-service checkout endpoint
  - Root cause identified: DB connection pool exhaustion (max 100 connections)
  - 600 VU load: pool exhausted at ~380 VUs
  [additional finding details]

PERF-003: Recommendation service adds 340ms to product page p50
  - Component: recommendation-service
  - Root cause: ML API call not cached for new users (cache miss rate: 65%)
  [additional finding details]

## Security Findings

SEC-001: [paste structured finding — OAuth CSRF]
SEC-002: Inventory reservation endpoint missing rate limiting
  - Any authenticated user can send unlimited reservation requests
  - No IP-based or user-based rate limiting observed
  - Risk: Enables inventory hoarding attack (bot buys all inventory 
    of popular item, releases before TTL expires)
  [additional finding details]

SEC-003: User data export includes another user's order item references
  - Test: user_a requests data export, finds order item IDs from user_b
  - Root cause: ORM query missing tenant isolation filter
  - Severity: Critical (GDPR violation, data exposure)
  [additional finding details]

## Accessibility Findings (included in audit scope)
ACCS-001: Payment form card number input missing programmatic label
ACCS-002: Cart item removal not announced to screen readers

Generate a findings report with:
1. Executive summary (1 page, non-technical, with release recommendation)
2. Findings summary table: ID, title, severity, component, status 
   (open/fixed/accepted)
3. Detailed finding sections (one per finding): description, evidence, 
   impact, recommendation, remediation guidance
4. Risk matrix: plot all findings by likelihood × impact
5. Release recommendation with conditions: 
   "Release is recommended only if findings X, Y, Z are resolved. 
   Findings A, B, C are accepted risk with monitoring mitigations."

Format as a professional audit report in markdown.
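
A note on SEC-003's root cause ("ORM query missing tenant isolation filter"): the fix pattern is worth understanding before reviewing the report. A minimal sketch, assuming TypeORM in the order-service; repository, relation, and column names are hypothetical:

// Hypothetical export-job helper showing the tenant-isolation fix.
async function collectOrderItemsForExport(orderItemRepo, orderIds, userId) {
  // Vulnerable shape (what the audit found): items fetched by ID alone,
  // so any order-item IDs that reach the export job cross user boundaries:
  //   return orderItemRepo.findByIds(orderIds);

  // Fixed shape: join through the owning order and filter on the user.
  return orderItemRepo
    .createQueryBuilder('item')
    .innerJoin('item.order', 'order')
    .where('item.orderId IN (:...orderIds)', { orderIds })
    .andWhere('order.userId = :userId', { userId })
    .getMany();
}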

Generating fix validation test cases from findings

Every resolved finding needs a regression test. AI generates these from the finding descriptions:

Finding SEC-001 (OAuth CSRF) has been fixed. The development team 
has added state parameter validation to the OAuth callback handler.

Generate:
1. A regression test that verifies the fix (state validation now rejects 
   missing or tampered state)
2. A positive test that confirms legitimate OAuth flows still work 
   (valid state accepted)
3. A CI-suitable version of both tests that can run in < 30 seconds 
   without external OAuth provider dependency 
   (hint: the OAuth callback endpoint can be tested with a mocked 
   state/code exchange — no real Google login needed)
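
A plausible shape for the CI-suitable version, assuming the auth-service exposes its Express app for in-process testing and the provider token exchange is stubbed with nock; module paths, exports, and routes here are hypothetical:

// sec001_oauth_state.test.js: Jest + supertest regression tests for SEC-001.
const request = require('supertest');
const nock = require('nock');
const { app, stateStore } = require('../../src/app'); // hypothetical exports

describe('SEC-001 regression: OAuth state validation', () => {
  test('callback with missing state is rejected', async () => {
    const res = await request(app).get('/auth/oauth/google/callback?code=mock-code');
    expect([400, 403]).toContain(res.status);
  });

  test('callback with a state we never issued is rejected', async () => {
    const res = await request(app).get(
      '/auth/oauth/google/callback?code=mock-code&state=never-issued'
    );
    expect([400, 403]).toContain(res.status);
  });

  test('callback with a legitimately issued state proceeds', async () => {
    // Stub Google's token endpoint so no real provider round-trip is needed.
    nock('https://oauth2.googleapis.com')
      .post('/token')
      .reply(200, { access_token: 'mock-access-token' });

    await stateStore.save('state-issued-by-us'); // seed a known-good state
    const res = await request(app).get(
      '/auth/oauth/google/callback?code=mock-code&state=state-issued-by-us'
    );
    expect([400, 403]).not.toContain(res.status);
  });
});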

Learning Tip: The hardest part of writing a good findings report is the executive summary — not the technical findings. Developers can read PERF-001 and know what to do. The VP of Engineering and Head of Product need to understand "should we release on Black Friday?" in plain language with a clear recommendation. When you use AI to generate the executive summary, always include a specific instruction: "The executive summary must end with one of three explicit recommendations: (1) Release approved — all blocking risks resolved, (2) Release approved with conditions — the following must be resolved before go-live, or (3) Release blocked — critical risk cannot be mitigated in time. Do not hedge this recommendation." AI summaries without this instruction tend to be non-committal. Stakeholders need a decision, not a description.


How to Prioritize and Hand Off Security and Performance Findings to the Dev Team?

Generating a findings report is the penultimate step. The final step — and the one QA teams most commonly underinvest in — is the handoff. A findings report that sits in a shared drive without action is wasted effort. A well-structured handoff that produces prioritized, developer-ready tickets with acceptance criteria and fix guidance converts audit output into engineering work.

The prioritization framework

Not all findings can be fixed before release. The prioritization decision requires balancing three factors:

| Factor | Performance | Security |
| --- | --- | --- |
| Severity | Revenue loss, SLA breach, system crash | Data exposure, account compromise, GDPR violation |
| Likelihood | Load probability, usage pattern | Exploitability, attacker access required |
| Effort to fix | Code change, infra change, config | Code change, auth flow change, data migration |
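
The underlying ranking rule, (severity × likelihood) / effort, is simple enough to make concrete before handing it to AI. A toy sketch with illustrative numbers (severity and likelihood on a 1-5 scale, effort in days; all values hypothetical):

// Toy scoring sketch: score = (severity * likelihood) / effortDays,
// with blocking findings forced to the top regardless of score.
const findings = [
  { id: 'PERF-001', severity: 5, likelihood: 4, effortDays: 2, block: true },
  { id: 'SEC-002', severity: 3, likelihood: 3, effortDays: 1, block: false },
  { id: 'PERF-003', severity: 2, likelihood: 4, effortDays: 3, block: false },
];

const ranked = findings
  .map((f) => ({ ...f, score: (f.severity * f.likelihood) / f.effortDays }))
  .sort((a, b) => Number(b.block) - Number(a.block) || b.score - a.score);

console.table(ranked); // blocking items first, then highest score per day of effort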

Use AI to apply this framework to your specific finding set:

Prioritize these audit findings for the Black Friday release 
(3 weeks away). Classify each as: 

BLOCK (must fix before release — release is blocked otherwise)
CRITICAL (fix before release if time allows, escalate if not)
DEFER (document as accepted risk, add monitoring, fix post-release)
INFORMATIONAL (no immediate action required)

Apply these rules:
- Any finding that enables oversell, account takeover, or 
  GDPR violation is automatically BLOCK
- Any finding where p95 > 2× SLO at expected peak load is BLOCK
- Any finding where the fix requires > 3 days of engineering work 
  AND the severity is Medium or lower is DEFER
- Any finding introduced by a new component (no prior history) 
  gets one severity level higher (precautionary)

Findings to prioritize:
[PERF-001 through PERF-003, SEC-001 through SEC-003, ACCS-001, ACCS-002]

Output the priority classification with rationale for each decision, 
and a total day-estimate for fixing all BLOCK and CRITICAL items.

Generating developer-ready tickets from findings

Convert these audit findings into Jira tickets. Each ticket must be 
self-contained — a developer who picks it up should have everything 
needed to understand, fix, and verify the issue without reading 
the full audit report.

For each ticket:
- Title: [component] [action] [outcome] — e.g., 
  "[inventory-service] Fix race condition causing oversell under concurrency"
- Priority: Blocker/Critical/Major/Minor
- Description: What the bug is, NOT how to fix it 
  (let developers own the solution)
- Steps to reproduce in staging: 
  Exact reproduction commands or test scripts
- Acceptance criteria: 
  The specific test result that proves the fix works 
  (must be measurable — not "works correctly" but "concurrency test 
  with 200 VUs shows maximum successful_reservations = initial quantity")
- QA verification steps: 
  What QA will do to sign off on the fix
- Suggested labels: performance / security / accessibility / gdpr / 
  reliability / [component name]
- Linked regression test: 
  Reference to the test file that should be added to CI

Convert these findings:
[PERF-001, SEC-001, SEC-003]

For SEC-003 (GDPR data exposure): 
This ticket is sensitive — mark it as private/internal only, 
add a note that this finding must not be in public-facing 
changelogs or release notes.

Communicating risk trade-offs to stakeholders

When not all BLOCK findings can be fixed before release, QA must communicate the risk-vs-release trade-off clearly:

We have 3 BLOCK findings. The development team estimates:
- PERF-001 (oversell risk): 2 days to fix
- SEC-001 (OAuth CSRF): 1 day to fix
- SEC-003 (GDPR data exposure): 4 days to fix, plus QA verification

We have 8 working days before the release date.

Generate two options for stakeholder review:

Option A: Fix all BLOCK findings before release (release slips 5 days)
- Pros, cons, and risk analysis
- Mitigation if we slip to a post-Black-Friday date

Option B: Release on schedule with SEC-003 mitigated but not fixed
- What "mitigated but not fixed" means for SEC-003:
  What monitoring, feature flags, or temporary controls 
  can reduce the risk while the full fix is developed?
- The conditions that would trigger an emergency rollback
- The monitoring that must be in place during the release window
- The communication required to legal/compliance team about 
  the known GDPR issue

For Option B, include a post-release remediation timeline for SEC-003.

Post-handoff: tracking fix verification

After tickets are filed, use AI to help you maintain a verification tracking document:

Generate a verification tracking template for our Black Friday 
audit findings handoff.

The template should track, for each finding:
1. Finding ID and title
2. Assigned developer and team
3. Target fix date
4. Fix branch / PR link
5. QA verification date
6. QA verification result (pass/fail/partial)
7. Regression test added to CI (yes/no)
8. Finding status (open / in dev / fixed-pending-qa / verified-closed / 
   accepted-risk / deferred)

Also generate:
- A daily standup prompt I can use to get status updates without 
  reading every ticket: 
  "For each open audit finding, what is the current blocker 
  and what is needed from QA?"
- A pre-release gate checklist: 
  The 10-minute daily check I run in the final week to confirm 
  BLOCK findings are on track for resolution before go-live

Learning Tip: The single highest-leverage thing a QA engineer can do after delivering an audit report is show up to the first developer triage meeting with the tickets already written. "Here is the Jira for PERF-001 with reproduction steps and acceptance criteria" converts a meeting from "what does this audit finding mean?" to "who picks this up and when?" Developers are more likely to take audit findings seriously when they arrive as well-scoped, actionable work rather than dense reports. Use the last hour of your audit engagement to generate the tickets — before you present the report to the team. The audience will ask "what do we do next?" and you will have the answer ready.