Inheriting AI-generated code without auditing it is not technical debt — it is an unexamined liability, and the interest accrues in production.
The Scenario
Your team has inherited an Express.js REST API — a user management service — that was generated by an AI agent during a rapid prototyping sprint. The original author has moved to another project. The service handles user registration, authentication, profile updates, and account deletion. It is currently deployed to a staging environment and is scheduled to go to production in two weeks.
You have been asked to assess it. There is no test coverage report. There is no security review. The original specification is in a Confluence page that was written after the code was generated, not before. This is the scenario that Module 6 has been preparing you for.
This capstone exercise walks through a complete audit workflow: from the initial assessment through hallucination scanning, logic verification, security review, test quality analysis, and remediation. At each stage, you will use AI tooling to help you audit AI-generated code — which requires a specific kind of discipline to do effectively.
Learning tip: When auditing AI-generated code, your first instinct might be to start reading the code from the top. Resist it. Start with the specification gap analysis. Understanding what was supposed to be built before you look at what was actually built gives you a critical frame that reading code cold cannot provide.
Stage 1: Initial Audit — Spec vs. Reality
Before touching the code, establish what the service was supposed to do and map that against what actually exists.
Read the specification first. Identify the endpoints that should exist, the data model, the authentication mechanism, the error response format, and any explicit business rules. Write these down as a checklist.
Inventory the implementation. List every route, every middleware, every model, and every utility function. Cross-reference with the checklist. Mark each item as: present and plausible, present but unclear, or missing.
The gaps between the spec and the inventory are your first risk register. Missing endpoints might mean unimplemented features. Present-but-unclear items are where hallucinations and logic errors are most likely to live. Items in the implementation that are not in the spec at all warrant scrutiny — they may be legitimate supporting code, or they may be AI-generated additions that were never requested and have never been reviewed.
A useful prompt for this stage:
Here is the specification for a user management REST API:
[paste the spec]
Here is a directory listing and route summary of the actual implementation:
[paste directory tree and route list]
Perform a gap analysis. List:
1. Endpoints specified but not implemented
2. Endpoints implemented but not in the spec
3. Data model fields in the spec that are missing from the implementation
4. Any middleware or utilities in the implementation that have no clear mapping to a spec requirement
For each gap, note whether it is likely a missing feature, an extra addition, or ambiguous.
Learning tip: The spec-vs-reality gap is also a conversation starter with stakeholders. Before the security or quality audit, resolve ambiguities with the product owner. Auditing code against the wrong spec wastes everyone's time.
Stage 2: Hallucination Scan — Verify That What Was Generated Actually Exists
AI-generated code frequently references packages, methods, and APIs that do not exist, are deprecated, or behave differently than the code assumes. In a user management service, this commonly appears as: npm packages that were real at some point but have since been deprecated or renamed, Express middleware used with an incorrect API, database ORM methods called with wrong signatures, and third-party service integrations (email, SMS, OAuth) that use outdated SDKs.
Scan all package.json dependencies. For each dependency, verify that it exists on npm, that it is not deprecated, and that the specified version has actually been published. Pay special attention to security-sensitive packages: JWT libraries, password hashing libraries, OAuth clients.
Verify every external method call. For any call to a third-party library — ORM queries, email sending, token generation — look up the current SDK documentation and confirm the method signature matches what the code assumes.
Check for invented methods on standard objects. AI-generated code sometimes calls methods that do not exist on built-in types: Array.prototype.unique(), String.prototype.truncate(), Date.prototype.toRelative(). These fail only at runtime, so in plain JavaScript they slip past linting and code review if no one checks.
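A minimal illustration of the pattern, assuming a `users` array is in scope; the first call is the hallucination, the second is the real idiom:

```js
// Hallucinated: Array.prototype.unique() does not exist, so this throws a TypeError at runtime.
const roles = users.map((u) => u.role).unique();

// Real equivalent: deduplicate through a Set.
const uniqueRoles = [...new Set(users.map((u) => u.role))];
```

A prompt for the scan: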
Review the following code for hallucinated API calls. For each external library call (third-party npm packages, Node.js built-ins, browser APIs), identify any method calls that:
1. Do not exist in the library's current public API
2. Exist but have a different signature than shown
3. Are deprecated and should not be used in new code
4. Are used in a way inconsistent with the library's documented behavior
[paste the code section]
For each issue found, cite the correct API or suggest the replacement.
Learning tip: Hallucination scanning is most efficient when done package by package rather than line by line. For each package in the import statements, review all calls to that package in a single pass. Switching context between packages slows you down and makes it easier to miss inconsistencies within a single library's usage.
Stage 3: Logic Correctness — Trace Through Key Flows
With the spec gap documented and hallucinations catalogued, move to logic verification. In a user management service, the flows most likely to contain logic errors are: registration (duplicate email handling, password requirements, email verification), authentication (token generation, expiry, refresh), profile update (authorization — can this user update that profile?), and account deletion (cascade behavior, data retention requirements).
For each critical flow, apply the manual trace technique from Topic 4: walk through the code with a representative input, then stress-test the boundaries.
Pay particular attention to authorization logic. AI-generated authorization checks commonly have one of three failure modes: they check that the user is authenticated but not that they are authorized to act on a specific resource; they implement the check correctly but in the wrong middleware order (after the operation rather than before); or they short-circuit on a condition that looks like a security check but actually grants broader access than intended.
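As a sketch of the difference, assuming a `requireAuth` middleware that verifies the JWT and attaches the decoded user to `req.user` (the names here are placeholders, not the inherited code):

```js
// Authentication only: any logged-in user can update any profile, and the body
// is spread straight onto the document (mass assignment, covered in Stage 4).
router.patch('/users/:id', requireAuth, async (req, res) => {
  const user = await User.findByIdAndUpdate(req.params.id, req.body, { new: true });
  res.json(user);
});

// Authentication plus ownership: the caller must be the user they are modifying.
router.patch('/users/:id', requireAuth, async (req, res) => {
  if (req.user.id !== req.params.id) {
    return res.status(403).json({ error: 'Forbidden' });
  }
  // ...apply an allowlisted update here
});
```

A prompt for tracing the inherited handler: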
Here is the route handler for `PATCH /users/:id` (profile update):
[paste the handler code]
Analyze this handler for authorization correctness. Specifically:
1. At what point in the request lifecycle is the authorization check performed?
2. Does the check verify that the authenticated user is the owner of the :id resource, or only that the user is authenticated?
3. Is there any path through this handler where an authenticated user could update a different user's profile?
4. What happens if the :id parameter is manipulated (e.g., an integer overflow, a negative number, a non-numeric string)?
Walk through the logic step by step and identify any authorization bypass or unexpected behavior.
Learning tip: For authorization bugs specifically, test with two user accounts: User A tries to modify User B's resource. This single test scenario catches the most common AI-generated authorization failure — checking authentication without checking ownership — that static analysis and unit tests with a single user will miss entirely.
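A minimal Jest + Supertest sketch of that scenario, assuming the Express app is exported for tests and a hypothetical `createTestUser` helper that registers a user and returns its id and token:

```js
const request = require('supertest');
const app = require('../app'); // assumed export; adjust to the real entry point

test("an authenticated user cannot update another user's profile", async () => {
  const userA = await createTestUser(); // hypothetical helper: returns { id, token }
  const userB = await createTestUser();

  const res = await request(app)
    .patch(`/users/${userB.id}`)
    .set('Authorization', `Bearer ${userA.token}`)
    .send({ displayName: 'hijacked' });

  expect(res.status).toBe(403);
});
```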
Stage 4: Security Audit — OWASP Top Issues in AI-Generated Code
AI-generated code has predictable security failure patterns. The following OWASP-aligned issues appear frequently in AI-generated Express.js services and should be checked explicitly.
Injection vulnerabilities. Look for any SQL query construction using string concatenation or template literals with user input. In ORM-based code, look for uses of the ORM's raw-query escape hatch where its parameterized interface should be used instead. In NoSQL code, look for MongoDB query objects built from user-supplied JSON without validation.
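In a Mongoose/MongoDB service, the same class of problem often shows up as operator injection. A hedged sketch of the pattern, as a fragment from inside an async login handler:

```js
// Vulnerable: if req.body.email arrives as { "$ne": null }, the filter matches some user.
const user = await User.findOne({ email: req.body.email });

// Safer: reject non-string input (or run the body through a schema validator) before querying.
const { email } = req.body;
if (typeof email !== 'string') {
  return res.status(400).json({ error: 'Invalid email' });
}
const account = await User.findOne({ email });
```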
Broken authentication. Check: JWT secret is not hardcoded in the source code; token expiry is set and enforced; refresh token rotation is implemented if refresh tokens are used; password hashing uses bcrypt or argon2 with an appropriate work factor (not MD5, SHA-1, or SHA-256 for passwords).
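A sketch of what correct handling looks like, assuming `user` and `plaintextPassword` are in scope inside an async registration or login handler:

```js
const jwt = require('jsonwebtoken');
const bcrypt = require('bcrypt');

// The secret comes from the environment, never from source control.
const JWT_SECRET = process.env.JWT_SECRET;
if (!JWT_SECRET) throw new Error('JWT_SECRET is not configured');

// Short-lived token with an explicit algorithm and expiry.
const token = jwt.sign({ sub: user.id }, JWT_SECRET, { algorithm: 'HS256', expiresIn: '15m' });

// Password hashing with a real work factor; a cost of 12 is a common bcrypt baseline.
const passwordHash = await bcrypt.hash(plaintextPassword, 12);
```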
Sensitive data exposure. Check: password hashes are never included in API responses; tokens are never logged; personal data fields (email, phone, address) are not returned in endpoints that do not require them.
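One way to enforce this at the model layer rather than in every handler (a Mongoose sketch, not the inherited code):

```js
const mongoose = require('mongoose');

const userSchema = new mongoose.Schema({
  email: { type: String, required: true, unique: true },
  // select: false keeps the hash out of query results unless it is explicitly requested.
  passwordHash: { type: String, required: true, select: false },
  displayName: String,
});

// Strip sensitive fields whenever a document is serialized into a JSON response.
userSchema.set('toJSON', {
  transform(doc, ret) {
    delete ret.passwordHash;
    return ret;
  },
});

module.exports = mongoose.model('User', userSchema);
```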
Security misconfiguration. Check: CORS is not set to * for credentialed requests; rate limiting is applied to authentication endpoints; error responses do not include stack traces.
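A sketch of the configuration to look for, using the cors and express-rate-limit packages; the origin, limits, and handler names are placeholders:

```js
const cors = require('cors');
const rateLimit = require('express-rate-limit');

// Credentialed CORS requires an explicit origin, never '*'.
app.use(cors({ origin: 'https://app.example.com', credentials: true }));

// Throttle the endpoints an attacker would brute-force.
const authLimiter = rateLimit({ windowMs: 15 * 60 * 1000, max: 10 });
app.post('/users/login', authLimiter, loginHandler);
app.post('/users/register', authLimiter, registerHandler);
```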
Mass assignment. Check: update handlers use an explicit allowlist of updatable fields rather than spreading the request body directly onto the model.
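The contrast between the two patterns, as a fragment from inside an async update handler (the field names match the spec used later in this exercise):

```js
// Vulnerable: whatever the client sends (role, passwordHash, email) lands on the document.
await User.findByIdAndUpdate(req.params.id, { ...req.body }, { new: true });

// Allowlisted: only the fields this endpoint is allowed to change are copied over.
const ALLOWED_FIELDS = ['displayName', 'bio', 'avatarUrl'];
const updates = {};
for (const field of ALLOWED_FIELDS) {
  if (req.body[field] !== undefined) updates[field] = req.body[field];
}
if (Object.keys(updates).length === 0) {
  return res.status(400).json({ error: 'No updatable fields provided' });
}
await User.findByIdAndUpdate(req.params.id, updates, { new: true, runValidators: true });
```

A prompt that covers all five checks: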
Perform a security audit of this Express.js user management service focusing on these five areas:
1. Injection: Are there any database queries built with user input outside of parameterized queries or ORM safe methods?
2. Authentication: Is JWT handling implemented correctly? Check secret storage, expiry, and signing algorithm.
3. Data exposure: Do any response objects include fields that should not be returned (password hash, internal IDs, tokens)?
4. Misconfiguration: Check CORS settings, rate limiting on auth endpoints, and error response format.
5. Mass assignment: Do any update endpoints spread the request body directly onto the database model?
For each issue found, rate severity as Critical, High, Medium, or Low, and provide a specific code fix.
[paste the relevant code sections]
Learning tip: Run the OWASP checklist before running any dynamic security scanner. The checklist catches the logical issues (wrong algorithm, missing ownership check) that automated scanners cannot find. Use automated scanning afterward to catch the surface issues the checklist might miss.
Stage 5: Test Quality Audit
An AI-generated test suite can give a false sense of security. High line coverage does not mean the tests are testing the right things. The most common test quality failures in AI-generated suites are: tests that only verify happy paths, tests that mock so aggressively that they test the mock rather than the code, tests that assert on implementation details rather than behavior, and tests that use the same AI-generated logic to verify the code — so if the implementation is wrong, the test expectation is wrong in the same way.
Evaluate the test suite on these dimensions:
Coverage breadth. Are unhappy paths tested? Are error conditions tested? Are boundary conditions tested? A test suite that only tests success scenarios is not a test suite — it is a confirmation that the code does not crash on the example the developer used during development.
Mock fidelity. When the test mocks a database or external service, does the mock behave like the real thing would? Mocks that always return success, never reject, and always return well-formed data will not catch the bugs that matter most in production.
Assertion quality. Do the tests assert on outcomes (what the caller receives and what state changes happened) or on implementation details (which internal functions were called, in what order)? Assertions on implementation details produce tests that fail whenever the code is refactored, even when the behavior is correct.
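A few hedged Jest fragments showing what each failure mode looks like; the model and helper names are placeholders, and the mocked calls assume the model methods are jest mocks:

```js
// Brittle: asserts on internals (which functions were called, and how), so it breaks on refactors.
expect(User.findOne).toHaveBeenCalledWith({ email: 'new@example.com' });
expect(sendWelcomeEmail).toHaveBeenCalledTimes(1);

// Behavioral: asserts on what the caller receives and what state actually changed.
expect(res.status).toBe(201);
expect(res.body).not.toHaveProperty('passwordHash');

// Mock fidelity: a useful database mock also simulates the failures that matter in production.
User.create.mockRejectedValueOnce(
  Object.assign(new Error('E11000 duplicate key error'), { code: 11000 })
);
```

A prompt for reviewing the inherited suite: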
Review this test file for a user registration endpoint:
[paste test file]
Evaluate the test quality on these dimensions:
1. Breadth: List the cases that are tested. What important cases are missing?
2. Mock fidelity: Do the database mocks simulate realistic failure scenarios (connection error, duplicate key, timeout)?
3. Assertion quality: Do the assertions verify observable outcomes or internal implementation details?
4. False confidence: Are there tests that would pass even if the implementation were broken in a meaningful way?
Then write three additional test cases that cover the most critical gaps you identified.
Learning tip: The best single question to ask about any test suite is: "What bug could exist in this code that all these tests would still pass through?" If the answer is "a lot of bugs," the suite is not earning its maintenance cost.
Stage 6: Remediation — Fix, Document, and Hand Off
After the audit, you have a prioritized list of issues. Remediation follows a specific order: critical security issues first, logic correctness issues second, missing functionality third, test gaps fourth, and code quality last. This order reflects the risk profile: security issues can cause immediate harm, logic errors cause data corruption, missing functionality breaks workflows, and poor tests and code quality are long-term maintenance costs.
For each critical fix, document what changed and why — not just in the commit message but in a brief inline comment or a note in the PR description. Future maintainers (human and AI) need to understand that the change was deliberate and what bug it was fixing.
For test remediation, add tests before fixing code. Write a failing test that demonstrates the bug, then fix the code so the test passes. This documents the bug, confirms the fix, and prevents regression.
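A sketch of that sequence for a hypothetical finding (duplicate registration surfacing as a 500), assuming Jest + Supertest and an exported Express app; paths and payloads are placeholders:

```js
const request = require('supertest');
const app = require('../app'); // assumed export; adjust to the real entry point

// Written before the fix: if the current implementation surfaces the duplicate-key
// error as a 500, this test fails until the fix lands, then guards against regression.
test('registering with an already-used email returns 409, not 500', async () => {
  const payload = { email: 'dup@example.com', password: 'S3curePass!', displayName: 'Dup' };

  await request(app).post('/users/register').send(payload).expect(201);

  const res = await request(app).post('/users/register').send(payload);
  expect(res.status).toBe(409);
});
```

A remediation prompt, using the mass assignment finding as the example: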
I need to fix a mass assignment vulnerability in this Express.js route handler. The current code:
[paste the vulnerable handler]
The issue is that the request body is spread directly onto the user model without field validation.
Write a fixed version that:
1. Uses an explicit allowlist of updatable fields: displayName, bio, avatarUrl
2. Strips any other fields from the request body before the database update
3. Returns a 400 error with a clear message if none of the allowed fields are present in the request
Also write a Jest test that verifies: (a) allowed fields are updated correctly, (b) disallowed fields (like email, role, passwordHash) are silently dropped and not persisted, (c) a request with only disallowed fields returns 400.
For the hand-off, produce a brief remediation summary. This is not a full audit report — it is a one-page document that covers: what the audit found, what was fixed, what was deferred and why, and what monitoring or follow-up is recommended.
Write a one-page remediation summary for a user management service audit. Structure it as:
1. Audit scope (what was reviewed)
2. Critical findings fixed before production (list with severity and resolution)
3. Findings deferred to next sprint (list with severity and rationale for deferral)
4. Remaining risk (what the team is accepting by deploying now)
5. Recommended follow-up actions (specific, actionable, with suggested owners)
Use this data from the audit:
[paste your audit findings]
Learning tip: The remediation summary is a forcing function for prioritization. Writing it requires you to make explicit decisions about what is critical and what is acceptable risk. If you cannot articulate why something is deferred rather than fixed, that is a signal it should be fixed.
Hands-On: Full Audit Walkthrough
Step 1: Generate the target codebase
Ask the AI to generate a deliberately imperfect user management service for you to audit:
Generate an Express.js user management REST API with these endpoints:
- POST /users/register (email, password, displayName)
- POST /users/login (email, password, returns JWT)
- GET /users/:id (return user profile)
- PATCH /users/:id (update displayName and bio)
- DELETE /users/:id (soft delete — set deletedAt timestamp)
Use Express, Mongoose (MongoDB), bcrypt for passwords, and jsonwebtoken for tokens.
Important: Include some realistic but subtle issues — things that might pass a quick review but would cause problems in production. Do not tell me what the issues are.
Step 2: Run the spec-vs-reality check
Based on this specification: [paste the prompt above as the spec]
Inventory the generated code:
- List all routes implemented
- List all middleware applied
- List all fields on the User model
- Note anything in the code that was not in the specification
Then identify gaps between what was specified and what was implemented.
Step 3: Run the hallucination scan
Review the package.json and all import statements in this codebase. For each imported package and each method call on that package, verify:
1. The package exists and is not deprecated
2. The method being called exists in the package's current API
3. The method is being called with the correct arguments
[paste package.json and code]
Step 4: Run the authorization check
For the PATCH /users/:id and DELETE /users/:id endpoints, trace the authorization flow. Answer:
1. Can an authenticated user modify or delete a different user's account?
2. At what point in the handler is the authorization check applied?
3. Is there an ownership verification step, or only an authentication step?
[paste the relevant handlers]
Step 5: Run the security checklist
Audit this Express.js user management service for these specific security issues:
- Is the JWT secret hardcoded or loaded from environment variables?
- Does the login endpoint have rate limiting?
- Does the register endpoint prevent duplicate email registration?
- Does the PATCH handler use mass assignment (spreading req.body directly)?
- Does any response object include the password hash?
- Are database errors returned raw to the client (stack trace exposure)?
For each issue found, provide a severity rating and the specific code change needed to fix it.
[paste the full codebase]
Step 6: Write the missing critical tests
This user management service has no tests for these critical behaviors:
1. A user cannot update another user's profile (authorization bypass)
2. The password is never returned in any API response
3. Registering with a duplicate email returns 409, not 500
Write Jest + Supertest tests for all three cases. Include setup (test database, test user creation) and teardown. The tests should fail against the current implementation if any of these issues exist.
Step 7: Fix the top three issues
For each critical issue identified in the audit:
Fix the [mass assignment / JWT secret exposure / missing rate limiting] issue in this handler:
[paste the vulnerable code]
Requirements for the fix:
- [state the specific requirement]
- Do not change the route signature or response format
- Add an inline comment explaining what the vulnerability was and what the fix does
Also update the relevant test to confirm the fix works.
Step 8: Write the remediation summary
Write a brief remediation summary (max one page) for the user management service audit we just completed. Include:
- What was audited and when
- Issues found and their severity
- What was fixed in this session
- What requires follow-up before production deployment
- Any monitoring recommendations (e.g., alert on 401 spike — possible brute force)
Key Takeaways
- A complete audit follows a specific sequence: spec gap analysis, hallucination scan, logic correctness, security review, test quality, and remediation — doing them out of order misses dependencies between findings.
- AI-generated security issues are predictable: mass assignment, hardcoded secrets, missing ownership checks, and raw error exposure appear frequently enough to justify a checklist rather than ad hoc review.
- Use AI tooling to help audit AI-generated code, but maintain a skeptical stance: ask the AI to identify problems, then verify its analysis against the code yourself rather than accepting it at face value.
- Fix in risk order: security critical issues before logic errors before missing features before test quality. Deploying with known security issues to ship a feature faster is a risk decision that should be explicit, not accidental.
- The remediation summary is as important as the fixes themselves — it creates shared understanding of what was found, what was deferred, and what risk remains, which is the foundation for responsible handoff.