AI can be your most consistent, tireless code reviewer — if you know how to direct it beyond "does this look okay?"
Why AI Code Review Changes the Game
Most engineers start using AI as a code generator and never realize it can be an equally powerful reviewer. The instinct makes sense: you have something to show, and you want feedback. But unstructured feedback requests — "review this code" — produce generic, surface-level responses. The engineers who get real value from AI review treat it like a specialist they can summon for each dimension of quality: correctness, security, performance, and architectural fit. Each deserves its own prompt, its own framing, and its own criteria.
The case for AI pre-review before human review is practical. Human reviewers are expensive. Their time is better spent on judgment calls: business logic, team conventions, org-specific tradeoffs. If AI catches the null dereference, the missing input validation, or the N+1 query before the review even starts, human reviewers can focus on what only they can see. AI review does not replace human review — it compresses the feedback loop and filters out the noise so that human review becomes higher-signal.
This section walks through how to structure AI review for each quality dimension, how to interpret what comes back, and where the limits are.
Learning tip: Treat AI as a specialist reviewer, not a generalist. One prompt per quality dimension produces better feedback than one prompt for everything.
Reviewing for Correctness
Correctness review asks: does the code actually do what it is supposed to do, under all realistic conditions? AI is well-suited here because correctness failures tend to be structural — off-by-one errors, wrong boolean logic, edge cases that aren't handled, assumptions about input that aren't validated.
When prompting for correctness, give the AI the specification alongside the code. Without the spec, the AI can only check internal consistency ("the code is self-consistent") rather than behavioral correctness ("the code matches what was asked for"). The more concrete the spec, the more precise the review.
Focus your prompt on: edge cases, boundary conditions, error paths, and assumptions. Ask the AI to enumerate assumptions the code makes about its inputs and callers. Many correctness bugs live in those gaps.
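As a concrete illustration, here is a small, hypothetical pagination helper (the name and signature are invented for this example). It reads cleanly, but a correctness review using the prompt below should surface every assumption noted in the comments:

```python
from typing import Sequence, TypeVar

T = TypeVar("T")

def paginate(items: Sequence[T], page: int, page_size: int) -> Sequence[T]:
    # Assumes page is 1-based and positive: page=0 returns an empty page,
    # and negative pages silently slice data from the end of the list.
    # Assumes page_size > 0: page_size=0 returns an empty page on every call.
    # A page past the end returns [] instead of signaling "no such page".
    start = (page - 1) * page_size
    return items[start:start + page_size]
```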
You are reviewing the following function for correctness. The function is supposed to [describe the intended behavior, including edge cases].
Here is the code:
[paste code here]
Please:
1. List every assumption this code makes about its inputs, callers, and environment.
2. Identify any edge cases or boundary conditions that are not handled.
3. Check whether error paths return or throw consistent types.
4. Flag any logic bugs — wrong conditionals, off-by-one errors, incorrect boolean operators.
5. Do not suggest style improvements. Focus only on behavioral correctness.
Expected output: a numbered list of specific issues, each with a line reference and an explanation of the failure mode. If the AI says "this looks correct," push back with: "List all assumptions this code makes and identify which ones are not validated."
Learning tip: If you only give AI the code and not the spec, it will review for internal consistency, not correctness. Always include the intended behavior in your correctness prompts.
Reviewing for Security
Security review is where structured AI prompting pays off most visibly. Security bugs follow well-documented patterns — injection, insecure deserialization, missing authentication checks, improper cryptography, path traversal — and AI has strong training signal on all of them. What it needs from you is context: what is this code doing, who calls it, what data flows through it.
The most common mistake in AI security review is asking too broadly. "Is this code secure?" produces a checklist that doesn't map to your code. Instead, tell the AI the attack surface: this is a public API endpoint, it accepts user input, it writes to a database. Now ask it to enumerate threats to that specific surface.
For backend code, always include context about: authentication state (is the caller authenticated?), data source (is this input user-controlled?), and output destination (does this go to a database, file system, HTML output?).
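For illustration, here is a deliberately vulnerable, hypothetical pair of functions of the kind this prompt is aimed at; the table names, directory path, and function names are invented:

```python
import sqlite3
from pathlib import Path

UPLOAD_DIR = Path("/var/app/uploads")

def get_invoice(conn: sqlite3.Connection, invoice_id: str) -> list:
    # SQL injection: user-controlled invoice_id is interpolated into the query.
    # Also an IDOR candidate: nothing checks that the caller owns this invoice.
    query = f"SELECT * FROM invoices WHERE id = '{invoice_id}'"
    return conn.execute(query).fetchall()

def read_attachment(filename: str) -> bytes:
    # Path traversal: a filename like "../../../etc/passwd" escapes UPLOAD_DIR
    # because the joined path is never resolved and checked against the base dir.
    return (UPLOAD_DIR / filename).read_bytes()
```

With the attack surface stated (public endpoint, unauthenticated caller, user-controlled `invoice_id` and `filename`), the prompt below maps each threat category onto specific lines rather than returning a generic checklist.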
You are a security engineer reviewing the following code for vulnerabilities. Here is the context:
- This is a [public API endpoint / internal service / CLI tool]
- The caller [is authenticated / is unauthenticated / may be either]
- User-controlled input enters at: [describe entry points]
- Data is written to: [database / file system / external API / rendered in HTML]
Here is the code:
[paste code here]
Review for the following threat categories:
- Injection (SQL, command, LDAP, etc.)
- Authentication and authorization bypass
- Insecure direct object reference (IDOR)
- Missing input validation or sanitization
- Sensitive data exposure (secrets in logs, responses, or errors)
- Path traversal or unsafe file access
- Cryptographic misuse
For each finding: state the vulnerability class, describe the attack scenario, identify the specific line(s), and recommend a fix.
Learning tip: Security review requires attack surface context. Tell the AI who calls the code, what data enters it, and where output goes. Without this, you get a generic checklist, not a real threat model.
Reviewing for Performance
Performance review is more nuanced because performance problems are context-dependent: what matters at 100 requests per second is different from what matters at 100,000. AI is useful here for spotting structural inefficiencies: algorithmic complexity issues, unnecessary work in hot paths, missing indexes, N+1 queries, and synchronous blocking in async contexts.
Frame performance review with the operational context. What is the expected call frequency? What are the data sizes? Is this in a hot path or a cold path? Without this, the AI will flag things that don't matter and miss things that do.
Ask the AI to reason about complexity explicitly. "What is the time and space complexity of this function?" followed by "are there inputs that would make this degrade?" often surfaces problems that a general review misses.
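As an example, the following hypothetical loader (schema and names invented) contains two of the most common findings a structured performance review surfaces: an N+1 query pattern and an unbounded result set.

```python
import sqlite3

def load_orders_with_items(conn: sqlite3.Connection, user_id: int) -> list[dict]:
    # Unbounded result set: no LIMIT or pagination on the orders query.
    orders = conn.execute(
        "SELECT id, total FROM orders WHERE user_id = ?", (user_id,)
    ).fetchall()
    results = []
    for order_id, total in orders:
        # N+1 pattern: one additional query per order. Invisible at 5 orders,
        # painful at 5,000, which is why call frequency and data size
        # belong in the prompt.
        items = conn.execute(
            "SELECT name, qty FROM order_items WHERE order_id = ?", (order_id,)
        ).fetchall()
        results.append({"id": order_id, "total": total, "items": items})
    return results
```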
You are reviewing the following code for performance issues. Context:
- This function is called approximately [N times per request / on every page load / once per day]
- Expected input size: [describe typical and worst-case input sizes]
- Runtime environment: [Node.js / Python / JVM / etc.]
- Database: [PostgreSQL / MySQL / MongoDB / etc., if relevant]
Here is the code:
[paste code here]
Please analyze:
1. Time and space complexity — identify the Big-O for the primary operations.
2. Database query patterns — are there N+1 queries? Missing pagination? Unbounded result sets?
3. Unnecessary work — recomputed values, redundant lookups, sync operations blocking an async context.
4. Memory allocation patterns — objects created in loops, large copies, retained references.
5. For each issue found: describe the failure mode at scale, estimate the impact, and suggest a concrete fix.
Learning tip: Performance review without call-frequency and data-size context produces irrelevant feedback. Always tell the AI the operational envelope before asking it to evaluate performance.
Reviewing for Architectural Fit
Architectural fit is the hardest dimension for AI to assess — and the one where human context is most irreplaceable. But AI can still help by checking for structural signals: layering violations (business logic in a repository layer), tight coupling, missing abstractions, inconsistent patterns within the file, and violations of stated architectural constraints you provide.
The key is to give the AI your architectural rules explicitly. Don't assume it knows your team's conventions. If your project follows hexagonal architecture, say so. If services should never call each other directly, say so. If the repository layer must return domain objects, not raw database rows, say so. The more explicit you are about the rules, the more useful the architectural review becomes.
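To make this concrete, here is a small, hypothetical handler that violates the kind of layering rules listed in the template below; the schema and coupon logic are invented for illustration:

```python
import sqlite3

def create_order_handler(request: dict, conn: sqlite3.Connection) -> dict:
    # Violation: business logic (discount policy) sits in the HTTP handler
    # instead of a service class.
    total = sum(item["price"] * item["qty"] for item in request["items"])
    if request.get("coupon") == "VIP10":
        total *= 0.9

    # Violation: the handler talks to the database directly,
    # bypassing the repository layer.
    conn.execute(
        "INSERT INTO orders (user_id, total) VALUES (?, ?)",
        (request["user_id"], total),
    )
    conn.commit()
    return {"status": "created", "total": total}
```

Nothing in this code is wrong in isolation; it only becomes a finding once the rules are stated. That is exactly why the prompt has to carry them.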
You are reviewing the following code for architectural fit. Here are the architectural rules for this codebase:
- [List your architectural rules explicitly, e.g.:]
- Controllers handle only HTTP concerns (parsing requests, returning responses). No business logic.
- Business logic lives in service classes only.
- The repository layer is the only place that queries the database. Services never call the database directly.
- External API calls are isolated in adapter classes in the /adapters directory.
- Domain objects are plain classes with no framework dependencies.
Here is the code under review:
[paste code here]
Please identify:
1. Any violations of the stated architectural rules.
2. Responsibilities that are in the wrong layer.
3. Tight coupling between components that should be isolated.
4. Missing abstractions — logic that is repeated and should be centralized.
5. Inconsistencies with the patterns described above.
For each issue: name the rule being violated, identify the specific location, and suggest the refactoring.
Learning tip: AI cannot guess your architecture. Spell out your architectural rules in the prompt. A three-sentence description of your layering model produces dramatically better architectural review than leaving the AI to infer structure from the code alone.
What AI Reviewers Miss — and Why Human Review Still Matters
AI review is systematic but context-blind. There are specific categories of issues where AI consistently falls short, and every engineer using AI review should know them:
Business logic correctness. AI can check that your code is internally consistent and syntactically valid. It cannot know whether the discount logic is correct for your pricing model, or whether the access control rule matches the product spec. Business correctness requires a human who knows the domain.
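A small, invented example makes the point: both of the following totals functions are internally consistent and would pass a structural review, but only someone who knows the pricing model (and the applicable tax rules) can say which one is correct.

```python
TAX_RATE = 0.08  # illustrative rate only

def total_discount_before_tax(subtotal: float, discount: float) -> float:
    # Discount reduces the taxable amount.
    return (subtotal - discount) * (1 + TAX_RATE)

def total_discount_after_tax(subtotal: float, discount: float) -> float:
    # Discount comes off the final, tax-inclusive total.
    return subtotal * (1 + TAX_RATE) - discount
```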
Org-specific conventions. Every engineering team accumulates conventions that live nowhere in writing — how errors are logged, how feature flags are named, which libraries are banned, what "done" means in a PR. AI doesn't know these. Human reviewers catch them.
Subtle security issues in business logic. Detecting OWASP-class vulnerabilities is well within AI's capabilities. But authorization bugs that depend on your permission model, such as "can a standard user reach this endpoint through this sequence of actions?", require human judgment about your system's semantics.
Change in behavior vs. change in structure. AI can review a diff for quality. It cannot easily reason about whether a behavioral change is intentional. A human reviewer with context on the ticket knows whether removing that guard clause was deliberate or accidental.
The correct workflow is AI pre-review followed by human review. AI catches the structural issues so human reviewers can focus on the judgment calls.
Learning tip: Build a mental checklist of what AI misses: business logic correctness, org conventions, permission model semantics, and intentionality of behavioral changes. Human review covers these; AI review covers everything else.
Hands-On: Full Four-Dimension Code Review
This exercise walks through reviewing a single code file across all four dimensions using structured prompts.
Step 1: Choose a real file from a current project. Pick something that has a mix of concerns — ideally a service method or API handler, 50–200 lines. Don't use a toy example; the exercise has more value with real code.
Step 2: Run the correctness review prompt. Use the correctness prompt from the section above. Paste the function's intended behavior from your ticket or spec comment.
Review the following function for correctness only. The intended behavior is: [describe from spec].
List all assumptions the code makes about inputs and callers. Identify unhandled edge cases and error path inconsistencies.
[paste code]
Step 3: Run the security review prompt. Add the attack surface context — who calls this, what data enters, where output goes.
Step 4: Run the performance review prompt. Include call frequency and expected data size.
Step 5: Run the architectural review prompt. Write down your top three architectural rules before prompting. This forces you to articulate what you actually believe about the codebase's structure.
Step 6: Consolidate the findings. Collect all AI-generated findings into a single list. Triage them: critical, should-fix, nice-to-have.
Step 7: Add your own human review layer. Go through the consolidated list and add anything the AI missed: business logic correctness, convention violations, intentionality of behavioral changes.
Step 8: Compare. How many total issues did you find? How many came from AI review vs. your human review? In most cases, AI will surface 60–80% of the structural issues, and human review adds the semantic and contextual ones.
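If you want a lightweight structure for Steps 6 through 8, a small findings record like this sketch (field names are only a suggestion) keeps triage and the AI-versus-human comparison straightforward:

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"critical": 0, "should-fix": 1, "nice-to-have": 2}

@dataclass
class Finding:
    dimension: str  # correctness | security | performance | architecture
    severity: str   # critical | should-fix | nice-to-have
    location: str   # e.g. "orders_service.py:42"
    summary: str
    source: str     # "ai" or "human", used for the Step 8 comparison

def triage(findings: list[Finding]) -> list[Finding]:
    # Critical items first; ties keep their original order.
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
```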
Learning tip: Running all four dimensions on the same file takes 10–15 minutes but produces a review depth that would take a human 45+ minutes to match on the structural dimensions alone. Use this as your standard pre-PR process.
Key Takeaways
- Use separate, structured prompts for each review dimension — correctness, security, performance, and architectural fit. One general "review this" prompt produces generic output.
- Security and performance reviews require operational context (attack surface, call frequency, data size) to produce useful findings. Always include this context in your prompts.
- Architectural review requires you to state your rules explicitly. AI cannot infer your team's conventions from the code alone.
- AI review consistently misses: business logic correctness, org-specific conventions, permission model semantics, and the intentionality of behavioral changes. Human review covers these gaps.
- The optimal workflow is AI pre-review across all four dimensions, followed by a focused human review on business context and org-specific judgment calls.