Code hallucinations are more dangerous than text hallucinations because they can silently compile, pass linting, and ship to production before anyone notices something is wrong.
What Is a Code Hallucination?
A hallucination in the context of AI-generated code is not a typo or a syntax error. It is confident, plausible-looking code that references things that do not exist, misrepresents how real things work, or produces logically incorrect behavior in ways that are hard to spot at a glance.
Text hallucinations are annoying. A wrong fact in a blog post is embarrassing. But code hallucinations carry a different risk profile. When a language model invents a function that does not exist in a library, the code often still parses correctly. It may even pass a linter. If you are not running tests or reading the API documentation yourself, the fabrication can travel all the way to a production deploy.
The core problem is that AI models learned code from the internet — documentation, GitHub, Stack Overflow, tutorials — and they pattern-match from that corpus. They do not query a live package registry or execute the code in their heads. When they are uncertain, they do not say "I don't know this API." They generate the most statistically plausible token sequence, which often looks exactly like correct code.
Understanding the taxonomy of hallucinations is the first step toward reliably catching them before they cause harm.
Learning tip: Next time you accept a block of AI-generated code, spend 60 seconds asking: "Did I verify that every method call, import, and configuration key actually exists?" That one habit catches the majority of hallucinations before they land in your codebase.
A Taxonomy of Code Hallucinations
Hallucinations are not a single phenomenon. They fall into recognizable categories, and knowing the category tells you where to look when reviewing.
Non-existent APIs and methods. The model invents a method that sounds like it should exist. For example, a call to fs.readFileAsync() in Node.js (the real method is fs.promises.readFile()), or stripe.charges.createRefund() when the actual Stripe SDK method is stripe.refunds.create(). These are easy to generate and hard to spot by reading the code, because they follow the naming conventions of real methods in the same library.
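To make that concrete, here is a minimal sketch contrasting the hallucinated Node.js call with the real fs/promises API; the loadConfig wrapper and the file path are illustrative, not from any particular codebase:

```js
const fs = require('fs');
const { readFile } = require('fs/promises');

async function loadConfig(path) {
  // Hallucinated: fs.readFileAsync does not exist in core Node.js. The line
  // parses and passes most linters, but at runtime it throws
  // "TypeError: fs.readFileAsync is not a function".
  // return JSON.parse(await fs.readFileAsync(path, 'utf8'));

  // Real API: the promise-based read lives on fs.promises / 'fs/promises'.
  return JSON.parse(await readFile(path, 'utf8'));
}
```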
Wrong function signatures. The method exists, but the model passes arguments in the wrong order, omits required parameters, or adds parameters that do not exist. A common example is calling Array.prototype.splice() with the arguments reversed, or passing a callback where the modern API expects a promise. The call site looks plausible but behaves incorrectly or throws at runtime.
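A small sketch of why this slips past every static check: both splice arguments are plain numbers, so reversing them raises no error at all (the array contents are illustrative):

```js
const items = ['a', 'b', 'c', 'd', 'e'];

// Correct signature: splice(start, deleteCount). This removes one element at index 2.
[...items].splice(2, 1); // removes 'c'

// Same two numbers in the hallucinated order: no error is thrown, but this
// removes two elements starting at index 1 ('b' and 'c') instead of one at index 2.
[...items].splice(1, 2);
```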
Invented library versions and configuration keys. Models often hallucinate features from a version of a library that does not match what you have installed — or features that never existed in any version. Configuration options are especially prone to this: the model may generate a webpack config with an option that was proposed in an RFC but never shipped, or a Jest config key that was removed in a major version two years ago.
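Here is a sketch of the pattern using Jest. The current keys below (roots, testEnvironment) are documented options, and the commented-out key is the older name Jest replaced, but always verify generated config against the docs for the version you actually have installed:

```js
// jest.config.js
module.exports = {
  // Hallucination-prone: `testPathDirs` is an old option name that Jest later
  // replaced with `roots`. Generated configs sometimes still use it, and the
  // result is an ignored or rejected key depending on your Jest version.
  // testPathDirs: ['<rootDir>/src'],

  // Current equivalents, checked against the docs for the installed version:
  roots: ['<rootDir>/src'],
  testEnvironment: 'node',
};
```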
Plausible-but-wrong logic. This is the hardest category to catch. The code runs, produces output, and appears correct for the happy path. But the logic is subtly wrong in edge cases — off-by-one errors in date calculations, incorrect handling of empty inputs, wrong precedence in boolean expressions, or security checks that are bypassed by a specific input. The model learned common patterns, but it did not reason through your specific requirements.
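Here is a small, made-up illustration of the shape of this failure (averageScore is not from any library): it looks finished and works on typical input, but was never reasoned through for the empty case.

```js
// Plausible-but-wrong: correct for typical input, silently wrong on the edge case.
function averageScore(scores) {
  let total = 0;
  for (const s of scores) total += s;
  return total / scores.length; // 0 / 0 for an empty array: returns NaN
}

// A correct version makes the empty case an explicit decision.
function averageScoreSafe(scores) {
  if (scores.length === 0) return 0; // or throw, depending on the actual requirement
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}

averageScore([80, 90, 100]); // 90, looks fine on the happy path
averageScore([]);            // NaN, and nothing ever flagged it
```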
Learning tip: When reviewing AI-generated code, mentally sort what you see into these four categories. If you cannot confidently say "I verified this method exists with this signature," treat it as an unconfirmed claim until you check the source.
Why Code Hallucinations Are Uniquely Dangerous
The reason code hallucinations deserve dedicated attention is the gap between "looks right" and "is right." In most professional writing, a false claim is recognizable — it conflicts with something the reader knows, or it cites a source that can be checked. In code, the gap is much harder to close.
Modern languages and frameworks are large. No engineer has memorized every method in the AWS SDK, the full Stripe API surface, or every React hook's exact signature. We rely on IDE autocomplete and documentation. But when reviewing AI-generated code under time pressure, it is easy to assume the model got the API right and focus your attention on the logic instead.
The build pipeline compounds the risk. TypeScript will catch some non-existent methods if the types are correctly declared. ESLint will catch some patterns. But many hallucinations pass all static analysis tools. An invented npm package name will not cause a build error until npm install fails or, worse, until it resolves to a malicious package with a similar name. A wrong method argument order will not cause a type error if both arguments are the same type.
The practical consequence is that your code review process needs to expand to include verification steps that go beyond reading the code.
Learning tip: Think of AI-generated code the way you would think of an unsigned library from an unknown source. You would not deploy it without inspection. Apply that same skepticism to AI output, especially around external API calls and third-party library usage.
Detection Techniques That Actually Work
Catching hallucinations requires a layered approach. No single technique catches everything, but combining several raises your confidence significantly.
Run the code and read error messages carefully. This sounds obvious, but many engineers review AI-generated code without running it. The fastest way to catch a non-existent method is to call it. Error messages from Node, Python, or the browser console are specific: "TypeError: fs.readFileAsync is not a function" tells you exactly where the hallucination is.
Use TypeScript and strict type-checking as a first filter. TypeScript catches a large share of non-existent properties, wrong argument counts, and wrong return type assumptions. If you are working in a typed language or can add types temporarily, do so before accepting AI-generated code. Not all hallucinations are type errors, but many are.
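Even in a plain JavaScript codebase you can apply this filter temporarily: a // @ts-check pragma makes editors like VS Code (or tsc with checkJs enabled) type-check the file. A minimal sketch, assuming Node's type definitions (@types/node) are available; the exact error wording varies by TypeScript version, and the hallucinated call is deliberately left in so the checker has something to flag:

```js
// @ts-check
const fs = require('fs');

async function readConfig(path) {
  // With checking enabled, TypeScript reports something like:
  //   Property 'readFileAsync' does not exist on type 'typeof import("fs")'
  // before the code is ever executed.
  return fs.readFileAsync(path, 'utf8');
}
```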
Verify method and import names against authoritative sources. If the code imports a package or calls a specific method, open the package's actual documentation or source code and confirm that the method exists with that signature. This takes 30–60 seconds per method and is worth doing for every external API call in AI-generated code.
Check package names against the registry before installing. Before running npm install some-package or pip install some-library, verify the exact package name on npmjs.com or PyPI. AI models occasionally invent package names or use outdated names. This is also how typosquatting attacks work — so the habit protects you against both hallucinations and supply chain attacks.
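If you want to script the lookup, a minimal sketch follows; it assumes Node 18+ (for the built-in fetch) and the default public npm registry, and takes the suspect package name as a command-line argument:

```js
// check-package.js
// Usage: node check-package.js axios-retry-interceptor
const name = process.argv[2];

async function main() {
  const res = await fetch(`https://registry.npmjs.org/${encodeURIComponent(name)}`);
  if (res.status === 404) {
    console.log(`"${name}" is not on the npm registry. Likely a hallucinated or mistyped name.`);
  } else if (res.ok) {
    const pkg = await res.json();
    console.log(`"${name}" exists. Latest published version: ${pkg['dist-tags']?.latest}`);
  } else {
    console.log(`Registry returned HTTP ${res.status}; verify the name manually.`);
  }
}

main();
```

Existence is not an endorsement: a name that resolves may still be a typosquat, so also glance at the package's repository link and publish history before installing.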
Write or ask for tests before reading the implementation. If you write a test that describes the expected behavior, then run the AI-generated implementation against it, you will catch plausible-but-wrong logic that looks correct in isolation but fails real cases. Test-driven verification is one of the most reliable hallucination-detection techniques.
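A sketch of what this looks like with Node's built-in test runner (node:test): the filterActiveUsers name and the ./users path are assumptions, so point them at whatever the generated code actually exports.

```js
// users.test.js
const { test } = require('node:test');
const assert = require('node:assert/strict');
const { filterActiveUsers } = require('./users'); // hypothetical export from the AI-generated module

test('keeps only users whose isActive flag is true', () => {
  const users = [
    { name: 'Ada', isActive: true },
    { name: 'Bob', isActive: false },
  ];
  assert.deepEqual(filterActiveUsers(users), [{ name: 'Ada', isActive: true }]);
});

test('returns an empty array for empty input', () => {
  assert.deepEqual(filterActiveUsers([]), []);
});
```

Run it with node --test; if the implementation fails a case you wrote down before reading the code, you have found the gap between "looks right" and "is right".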
Learning tip: Build a personal "hallucination checklist" that you run on every block of AI code before merging it. Start with: (1) Does every imported package exist under this exact name? (2) Does every method call match the actual API signature? (3) Did I run the code at least once? Three checks, three minutes, most hallucinations caught.
Specific Hallucination Patterns to Watch For
Beyond the general taxonomy, a few specific patterns recur frequently in AI-generated code.
Fabricated npm/PyPI package names. The model generates an import for react-use-form-validation or axios-retry-interceptor — plausible-sounding names that do not exist. Always verify the package name before installing.
Wrong SDK method paths. AWS SDK v3 has a completely different import and call structure from v2, and models frequently mix them. Similarly, the Firebase v9 SDK has a functional API that looks nothing like v8's class-based API. When working with major version changes in SDKs, expect the model to get confused.
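For orientation, here is what the two S3 styles look like side by side; the bucket, key, and region values are illustrative:

```js
// AWS SDK v2 (package "aws-sdk"): one monolithic client, promise() chaining.
const AWS = require('aws-sdk');
const s3v2 = new AWS.S3();
// await s3v2.getObject({ Bucket: 'my-bucket', Key: 'report.csv' }).promise();

// AWS SDK v3 (package "@aws-sdk/client-s3"): modular client plus command objects.
const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');
const s3v3 = new S3Client({ region: 'us-east-1' });
// await s3v3.send(new GetObjectCommand({ Bucket: 'my-bucket', Key: 'report.csv' }));

// Mixing the two, e.g. s3v3.getObject({ ... }), reads naturally but fails at
// runtime because S3Client has no per-operation methods; calls go through send().
```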
Invented configuration options. Webpack, Babel, ESLint, and Jest configs are hallucination hotspots. Models generate config keys that sound plausible but either never existed or were removed. Always cross-reference generated config against the current version's documentation.
Date and time math errors. Off-by-one errors in date arithmetic are extremely common. The model may get the general approach right but get the boundary conditions wrong — treating months as 1-indexed vs. 0-indexed in JavaScript's Date object, for example.
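A few lines are enough to show the pitfall:

```js
// Months in the Date constructor are 0-indexed: 0 is January, 11 is December.
const wrong = new Date(2024, 3, 1);  // April 1, 2024, not March 1
const right = new Date(2024, 2, 1);  // March 1, 2024

// The same indexing hides in "last day of the month" math, which uses day 0
// of the following month and is an easy place for an off-by-one:
const lastDayOfMarch = new Date(2024, 3, 0); // March 31, 2024
```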
Security check bypasses. Authentication and authorization logic generated by AI is particularly worth scrutinizing. The model may generate a check that looks correct but fails for a specific input — for example, an if (user.role === 'admin' || user.isAdmin) where one path can be spoofed.
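A minimal sketch of that shape of bug; the user object and db helper are hypothetical, and the point is only where each field's value comes from:

```js
// Looks reasonable, but if isAdmin was ever merged in from request data
// (e.g. Object.assign(user, req.body) somewhere upstream), any client can set it.
function canDeleteProject(user) {
  return user.role === 'admin' || user.isAdmin; // second clause is client-spoofable
}

// Safer: authorize only from data your own server loaded and controls.
async function canDeleteProjectSafe(userId, db) {
  const user = await db.users.findById(userId); // hypothetical data-access helper
  return user.role === 'admin';
}
```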
Learning tip: Keep a personal log of hallucinations you find. After a few weeks, you will notice patterns — the model makes similar mistakes in the libraries you use most. That pattern awareness makes you faster at catching the next one.
Prompt Techniques to Reduce Hallucination Frequency
While you cannot eliminate hallucinations by prompting alone, you can significantly reduce their frequency.
Providing the model with the exact API documentation or type signatures you want it to use is the single most effective technique. When the model has authoritative context in front of it, it is much less likely to confabulate.
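For example, a prompt for the file-reading task in the exercise below could quote the documented signature directly (the signature line paraphrases the Node.js docs for fs.promises.readFile; paste the version from the docs you actually use):

```
Write a function that loads and parses a JSON file.
Use only this API, exactly as documented:

  fsPromises.readFile(path[, options]) -> Promise
  (resolves to a string when options includes an encoding such as 'utf8')

Do not use any other fs methods.
```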
Asking the model to cite which methods it used and where to find them in the documentation creates a self-checking loop. The model may still hallucinate, but it will do so less confidently, and the citation gives you a specific place to verify.
Breaking large tasks into smaller, more specific prompts reduces the opportunity for hallucination. The larger and vaguer the request, the more the model has to fill in from pattern matching rather than from the context you provided.
Learning tip: Always paste the relevant documentation snippet or type definition into your prompt when asking the AI to use a specific library or API. This one habit eliminates a large share of the "invented method" category of hallucinations.
Hands-On: Catching Hallucinations in a Real Code Block
This exercise gives you a structured workflow for reviewing AI-generated code for hallucinations before merging.
Step 1: Generate a realistic block of AI code.
Use this prompt to generate code you will then audit:
Write a Node.js function that:
1. Reads a JSON file from disk asynchronously
2. Validates that it has a "users" array
3. Filters users where isActive is true
4. Returns the filtered array
Use the fs/promises module and async/await. Handle file-not-found errors with a specific error message.
Step 2: List every import and external method call.
Before running anything, read the generated code and write down:
- Every import or require statement
- Every method call into a built-in or external API (e.g., fs.readFile, JSON.parse)
- Every configuration key or option object passed to a library function
Step 3: Verify each item against authoritative documentation.
For each item in your list, open the official documentation and confirm:
- The package exists under that exact name
- The method exists on that object
- The arguments match the documented signature
- You are looking at docs for the version you actually have installed
Step 4: Run the code with a valid input.
Create a small test JSON file with the expected structure and run the function. Note whether it returns the correct result.
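A minimal sketch of that fixture-and-run step is below; the loadActiveUsers name and the ./generated path are assumptions, so substitute the actual export from your generated code:

```js
// run-happy-path.js
const { writeFile } = require('fs/promises');
const { loadActiveUsers } = require('./generated'); // hypothetical name and path

async function main() {
  await writeFile('users.json', JSON.stringify({
    users: [
      { name: 'Ada', isActive: true },
      { name: 'Bob', isActive: false },
    ],
  }, null, 2));

  const active = await loadActiveUsers('./users.json');
  console.log(active); // expect only the isActive: true entry
}

main().catch((err) => console.error(err.message));
```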
Step 5: Test edge cases explicitly.
Run the function with:
- A file that does not exist
- A JSON file that has no users key
- A users array where all users have isActive: false
- A users array that is empty
Note whether each case behaves as described in the prompt.
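If you prefer to script it, the sketch below runs all four cases in one pass; as in Step 4, the function name and module path are assumptions to replace with your own:

```js
// edge-cases.js
const { writeFile } = require('fs/promises');
const { loadActiveUsers } = require('./generated'); // hypothetical name and path

async function tryCase(label, path) {
  try {
    console.log(label, '->', await loadActiveUsers(path));
  } catch (err) {
    console.log(label, '-> threw:', err.message);
  }
}

async function main() {
  await writeFile('no-users.json', JSON.stringify({ accounts: [] }));
  await writeFile('all-inactive.json', JSON.stringify({ users: [{ name: 'Bob', isActive: false }] }));
  await writeFile('empty-users.json', JSON.stringify({ users: [] }));

  await tryCase('missing file  ', './does-not-exist.json');
  await tryCase('no users key  ', './no-users.json');
  await tryCase('all inactive  ', './all-inactive.json');
  await tryCase('empty array   ', './empty-users.json');
}

main();
```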
Step 6: Ask the AI to find its own hallucinations.
Use a prompt along these lines:
Review the following code for potential hallucinations — places where you may have used a method, argument order, or API that does not match Node.js fs/promises documentation exactly. List any items you are not fully confident about.
[paste the generated code here]
Step 7: Compare the AI's self-review to your manual audit.
Note which issues the AI identified and which ones it missed. This comparison is instructive: it shows you the reliability ceiling of AI self-review and calibrates how much independent verification you still need to do.
Step 8: Correct and re-test.
Fix any confirmed hallucinations, re-run all edge case tests, and confirm the code behaves correctly for all inputs you tested.
Key Takeaways
- Code hallucinations fall into four main categories: non-existent APIs, wrong function signatures, invented configuration options, and plausible-but-wrong logic. Each requires a different detection approach.
- The danger of code hallucinations is that they pass static analysis and even code review when reviewers are moving fast. Detection requires deliberate verification steps, not just reading.
- Layered detection (type-checking, running the code, verifying against documentation, writing tests) is more reliable than any single technique.
- Prompt hygiene — providing documentation context, asking for explicit citations, breaking tasks into smaller pieces — reduces hallucination frequency but does not eliminate it.
- A personal hallucination log helps you build pattern recognition for the specific libraries and frameworks you work with most often.