
Working With Legacy Codebases

The ability to confidently change code you did not write and do not yet understand is one of the highest-leverage skills an experienced engineer can have, and AI makes it meaningfully more accessible.

Using AI to Understand Unfamiliar Codebases

The first challenge in any legacy codebase is not technical — it is cognitive. Before you can change anything safely, you need a mental model of what the code does, why it does it, and what it silently depends on. That comprehension process, which used to take days of reading and reverse-engineering, can be dramatically accelerated with well-scoped AI prompts.

The most effective comprehension technique is the "explain this function" prompt, used at progressively wider scope. Start at the function level, then the file level, then the module level. At each level you are not trying to memorize the code — you are building a navigational map: which parts matter, which parts are risky to touch, and which parts are vestigial.
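
The prompt itself can stay simple. A function-level pass might look like this (reconcileInvoices is a placeholder for whatever function you are studying):

Explain what this function does, step by step:

[paste reconcileInvoices and any helpers it calls directly]

In your explanation, call out:
- Every input it reads beyond its parameters (globals, config, environment variables)
- Every side effect (database writes, logs, network calls)
- Anything that looks surprising or undocumented

At the file and module levels, repeat the same prompt but ask for a summary of responsibilities rather than a line-by-line walkthrough.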

Beyond function explanation, you can prompt the agent to generate a call graph in plain text or pseudocode. This is particularly useful for identifying hidden coupling — functions that are called from fifteen different places and therefore carry unspoken contracts with every caller. You can also prompt the agent to identify code smells: long parameter lists, functions with multiple levels of nesting, classes with more than one clear responsibility. These smell reports are not the end of the work, but they are an excellent triage tool for deciding where to invest refactoring effort first.

One important caveat: AI explanation is probabilistic. The agent will sometimes be confidently wrong about what a function does, especially when the code has subtle side effects, relies on global state, or uses unusual patterns. Always validate AI-generated explanations against the tests (if they exist) and against your own reading of critical paths.

Learning tip: When the AI explanation of a function does not match what you expect from reading the code, that discrepancy is a signal — either the code is doing something non-obvious that is worth documenting, or the AI has missed a side effect. Either outcome is useful.

Safe Refactoring Patterns with AI

The cardinal rule of legacy refactoring is: you must be able to tell whether you broke something. That means before you change a single line of production code, you need a safety net. In legacy codebases, that safety net rarely exists. The AI-assisted approach to this problem is to generate characterization tests first.

A characterization test does not test what the code should do. It tests what the code currently does, including any bugs or quirks. Its purpose is to detect regressions, not to enforce correctness. Once you have characterization tests covering a piece of code, you can refactor that code and trust that any behavioral change will be caught by the test suite.

The prompt pattern for this is:

Here is a function from our legacy codebase:

[paste the function and any directly called helper functions]

Write characterization tests for this function using Jest.

Requirements:
- Do not infer what the function "should" do — only test what it currently does
- Cover every branch and edge case you can identify from reading the code
- For any input where the behavior is unclear or seems like a bug, add a comment: "// NOTE: This behavior may be a bug — verify before removing"
- Use describe/it blocks
- Mock any external dependencies (database calls, HTTP calls, file I/O) at the boundary
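
The output of such a prompt looks roughly like the sketch below. The function under test, applyLegacyDiscount, its module paths, and its quirk with negative totals are all hypothetical; your legacy code supplies the real cases.

import { applyLegacyDiscount } from '../src/billing/applyLegacyDiscount';

// Mock the external boundary, as the prompt requires. The path is illustrative.
jest.mock('../src/db/client');

describe('applyLegacyDiscount (characterization)', () => {
  it('applies a 10% discount to totals of 100 or more', () => {
    expect(applyLegacyDiscount({ total: 200 })).toBe(180);
  });

  it('returns the total unchanged below the threshold', () => {
    expect(applyLegacyDiscount({ total: 99 })).toBe(99);
  });

  it('returns 0 for a negative total', () => {
    // NOTE: This behavior may be a bug — verify before removing
    expect(applyLegacyDiscount({ total: -50 })).toBe(0);
  });
});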

Once characterization tests are green, you can safely ask the agent to refactor. The key instruction is to preserve behavior, not improve it. Improvements come in a second pass after the refactor is verified.

Learning tip: Run your characterization tests before and after any AI-generated refactor. If the test suite was green before and red after, the refactor introduced a regression regardless of how clean the new code looks. Clean code with a bug is worse than messy code without one.

Instructing Agents to Preserve Existing Behavior

The most dangerous instruction you can give an agent when working with legacy code is "clean this up" or "modernize this" without constraints. Agents default to opinionated improvement: they will rename variables to be more descriptive, collapse conditionals, replace callbacks with async/await, and generally produce code that looks better but may behave differently in edge cases.
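
A contrived illustration of the problem, with invented names and values:

// Legacy version: treats any falsy timeout, including 0, as "use the default".
function getTimeoutLegacy(config: { timeout?: number }): number {
  if (!config.timeout) {
    return 30;
  }
  return config.timeout;
}

// A typical "modernized" rewrite an agent might produce. It looks equivalent,
// but ?? only falls back on null or undefined, so a configured timeout of 0
// now returns 0 instead of 30: a silent behavior change in an edge case.
function getTimeoutModern(config: { timeout?: number }): number {
  return config.timeout ?? 30;
}

Whether the old handling of 0 was intentional is exactly the question the agent cannot answer for you, which is why explicit constraints matter.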

The safer instruction pattern is a behavior-preserving refactor prompt with explicit constraints:

Refactor the following function to improve readability.

STRICT CONSTRAINTS — do not violate these:
1. The function signature must not change (same parameters, same return type)
2. The observable behavior must be identical for all inputs, including error cases
3. Do not change error messages or error types thrown
4. Do not change the order of side effects (the function both logs and writes to DB — maintain that order)
5. Do not add new abstractions unless absolutely necessary for readability
6. After refactoring, list every behavior-preserving decision you made and any places where you were uncertain

Here is the function:

[paste function]

Here are the characterization tests it must still pass:

[paste relevant tests]

The final instruction — asking the agent to list its decisions and uncertainties — is not decorative. It gives you a focused review checklist. You do not need to read every line of the refactored output looking for problems; you read the agent's own uncertainty report and verify those specific points.

Learning tip: "Preserve behavior" and "improve behavior" are separate commits. Do not attempt both in the same step. Separating them makes it possible to identify which change introduced a regression during code review or bisect.

The Strangler Fig Pattern with AI Assistance

The strangler fig pattern is a proven approach for modernizing legacy systems incrementally: build the new implementation alongside the old one, gradually route traffic to the new path, and delete the old code only when the new path is fully verified. AI assistance changes the economics of this pattern significantly — what used to require weeks of new-module scaffolding can be done in hours.

The practical workflow is:

  1. Use AI to generate a complete characterization test suite for the legacy component you are replacing.
  2. Use AI to generate the new implementation, grounded in the same type contracts.
  3. Run the characterization tests against the new implementation. Treat any failures as specification questions: is the old behavior a feature or a bug?
  4. Introduce a feature flag or adapter layer to route some requests to the new implementation.
  5. Run both implementations in parallel, logging discrepancies (the adapter sketch after this list covers this step and the previous one).
  6. Delete the old implementation when the new one has been running without discrepancy for a defined period.
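
A minimal sketch of an adapter covering steps 4 and 5, assuming both implementations share the same signature. QuoteFn, useNewPath, and logDiscrepancy are placeholders for your own types, feature flag client, and logger:

type QuoteFn = (orderId: string) => Promise<number>;

export function makeStranglerAdapter(
  legacy: QuoteFn,
  replacement: QuoteFn,
  useNewPath: () => boolean,
  logDiscrepancy: (message: string) => void
): QuoteFn {
  return async (orderId) => {
    const legacyResult = await legacy(orderId);
    try {
      // Run the new path alongside the old one and record any divergence.
      const newResult = await replacement(orderId);
      if (newResult !== legacyResult) {
        logDiscrepancy(`mismatch for ${orderId}: legacy=${legacyResult}, new=${newResult}`);
      }
      // Serve the new result only when the flag is on; otherwise keep the old path live.
      return useNewPath() ? newResult : legacyResult;
    } catch (error) {
      logDiscrepancy(`new implementation threw for ${orderId}: ${String(error)}`);
      return legacyResult;
    }
  };
}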

At each step, the AI is assisting with generation and analysis, but you are making the behavioral decisions. The agent cannot tell you whether a quirky output from a legacy function is intentional — that is institutional knowledge only humans (or old tickets) carry.

Learning tip: When using the strangler fig pattern with AI, always generate the characterization tests before touching the legacy code at all. Once you start modifying the legacy code, characterization tests become unreliable — they no longer capture the original behavior.

Common Failure Modes

Understanding how AI-assisted legacy work goes wrong is as important as knowing the techniques. The three most common failure modes:

Aggressive rewrites. The agent generates a beautiful, idiomatic implementation that discards implicit contracts, removes "dead" code that is actually only called from a scheduled job, and renames exported functions. The result compiles and passes unit tests but breaks in production. Prevention: always explicitly forbid signature changes and interface changes in your refactor prompts.

Missed side effects. A function that looks like a pure transformation also writes an audit log entry or increments a metrics counter. The agent refactors away the "clutter" and the side effect disappears silently. Prevention: ask the agent to enumerate all side effects before refactoring ("list every I/O operation, external call, and mutation in this function").
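
A contrived example of the shape this takes (the metrics client is a stand-in, not a specific library):

// Stand-in declaration for whatever metrics client the codebase actually uses.
declare const metrics: { increment: (name: string) => void };

// Reads like a pure formatter, but the counter increment is observable behavior;
// a "cleanup" that drops it silently breaks whatever dashboard depends on it.
function formatInvoiceId(raw: string): string {
  metrics.increment('invoice.format.calls');
  return raw.trim().toUpperCase();
}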

Hallucinated context. When the agent does not have access to the full codebase, it will make assumptions about what helper functions do, what database schemas look like, or how configuration is loaded. Prevention: paste all relevant context explicitly, or use a tool-enabled agent with actual file access.

Learning tip: After any AI-assisted legacy change, do a targeted search for every symbol the agent renamed or removed. Check if it appears anywhere else in the codebase that was not in the agent's context window.

Hands-On: Refactoring a 500-line God Class

This exercise walks through safely refactoring a classic legacy problem: a single class that manages user authentication, session state, email sending, and database writes.

Step 1 — Get comprehension

Here is a 500-line class from our legacy codebase called UserManager.

[paste the class]

Answer the following questions:
1. What are the distinct responsibilities this class has? List each one.
2. What external dependencies does it use (database, email, session store, etc.)?
3. Which methods are only called internally (private or effectively private)?
4. Which methods appear to have side effects beyond their obvious return value?
5. Are there any methods that seem to be dead code or only used in tests?

Do not suggest changes yet. Only describe what is there.

Step 2 — Generate characterization tests

Based on the UserManager class I shared, generate characterization tests for the following methods: login(), logout(), resetPassword(), and updateProfile().

Requirements:
- Jest with TypeScript
- Mock the database client, email service, and session store at the module boundary
- Test the current behavior, including error paths
- Add comments where the behavior looks like it might be a bug
- Do not test private methods directly

File: src/auth/UserManager.characterization.spec.ts

Step 3 — Run and harden the test suite

Run the characterization tests. Paste any failures back to the agent:

The characterization tests produced these failures:

[paste test output]

For each failure, tell me whether it is:
(a) a test that was incorrectly written and needs to be fixed to match actual behavior, or
(b) a test that is correct and reveals behavior that is inconsistent or unexpected

Do not change the production code. Only suggest test corrections.

Step 4 — Identify the split points

Given the distinct responsibilities you identified in UserManager, propose a refactoring plan that splits this class into smaller, single-responsibility classes.

Requirements:
- Each proposed class should have one clear responsibility
- Preserve all existing public method signatures
- Use dependency injection so the new classes can be independently tested
- List the proposed classes, their constructors, and the methods each will own
- Do not generate code yet — only the plan

The existing characterization tests must still be passable after the refactoring.

Step 5 — Execute the refactor incrementally

For each proposed class, generate it separately:

Generate the AuthenticationService class as described in the refactoring plan.

Context:
- Original UserManager class: [paste]
- Characterization tests that must still pass: [paste relevant tests]
- Proposed class plan: [paste plan]

STRICT CONSTRAINTS:
1. All public method signatures must be identical to the original UserManager signatures they replace
2. Do not add new behavior
3. Do not change error messages
4. List any decisions you were uncertain about at the bottom of your response

File: src/auth/AuthenticationService.ts
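
What comes back should be a small, injectable class. A rough sketch of the shape to expect (every interface, signature, and message here is hypothetical; the real ones must mirror your UserManager):

// Hypothetical collaborator interfaces, mirroring the clients the original
// UserManager constructed internally.
interface DbClient {
  findUserByEmail(email: string): Promise<{ id: string; passwordHash: string } | null>;
}
interface SessionStore {
  create(userId: string): Promise<string>;
}

export class AuthenticationService {
  // Dependencies are injected so the class can be tested with fakes.
  constructor(
    private readonly db: DbClient,
    private readonly sessions: SessionStore
  ) {}

  // Signature kept identical to the original UserManager.login().
  async login(email: string, password: string): Promise<string> {
    const user = await this.db.findUserByEmail(email);
    if (!user) {
      throw new Error('Invalid credentials'); // error message carried over unchanged
    }
    // ...password verification moved over verbatim from UserManager...
    return this.sessions.create(user.id);
  }
}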

Step 6 — Wire up and verify

After generating all new classes, create a thin adapter that wraps them and exposes the original UserManager interface. Run all characterization tests against the adapter. Only delete the original UserManager class when every test passes.
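
A minimal sketch of that adapter, assuming the classes produced in Step 5 (all names and signatures hypothetical):

// Shapes of the Step 5 classes, reduced to interfaces to keep the sketch self-contained.
interface AuthenticationService {
  login(email: string, password: string): Promise<string>;
  logout(sessionId: string): Promise<void>;
}
interface PasswordResetService {
  reset(email: string): Promise<void>;
}
interface ProfileService {
  update(userId: string, changes: Record<string, unknown>): Promise<void>;
}

// Thin adapter: exposes the original UserManager surface and only delegates,
// so the characterization tests can run against it unchanged.
export class UserManagerAdapter {
  constructor(
    private readonly auth: AuthenticationService,
    private readonly passwords: PasswordResetService,
    private readonly profiles: ProfileService
  ) {}

  login(email: string, password: string) {
    return this.auth.login(email, password);
  }

  logout(sessionId: string) {
    return this.auth.logout(sessionId);
  }

  resetPassword(email: string) {
    return this.passwords.reset(email);
  }

  updateProfile(userId: string, changes: Record<string, unknown>) {
    return this.profiles.update(userId, changes);
  }
}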

Key Takeaways

  • Comprehension comes before modification: use AI to build a navigational map of unfamiliar code before writing a single change.
  • Characterization tests are the safety net for all AI-assisted legacy refactoring; generate them before touching production code.
  • Always instruct the agent to preserve behavior explicitly, list every constraint, and report its own uncertainties.
  • The strangler fig pattern pairs naturally with AI-assisted generation; let the agent scaffold the new implementation while tests verify behavioral parity.
  • After any AI-assisted change, manually verify every symbol that was renamed, moved, or removed against the rest of the codebase.