The most dangerous refactors are the ones where you're confident you understand the code — this exercise teaches you how to let AI surface what you don't know you don't know before you move a single line.
Why Large-Scale Refactors Fail (And How AI Changes That)
Refactoring a 600-line route handler is one of the most common and most risky tasks in software engineering. The code works. It's in production. It has accumulated years of edge-case handling, implicit dependencies, and tribal knowledge embedded in its structure. The temptation is to rewrite it from scratch — and that temptation is almost always wrong.
The reason large refactors fail isn't technical. It's epistemic: engineers don't know what they don't know about the existing code. A validation rule tucked inside an if block on line 412. An error response format that a client integration depends on. A subtle ordering dependency between two operations that looks accidental but isn't. These are the things that break after a "clean" rewrite ships to production.
AI doesn't eliminate this problem, but it changes the odds. An agentic tool can read all 600 lines simultaneously, cross-reference behavior across the entire file, and flag the hidden invariants before you start moving things around. Your job as the engineer is to treat AI as a meticulous first reader — not a replacement for understanding the code yourself, but an accelerator for building that understanding.
This module is the Module 5 capstone. You will take a realistic monolithic Express route handler and refactor it into a clean layered architecture using a structured, AI-assisted workflow. By the end, you will have a repeatable process for large-scale refactors in any codebase.
Learning tip: Resist the urge to start refactoring immediately. The most valuable thing AI can do in a large refactor is help you understand the existing code thoroughly before you change it. Spend at least as much time on characterization as on the refactor itself.
The Scenario: A 600-Line Route Handler
The code under refactor is a single Express route handler for POST /api/orders. Over three years it has grown to include: request validation with business rules, pricing calculations with discount logic, inventory checks, database writes across multiple tables, email notification side effects, error handling for 8 different failure modes, and audit logging. All of it is inline in the route handler.
The problems this creates are practical:
- Unit tests require spinning up a full Express app with a database connection
- Adding a new payment method means editing the same 600-line file as adding a new discount type
- A bug in the email notification can silently affect order creation behavior
- The inventory check logic is duplicated in three other endpoints
The target architecture is four layers:
- Route handler: parse request, call service, format response
- Service layer: orchestrate business logic, own transactions
- Repository layer: all database access, no business logic
- Validators: pure input validation, no side effects
Each layer is independently testable. The service layer doesn't know about HTTP. The repository layer doesn't know about business rules.
Phase 1: Characterization — Understanding Before Changing
Before writing a single line of new code, you need a behavioral contract: a complete, precise description of what the current code does. This is what you verify against after the refactor.
Step 1: Generate the characterization document
I need to refactor a large Express route handler. Before making any changes, I want to fully understand what it currently does.
Please read this code carefully and produce a characterization document with these sections:
1. **Request contract**: What fields does it accept? Which are required vs optional? What validation rules apply to each field?
2. **Business logic**: What calculations or decisions does it make? List every business rule you can identify, including any that are implicit in the code structure.
3. **Database operations**: List every database read and write, in the order they execute. Include table names, operation types, and any conditions.
4. **Side effects**: What happens beyond the database? (emails, external API calls, cache invalidations, file writes, etc.)
5. **Response contract**: What does a successful response look like? List every error response, its HTTP status code, and what condition triggers it.
6. **Hidden invariants**: Flag any behavior that looks fragile, implicit, or that might be easy to accidentally break during refactoring (ordering dependencies, implicit state, shared mutable data, etc.)
[paste the 600-line route handler here]
Expected result: A structured document of 3–5 pages that becomes your ground truth. Save this. Every decision in the refactor should be traceable back to it.
Step 2: Generate characterization tests
Characterization tests capture existing behavior — including bugs. They're not tests of what the code should do; they're tests of what it does do. This distinction matters: if your refactor passes all characterization tests, you know you haven't changed observable behavior.
Based on the characterization document you just produced, write characterization tests for this route handler.
Requirements:
- Use supertest + Jest (our existing test stack)
- Cover every response shape you identified (success case + each error case)
- Include at least one test per business rule that affects the response
- For the hidden invariants you flagged, write tests that would catch regressions in those specific behaviors
- Mock external dependencies (database, email service) at the boundary level
- Name each test so it documents what behavior it's capturing, e.g.: "returns 422 when discount_code is expired" not "test discount"
Don't worry about whether the tests are elegant — their purpose is to lock down current behavior, not to be the final test suite.
[paste characterization document]
[paste original route handler]
Expected result: 15–25 test cases that comprehensively cover the current behavior. Run them. They should all pass against the original code. If any fail, you've found a documentation error in the characterization — fix the characterization, then fix the test.
Learning tip: Characterization tests are not about code quality — they're about behavioral safety nets. Write ugly, verbose tests that are unmistakably clear about what they're checking. You'll replace them with proper unit tests later, but they need to survive the entire refactor intact.
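To make the naming requirement concrete, here is a framework-free sketch of what characterization assertions look like. Your real suite would drive the Express app through supertest + Jest; the handler stub and field names below are invented for illustration:

```typescript
// Characterization tests lock down observed behavior. Each test name states
// the behavior being captured, not the code path exercised. The handler and
// fields here are hypothetical stand-ins for the real route.

type Response = { status: number; body: Record<string, unknown> };

// Stand-in for calling the real route via supertest.
function postOrder(body: Record<string, unknown>): Response {
  if (typeof body.customer_id !== "string") {
    return { status: 422, body: { error: "customer_id is required" } };
  }
  if (body.discount_code === "EXPIRED10") {
    // Captured verbatim, including the exact lowercase message the legacy
    // code emits. Do not "improve" it here; clients may parse it.
    return { status: 422, body: { error: "discount code expired" } };
  }
  return { status: 201, body: { order_id: "o-1" } };
}

// "returns 422 when customer_id is missing"
const missing = postOrder({});
if (missing.status !== 422) throw new Error("behavior changed: missing customer_id");

// "returns 422 when discount_code is expired"
const expired = postOrder({ customer_id: "c1", discount_code: "EXPIRED10" });
if (expired.body.error !== "discount code expired") {
  throw new Error("behavior changed: expired discount message");
}
```

The point of the verbose names and exact-string assertions is traceability: each test maps one-to-one to a line in the characterization document.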
Phase 2: Layer Identification and Extraction Plan
With a complete characterization in hand, you can make informed decisions about where the boundaries go.
Step 3: Identify the layers
Given this characterization document and the original code, help me identify the layer boundaries.
I want to extract into:
1. A **validator** (pure input validation, no DB, no business logic)
2. A **repository** (all DB access, no business logic)
3. A **service** (orchestrates business logic, calls repository, triggers side effects)
4. A **slim route handler** (parse request, call service, format response)
For each layer:
- List which lines/functions from the original code belong there
- Flag any code that's ambiguous about which layer it belongs in
- Identify any circular dependencies that would make the extraction complex
- Note any code that needs to be split across layers (e.g., a function that mixes validation and DB access)
Produce a mapping table: | Code section (lines) | Belongs in | Notes |
[paste characterization document]
[paste original code with line numbers]
Expected result: A layer assignment for every major code block. The ambiguous cases are the important ones — discuss them before writing any new code.
Step 4: Create the extraction order
Large refactors need a safe ordering. The general rule: extract from the outside in, test after each extraction.
Given the layer mapping, what is the safest order to extract each layer?
I want an order where:
- After each extraction step, all characterization tests still pass
- No step requires changing more than one thing at a time
- The route handler shrinks incrementally rather than being rewritten all at once
Produce a numbered plan with: what to extract in each step, what the route handler looks like after that step, and what the characterization tests verify at each checkpoint.
Expected result: A 6–8 step extraction plan where each step is independently verifiable.
Learning tip: The extraction order matters more than the final architecture. A plan that gets you to a verifiable checkpoint every 30 minutes is better than a plan that produces the cleanest final structure but requires holding the entire refactor in your head at once.
Phase 3: Extraction — One Layer at a Time
Step 5: Extract the validator
Extract the validation logic from this route handler into a standalone validator module.
Requirements:
- The validator should be a pure function: takes the raw request body, returns either { valid: true, data: ParsedInput } or { valid: false, errors: ValidationError[] }
- No database calls, no business logic, no side effects
- All validation rules from the characterization document must be preserved exactly
- Export a TypeScript type for ParsedInput that represents the validated, type-safe input
After extraction, show me:
1. The new validator module
2. The updated route handler with the validator called at the top
3. Any tests I should add specifically for the validator (the characterization tests cover integration, but I want unit tests for the validator)
[paste characterization document — validation section]
[paste original route handler]
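The real output depends on your code, but the validator's shape might come back roughly like this. The field names and rules below are invented for illustration; yours come from the characterization document:

```typescript
// Sketch of the extracted validator: a pure function with no DB access,
// no business logic, and no side effects. All fields and rules are
// hypothetical examples.

export interface ParsedInput {
  customerId: string;
  items: { sku: string; qty: number }[];
  discountCode?: string;
}

export interface ValidationError {
  field: string;
  message: string;
}

export type ValidationResult =
  | { valid: true; data: ParsedInput }
  | { valid: false; errors: ValidationError[] };

export function validateOrderInput(body: unknown): ValidationResult {
  const errors: ValidationError[] = [];
  const b = (body ?? {}) as Record<string, unknown>;

  if (typeof b.customer_id !== "string" || b.customer_id.length === 0) {
    errors.push({ field: "customer_id", message: "customer_id is required" });
  }
  const items = Array.isArray(b.items) ? b.items : [];
  if (items.length === 0) {
    errors.push({ field: "items", message: "at least one item is required" });
  }
  for (const item of items) {
    if (typeof item?.qty !== "number" || item.qty < 1) {
      errors.push({ field: "items", message: "qty must be a positive number" });
    }
  }
  if (errors.length > 0) return { valid: false, errors };
  return {
    valid: true,
    data: {
      customerId: b.customer_id as string,
      items: items as { sku: string; qty: number }[],
      discountCode: typeof b.discount_code === "string" ? b.discount_code : undefined,
    },
  };
}
```

Because the function is pure, its unit tests are trivially fast: feed in raw bodies, assert on the returned result, no mocks required.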
Step 6: Extract the repository
Extract all database access from this route handler into a repository module.
Requirements:
- The repository should expose named methods that describe what they do (findUserById, createOrder, decrementInventory) — not raw query methods
- No business logic in the repository — it only reads and writes data
- All repository methods should accept explicit parameters (no reading from request context)
- Use the same database client that's already in use (Knex)
- The repository should be injectable (accept a db connection in its constructor or factory function) so it can be tested with a test database connection
After extraction:
1. Show the full repository module
2. Show the updated route handler calling repository methods
3. Flag any places where the repository method name was ambiguous or where I had to make a judgment call about what belongs in the repository vs the service
[paste characterization document — database operations section]
[paste current state of route handler after validator extraction]
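An injectable repository along these lines might look as follows. To keep the sketch self-contained, `OrdersDb` is a narrow stand-in for the Knex instance the real code would receive; the method names are illustrative:

```typescript
// Injectable repository sketch. In the real app the injected dependency
// would be the Knex instance; `OrdersDb` is a hypothetical narrowed
// interface so the example runs on its own.

interface OrderRow {
  id: string;
  customerId: string;
  totalCents: number;
}

interface OrdersDb {
  insertOrder(row: OrderRow): Promise<void>;
  selectOrder(id: string): Promise<OrderRow | undefined>;
  updateInventory(sku: string, delta: number): Promise<number>; // rows affected
}

// Factory function, so tests can inject a test-database (or fake) connection.
export function makeOrderRepository(db: OrdersDb) {
  return {
    // Named for what it does, not for how it queries.
    async createOrder(row: OrderRow): Promise<void> {
      await db.insertOrder(row);
    },
    async findOrderById(id: string): Promise<OrderRow | undefined> {
      return db.selectOrder(id);
    },
    async decrementInventory(sku: string, qty: number): Promise<boolean> {
      // No business rule here: the SERVICE decides what a zero-row
      // update means (out of stock, unknown SKU, etc.).
      return (await db.updateInventory(sku, -qty)) > 0;
    },
  };
}
```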
Step 7: Extract the service layer
This is the most complex step because the service layer owns the business logic that was previously tangled with both HTTP handling and database calls.
Extract the business logic into a service layer. This is the most complex extraction — take extra care to preserve all the behavior documented in the characterization.
Requirements:
- The service should not know about HTTP (no req, res, or status codes)
- The service should accept injected dependencies: the repository and the email service
- The service should return a result object, not throw HTTP errors — use a Result<T, E> pattern or explicit return types
- All business rules from the characterization document must live here
- Transactions should be owned by the service (the service coordinates which repository operations happen atomically)
Pay special attention to:
- The ordering dependencies flagged in the characterization's "hidden invariants" section — preserve them explicitly
- Error cases that have specific messages or codes — map them to typed service errors
After extraction:
1. Show the service module
2. Show the updated (now slim) route handler
3. Show how to wire the dependencies together in the Express app setup
4. List any business rules where you made a judgment call about implementation
[paste characterization document]
[paste validator module]
[paste repository module]
[paste current route handler]
Expected result: A route handler that is now 30–50 lines: parse, validate, call service, map result to HTTP response. All the logic lives in the service and repository.
Learning tip: When the AI extracts the service layer, read every line of the output against the characterization document. The service layer is where implicit behavior most often gets silently dropped. Pay particular attention to error cases and edge-case business rules — they're the first things to disappear in a refactor.
Phase 4: Dependency Injection and Testing
Step 8: Implement dependency injection
Wire up dependency injection for the three layers we've created.
The application currently creates its database connection in a global module. I want the dependency graph to be:
- App startup creates db connection
- Repository is created with db connection
- EmailService is created with config
- OrderService is created with repository + email service
- Route handler is created with order service
Write:
1. The wiring code in the Express app setup file
2. A factory function or class for each layer that accepts its dependencies
3. A test setup helper that creates a fully-wired stack with a test database connection and mock email service
[paste all three layer modules]
[paste current app setup file]
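The composition root the prompt asks for reduces to a pattern like this. Class names are placeholders, and `buildTestApp` illustrates the test setup helper idea with stand-in doubles:

```typescript
// Composition-root sketch: every layer receives its dependencies
// explicitly, and exactly one place knows how the graph fits together.
// All class and function names are hypothetical.

interface Db { query(sql: string): Promise<unknown[]> }
interface EmailConfig { from: string }

class EmailService {
  constructor(public config: EmailConfig) {}
  async send(to: string, subject: string): Promise<void> {
    // real implementation: SMTP or provider API call
  }
}

class OrderRepository {
  constructor(public db: Db) {}
}

class OrderService {
  constructor(public repo: OrderRepository, public email: EmailService) {}
}

// App startup: create the db connection once, then wire top-down.
function buildApp(db: Db, emailConfig: EmailConfig) {
  const repo = new OrderRepository(db);
  const email = new EmailService(emailConfig);
  const service = new OrderService(repo, email);
  return { service }; // in the real app: register Express routes with `service`
}

// Test setup helper: identical wiring, with test doubles swapped in.
function buildTestApp() {
  const fakeDb: Db = { query: async () => [] };
  return buildApp(fakeDb, { from: "test@example.com" });
}
```

The payoff is that production and tests share one wiring function, so a test can never accidentally exercise a dependency graph that differs from production.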
Step 9: Write the final unit test suite
Now that we have clean layers, write a proper unit test suite to replace the characterization tests.
For each layer, write tests that:
- Test the layer in isolation (mock all dependencies)
- Cover the happy path and all error cases
- Test each business rule independently
- Are fast (no real database connections, no network calls)
For the service layer tests, focus on:
- Each business rule from the characterization document
- Each error case and what triggers it
- The transaction behavior (what gets rolled back on failure)
For the repository tests, use a test database (not mocks) to verify the SQL is actually correct.
[paste service module]
[paste repository module]
[paste validator module]
[paste characterization document]
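For a sense of what "test the layer in isolation" means in practice, here is a framework-free sketch of one isolated service test. In the real suite the hand-rolled doubles below would be Jest mocks, and the service is a deliberately simplified stand-in:

```typescript
// Isolated service-test sketch. Hand-rolled doubles record calls so each
// business rule, and the ORDER of side effects, is checked without a real
// database or email provider. All names are hypothetical.

interface Repo { createOrder(totalCents: number): Promise<string> }
interface Email { send(orderId: string): Promise<void> }

class OrderService {
  constructor(private repo: Repo, private email: Email) {}
  async placeOrder(totalCents: number): Promise<string> {
    if (totalCents <= 0) throw new Error("INVALID_TOTAL");
    const id = await this.repo.createOrder(totalCents);
    await this.email.send(id);
    return id;
  }
}

async function run() {
  const calls: string[] = [];
  const repo: Repo = {
    createOrder: async (t) => { calls.push(`order:${t}`); return "o-1"; },
  };
  const email: Email = { send: async (id) => { calls.push(`email:${id}`); } };
  const svc = new OrderService(repo, email);

  // happy path: repo write happens before the email, in that order
  const id = await svc.placeOrder(500);
  if (id !== "o-1") throw new Error("wrong order id");
  if (calls.join(",") !== "order:500,email:o-1") throw new Error("wrong call order");

  // business rule: zero-total orders are rejected before any side effect
  const before = calls.length;
  const rejected = await svc.placeOrder(0).then(() => false, () => true);
  if (!rejected) throw new Error("expected rejection");
  if (calls.length !== before) throw new Error("side effect ran on invalid order");
}
run();
```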
Step 10: Final verification against characterization
Here is the characterization document I produced at the start of this refactor.
Here is the final refactored code across all four layers.
Do a final review:
1. Verify every business rule from the characterization is implemented somewhere in the new code
2. Verify every error case and its HTTP status code is preserved
3. Verify every database operation is present in the repository
4. Verify every side effect is triggered in the same conditions as before
5. Flag any discrepancies — even ones that look like improvements. I want to consciously decide whether to preserve or change behavior, not accidentally change it.
[paste characterization document]
[paste all four layer files]
Expected result: Either a clean bill of health, or a list of discrepancies to review. Any discrepancy should be a conscious decision, not a surprise.
Where Human Oversight Is Non-Negotiable
AI will make mistakes in a large-scale refactor. The following are the highest-risk points that require your explicit attention:
Transaction boundaries. AI often places transaction start/commit in the wrong layer, or misses that two operations need to be atomic. Read every piece of transaction-related code the AI generates and verify it against the database operations section of your characterization document.
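To see why the boundary matters, here is a minimal sketch of a service-owned transaction. `withTransaction` is a simplified stand-in for Knex's `db.transaction(async trx => ...)`; the point is that the service, not the repository, decides which writes are atomic:

```typescript
// Simplified transaction sketch: writes are buffered and only become
// visible on commit, so a mid-transaction failure discards them all.
// In real code `withTransaction` would be Knex's db.transaction().

const committed: string[] = [];

interface Tx { writes: string[] }

async function withTransaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T> {
  const tx: Tx = { writes: [] };
  const result = await fn(tx);   // if fn throws, nothing below runs...
  committed.push(...tx.writes);  // ...so buffered writes are discarded:
  return result;                 // a rollback, in real-database terms
}

// The service wraps BOTH writes in one transaction, so a failed inventory
// decrement rolls back the order insert too.
async function placeOrder(inventoryConflict: boolean): Promise<void> {
  await withTransaction(async (tx) => {
    tx.writes.push("insert order");
    if (inventoryConflict) throw new Error("inventory conflict");
    tx.writes.push("decrement inventory");
  });
}
```

The failure mode to hunt for in AI output is the inverse: each repository method opening its own tiny transaction, which commits the order insert even when the inventory write later fails.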
Error message strings. If your API has clients that parse error messages (legacy integrations, mobile apps, third-party consumers), a changed error message string is a breaking change. AI will often rephrase or improve error messages — always compare them explicitly.
Ordering dependencies. The characterization step should surface these, but AI may still reorder operations that look independent but aren't. If your original code fetches a lock, does a check, then writes — that order is often not accidental.
Silent exception swallowing. When AI rewrites error handling, it sometimes introduces catch blocks that swallow exceptions that should propagate. Read every try/catch in the refactored code.
Type assertions and any casts. AI under pressure to make types compile sometimes inserts as any or as SomeType casts. These are red flags — they indicate a type mismatch that should be resolved properly.
Learning tip: Review the AI's output for a large-scale refactor the same way you'd review a junior engineer's PR: with genuine skepticism and detailed attention to the edge cases. The AI will be right about the obvious structure and wrong about the subtle invariants.
Key Takeaways
- Characterization before refactoring is not optional — it's the entire safety net. Without a behavioral contract, you have no way to know whether your refactor preserved correctness.
- AI is most valuable in the characterization phase, where it can identify hidden invariants and edge cases across hundreds of lines simultaneously — things that are genuinely hard for a human to hold in working memory.
- Extract one layer at a time, run characterization tests after each step. This keeps the blast radius of any mistake small and preserves your ability to bisect a regression.
- The service layer is where behavioral correctness is most at risk during AI-assisted refactoring. Read every business rule in the service output against the characterization document.
- Human oversight is most critical at transaction boundaries, error handling, and ordering dependencies — these are exactly the places where AI makes the most subtle mistakes.