The engineers who get the most out of AI coding agents are not the ones who ask for the most code at once — they are the ones who break the work into verifiable slices and treat every commit as a proof point.
Why Big-Bang AI Generation Fails
There is a tempting pattern that almost every engineer tries at least once with an AI coding agent: hand it a large feature description, walk away, and come back to a complete implementation. The appeal is obvious — let the agent do the entire job while you focus on something else.
In practice, this approach fails in ways that are expensive and often invisible until they compound. A big-bang generation produces a large, difficult-to-review diff. The agent made dozens of decisions during that generation — about naming conventions, data flow, error handling patterns, library choices, edge cases — and you were not present for any of them. Some of those decisions will be subtly wrong. The problems will be woven through the entire codebase change, not isolated in one place.
The second failure mode is recovery. When a big-bang generation goes wrong — and at non-trivial size, it will go wrong in at least one dimension — you have two bad options: attempt to patch an already-complex implementation that you do not fully understand, or throw it away and start over. Neither is efficient. There is no clean rollback point. There is no commit history that tells you when the deviation from the spec happened or what correct state looks like.
The third failure mode is the context cliff. AI agents work within a finite context window. A generation that spans hundreds of files means the agent is making late decisions without being able to fully attend to the early ones. The implementation quality degrades as the agent gets further from the start of the task, and the degradation is hard to detect without reading every file.
Learning tip: Treat the size of an agent's output the way you treat the size of a PR: anything over 400 lines that was generated in a single pass should be treated with heightened scrutiny. The reviewability limit for humans and the reliability limit for agents are not far apart.
The Staged Implementation Model
The alternative to big-bang generation is a structured loop: define a phase, instruct the agent, commit the result, verify against criteria, then begin the next phase. Each iteration of this loop produces a checkpoint — a committed state that is known to be correct and can be returned to without losing work.
A phase should be the smallest slice of the feature that can be verified independently. For most features, this maps naturally to one of: data layer (schema, migrations, repository functions), business logic layer (service functions, domain rules, validations), API layer (route handlers, request/response shapes, middleware), and integration (wiring everything together, connecting to real dependencies, end-to-end tests). Some features need finer slicing; complex integrations may warrant splitting each layer into multiple phases.
The key property of a good phase boundary is verifiability. You should be able to answer "is this phase correct?" without running the full feature. A schema migration either creates the right table or it does not. A service function either handles the happy path and error cases or it does not. A route handler either validates input correctly or it does not. Each of these can be tested in isolation.
Instructing an agent about phase scope matters. Agents will expand scope if they are not explicitly bounded. Left to its own judgment, an agent implementing a database migration may also start writing repository functions, stub out a service, and partially implement a route — because it is "helpful." The result is a half-finished implementation across four layers rather than a complete, tested implementation of one. Explicit scope instructions prevent this drift.
Learning tip: When defining a phase, tell the agent both what to implement and what not to implement yet. "Do not write route handlers in this phase — only the service layer" is as important as "implement the UserService with these three methods."
How to Instruct Agents to Commit Incrementally
A common mistake is to think about staged implementation purely as a human workflow concern — you will run the agent, review the output, and commit when you are satisfied. That approach works, but leaves verification effort entirely on you. The more powerful pattern is to make the staging explicit in your prompts so the agent participates in the checkpoint discipline.
The following prompt structure works consistently for staged implementation:
I am implementing [feature name] in phases. We are on Phase [N] of [total].
Phase [N] scope: [specific layer and functions]
Phase [N] deliverables:
- [specific file or function 1]
- [specific file or function 2]
- [unit tests for each deliverable]
Out of scope for this phase (do not implement these yet):
- [next phase items]
- [further downstream items]
After completing Phase [N], output:
1. The implementation files
2. The test files
3. A brief summary of what was implemented and any decisions you made
4. A commit message for this phase following conventional commits format
Here is the spec: [paste relevant spec sections]
Here is the existing code you must integrate with: [paste integration surface]
This structure does several things at once. It tells the agent exactly what constitutes "done" for this phase, preventing scope creep into later phases. It requires the agent to produce tests as part of the phase deliverable, not as an optional follow-up. It asks for a decision summary, which surfaces any choices you should review before committing. And it produces a commit message, which forces you to think about whether the phase output is actually commit-worthy — if you cannot imagine a meaningful commit message for what the agent produced, the phase was not well-defined.
After each phase, the commit workflow is: run the tests, verify they pass, read the decision summary for anything unexpected, commit. If any part of that fails, you diagnose at the phase level — not across a 2,000-line diff.
Learning tip: Ask the agent to write the commit message for each phase. A good commit message summarizes the change and its purpose in one line. If the agent struggles to write one, the phase scope was too vague or too large. Use that as feedback to sharpen the next phase definition.
Hands-On: Staged Implementation of a Notification Feature
This exercise implements a user notification system in a Node.js/PostgreSQL application across three implementation phases, each ending in a commit, followed by a cross-phase verification pass and a feature-flagged rollout. The feature includes a database schema for storing notifications, a service for creating and reading them, and a REST API.
Step 1: Define all phases before writing any code
Start by planning the full phase breakdown. Use the agent to help structure it.
I am about to implement a user notification system. Here is the feature spec:
- Users receive in-app notifications for: new follower, post liked, comment received
- Notifications are stored in the database with: id, userId, type, payload (JSON), readAt, createdAt
- API: GET /notifications (list unread), POST /notifications/:id/read (mark as read), POST /notifications/read-all
- Notifications are created by other service calls, not directly by users
- Stack: Node.js, Express, PostgreSQL via Prisma
Break this into implementation phases where each phase:
1. Covers exactly one layer of the stack
2. Can be verified independently with unit or integration tests
3. Results in a commit that leaves the codebase in a valid, non-broken state
For each phase, list: the deliverables, the test approach, and what is explicitly out of scope.
Do not write any code yet.
Expected output: A phase plan along the lines of: (1) Prisma schema, migration, and repository; (2) NotificationService business logic; (3) REST route handlers with integration tests. Review this plan and adjust the phase boundaries before proceeding. The plan you agree on is your contract for the rest of the exercise.
Step 2: Execute Phase 1 — data layer
We are beginning Phase 1 of the notification system implementation.
Phase 1 scope: Prisma schema and database migration only.
Deliverables:
- The Notification model added to schema.prisma with fields: id (String primary key, default cuid()), userId (FK → User), type (enum: NEW_FOLLOWER | POST_LIKED | COMMENT_RECEIVED), payload (Json), readAt (DateTime, nullable), createdAt (DateTime, default now())
- A Prisma migration file
- A NotificationRepository class in src/repositories/notification.repository.ts with methods: create(data), findUnreadByUserId(userId), markAsRead(id), markAllAsRead(userId)
- Unit tests for NotificationRepository using a test database (not mocked)
Out of scope for this phase: NotificationService, route handlers, any API endpoints.
Here is the current schema.prisma: [paste current schema]
Here is an example of an existing repository file for reference: [paste UserRepository]
After completing, provide:
1. All implementation files
2. Test files
3. A summary of any schema decisions you made (e.g. indexing strategy)
4. A conventional commits message for this phase
Expected output: The Prisma schema addition, a migration file, the repository class with four methods, and tests that hit a real test database. Verify by running npx prisma migrate dev and npm test -- notification.repository. Commit only if both pass.
Step 3: Execute Phase 2 — service layer
After Phase 1 is committed, run Phase 2.
Phase 1 is committed. We are beginning Phase 2 of the notification system.
Phase 2 scope: NotificationService business logic only.
Deliverables:
- NotificationService in src/services/notification.service.ts
- Methods: createNotification(userId, type, payload), getUnreadNotifications(userId), markNotificationRead(id, requestingUserId), markAllNotificationsRead(userId)
- Business rules to enforce:
- createNotification must validate that the userId exists before inserting
- markNotificationRead must verify the notification belongs to requestingUserId (authorization check)
- markAllNotificationsRead must return the count of notifications marked
- Unit tests for all methods with mocked NotificationRepository
Out of scope: route handlers, middleware, HTTP layer.
Here is the NotificationRepository from Phase 1: [paste repository]
Here is the UserRepository for reference (used to validate userId exists): [paste]
Expected output: A service class with four methods and full unit test coverage against the mocked repository. The authorization check in markNotificationRead is the most likely thing to be incomplete — verify it explicitly. Run tests, commit.
Step 4: Execute Phase 3 — API layer
Phase 2 is committed. We are beginning Phase 3.
Phase 3 scope: Express route handlers for the notification API.
Deliverables:
- Route file: src/routes/notifications.ts
- Three routes: GET /notifications, POST /notifications/:id/read, POST /notifications/read-all
- All routes require authentication middleware (already exists as requireAuth — it attaches req.user.id)
- Input validation using Zod for the :id param (must be a valid cuid)
- Error handling: 404 if notification not found, 403 if notification belongs to another user, 400 for invalid input
- Integration tests using supertest that hit a real test database (not mocked service)
Out of scope: background notification delivery, websocket push, email delivery.
Here is an example of an existing route file following our conventions: [paste]
Here is the NotificationService from Phase 2: [paste]
Here is how requireAuth middleware is used: [paste]
Expected output: Route handlers with Zod validation, correct status codes for all error conditions, and integration tests using supertest. Run npm test -- notifications.routes, verify all three routes work, commit.
Step 5: Verify the full feature against acceptance criteria
After all phases are committed, run a cross-phase verification to confirm the feature as a whole meets the original spec.
All three implementation phases of the notification system are committed. Here is the full spec we started with:
[paste original spec]
Review the committed implementation (I will paste the key files below) and verify:
1. Each functional requirement — met, partially met, or not met
2. Each acceptance criterion — pass or fail, with the specific file/function that satisfies it
3. Any requirement from the spec that was not covered in any of the three phases
4. Any implementation decision made during the phases that conflicts with the spec
[paste NotificationRepository, NotificationService, routes/notifications.ts, test files]
Expected output: A structured verification checklist. Any gap surfaces here rather than in production. For gaps that are actual missing functionality, define a follow-up phase. For gaps that are spec ambiguities, update the spec. For conflicts between implementation decisions and the spec, decide which to treat as authoritative and document the decision.
Step 6: Use feature flags for safe staged rollout
If the feature involves a user-facing change, add a feature flag before enabling it in production.
The notification API is fully implemented and passing all tests. Before enabling it in production, add a feature flag that:
- Gates all three notification endpoints behind a flag named NOTIFICATIONS_ENABLED
- Returns HTTP 404 for all notification routes when the flag is false
- Reads the flag from an environment variable (FEATURE_NOTIFICATIONS_ENABLED=true/false)
- Does not require changes to the NotificationService or NotificationRepository
- The flag check lives in the router, not in the controllers or service
Implement the flag wrapper and update the route registration. Do not change any existing tests — add a new test file that verifies the 404 behavior when the flag is off.
Expected output: A feature flag wrapper applied at the router level, with a test that confirms the guarded behavior. This gives you the ability to deploy the code without enabling the feature, verify it in staging, and flip the flag independently of the deployment.
Key Takeaways
- Big-bang AI generation fails because it produces large unreviewable diffs, gives you no clean rollback points, and degrades in quality as the context window fills. Staged implementation solves all three problems.
- A phase is valid when it can be verified independently. Map phases to stack layers: data, business logic, API, integration. Split further if a layer is complex enough to make verification ambiguous.
- Explicit scope instructions are mandatory. Tell the agent what is out of scope for each phase, not only what is in scope. Agents expand scope when given the opportunity — bounding them produces cleaner, more focused output.
- Every phase ends with a commit. The commit history becomes your audit trail and your rollback surface. A commit message that you cannot write confidently is a signal that the phase was poorly defined.
- Feature flags separate deployment from release. They give you the ability to merge and deploy staged implementation work without enabling the user-facing change, which decouples your release risk from your development velocity.