How you structure a prompt is as consequential as the code you write — poor architecture produces unreliable output at scale, regardless of how capable the model is.
## System Prompts vs. User Prompts — Why the Distinction Matters
Every LLM-backed tool you use — Claude Code, Cursor, GitHub Copilot, or a custom agent you build — operates with at least two conceptual prompt layers: the system prompt and the user prompt. Understanding the difference lets you take deliberate control of the model's behavior instead of hoping for the best.
The system prompt is set once, before the conversation begins. It establishes the model's operating context: who it is, what rules govern its behavior, what it knows about your project, and what kinds of responses are acceptable. Think of it as the standing operating procedure a new contractor receives on their first day — it shapes every decision they make without needing to be repeated.
The user prompt is the per-turn instruction. It represents what you want done right now. The model reads the user prompt in light of everything the system prompt established. If your system prompt is weak or absent, the model falls back on generic defaults — and generic defaults mean inconsistent, often mediocre output.
In tools like Claude Code, the system prompt is managed for you. But when you write custom agents, call the API directly, or configure `.claude/settings.json`, you are writing system prompts. And even in tool-assisted workflows, you are effectively authoring a partial system prompt every time you create a `CLAUDE.md` file or project-level instructions. Treating these with the same care you would give a critical config file pays dividends across every subsequent interaction.
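When you do call the API directly, the separation is literal: the system prompt is a dedicated parameter, not just the first message. A minimal sketch using the Anthropic Python SDK — the briefing text and model ID here are illustrative, not a recommended template:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = """You are a senior backend engineer on an internal payments service.
Stack: Python 3.12, FastAPI, PostgreSQL 15.
Rules: never suggest new dependencies; flag any change that touches money handling."""

response = client.messages.create(
    model="claude-sonnet-4-5",  # example model ID; use whatever your account provides
    max_tokens=1024,
    system=SYSTEM_PROMPT,       # standing context: set once, shapes every turn
    messages=[
        {"role": "user", "content": "Review this diff for correctness: ..."},  # per-turn task
    ],
)
print(response.content[0].text)
```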
> **Learning tip:** Before you write any user prompt, ask yourself: "What would I put in a one-page briefing document for a new senior engineer joining this project today?" That briefing is your system prompt. The specific task you want done is your user prompt. Keep them separate in your thinking, even if the tool merges them visually.
## Role Framing — Concrete Before/After Examples
Role framing is the technique of explicitly telling the model what perspective, expertise, and responsibilities it should bring to the task. It is not about flattery ("you are an expert") — it is about activating a specific lens that changes how the model reasons and what it chooses to include or omit.
Why it works: LLMs are trained on a vast mixture of text. A plain prompt like "review this code" draws on the average of all code review contexts the model has seen — tutorial-level feedback, pedantic style notes, and actual production concerns all averaged together. A framed prompt narrows the distribution toward the behavior you actually want.
### Before (no role framing)

```prompt
Review this function and tell me if it looks good.

def get_user(user_id: str):
    conn = get_db()
    result = conn.execute(f"SELECT * FROM users WHERE id = {user_id}")
    return result.fetchone()
```
The model will likely mention the SQL injection risk, but it might also spend half its response on naming conventions and docstring formatting — noise when you need signal.
### After (with role framing)
```prompt
You are a senior backend engineer conducting a security-focused code review. Your primary concern is identifying vulnerabilities that could be exploited in a production environment. Secondary concerns are correctness and error handling. Do not comment on style or naming unless it directly creates a security or correctness risk.

Review this function:

def get_user(user_id: str):
    conn = get_db()
    result = conn.execute(f"SELECT * FROM users WHERE id = {user_id}")
    return result.fetchone()
```
Now the model leads with the SQL injection vector, discusses the missing input validation, notes the lack of error handling for a missing user or lost DB connection, and omits the style commentary entirely.
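For reference, the fix such a review converges on is parameterization — let the driver bind the value instead of interpolating it into the SQL string. A sketch, assuming `get_db()` returns a psycopg-style connection (placeholder syntax varies by driver):

```python
def get_user(user_id: str):
    conn = get_db()
    # Bound parameter: the driver escapes user_id, closing the injection vector.
    result = conn.execute("SELECT * FROM users WHERE id = %s", (user_id,))
    row = result.fetchone()
    # Surface the missing-user case explicitly instead of silently returning None.
    if row is None:
        raise LookupError(f"user {user_id!r} not found")
    return row
```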
Role framing applies equally to generation tasks. "Write a migration script" produces generic output. "You are a database engineer working on a zero-downtime migration for a PostgreSQL table with 50M rows. The application is live and cannot tolerate table locks longer than 100ms. Write a migration script" produces a batched, lock-aware implementation.
> **Learning tip:** Write your role frame as a job description, not a title. "You are a staff engineer" is weaker than "You are a staff engineer responsible for reviewing PRs for correctness, security, and architectural alignment with our event-driven microservices architecture." The more specific the accountability, the more targeted the output.
---
## Structured Instructions — XML Tags, JSON Schemas, and Numbered Lists
Once your role frame is in place, the internal structure of your prompt determines whether the model can reliably parse what you want. Freeform prose works for simple requests. For complex tasks, structure is not optional — it is the difference between a repeatable workflow and a lottery.
### XML Tags for Multi-Part Prompts
XML-style tags are especially effective for separating context from instruction, and for labeling distinct pieces of information the model needs to reason over. Claude in particular responds well to them because they create unambiguous boundaries.
```prompt
You are a senior TypeScript engineer performing a structured code review.
<context>
This is a React component used in our checkout flow. It handles address validation before the user proceeds to payment. The project uses Zod for schema validation and React Hook Form for form state.
</context>
<code>
export function AddressForm({ onSubmit }: { onSubmit: (data: Address) => void }) {
  const { register, handleSubmit } = useForm<Address>();
  return (
    <form onSubmit={handleSubmit(onSubmit)}>
      <input {...register("street")} />
      <input {...register("city")} />
      <input {...register("zip")} />
      <button type="submit">Continue</button>
    </form>
  );
}
</code>
<task>
Identify: (1) missing validation, (2) accessibility issues, (3) error state handling. For each issue, provide the corrected code snippet.
</task>
```
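When you assemble prompts like this programmatically — in a custom agent or a CI job — a small helper keeps the tag boundaries consistent. A minimal sketch in Python; the `tag` helper and the placeholder strings are ours, not any library's API:

```python
def tag(name: str, body: str) -> str:
    """Wrap a labeled section in XML-style tags with clean line boundaries."""
    return f"<{name}>\n{body.strip()}\n</{name}>"

# Placeholder content for illustration; in practice these come from your repo.
context = "React checkout component. Uses Zod for validation, React Hook Form for form state."
code = "export function AddressForm() { /* ... */ }"
task = "Identify: (1) missing validation, (2) accessibility issues, (3) error state handling."

prompt = "\n\n".join([
    "You are a senior TypeScript engineer performing a structured code review.",
    tag("context", context),
    tag("code", code),
    tag("task", task),
])
```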
### JSON Schemas for Structured Output
When you need machine-readable output — for example, feeding AI review results into a CI pipeline — specify the output schema explicitly.
```prompt
You are a code analysis agent. Analyze the following function and return a JSON object that strictly matches this schema:

{
  "severity": "critical" | "high" | "medium" | "low",
  "issues": [
    {
      "type": string,        // e.g., "sql_injection", "null_dereference"
      "line": number,
      "description": string,
      "suggested_fix": string
    }
  ],
  "safe_to_merge": boolean
}

Return only the JSON object. No preamble or explanation.

Function to analyze:
[paste function here]
```
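On the consuming side, the pipeline check is a few lines. A sketch, assuming the model's reply has been captured to a file named `review.json` (the field names follow the schema above; the file name and gating policy are our choices):

```python
import json
import sys

with open("review.json") as f:
    review = json.load(f)

# Surface the findings, then gate the merge on the model's structured verdict.
for issue in review["issues"]:
    print(f'{issue["type"]} (line {issue["line"]}): {issue["description"]}')

if not review["safe_to_merge"] or review["severity"] in ("critical", "high"):
    sys.exit(1)  # non-zero exit fails the CI step
```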
### Numbered Lists for Sequential Tasks
When the task has an inherent order — setup steps, refactoring instructions, migration procedures — numbered lists force the model to reason through each step before moving to the next, reducing the chance of skipping preconditions.
```prompt
You are a DevOps engineer. Perform the following steps to refactor our Docker Compose setup to use named volumes instead of bind mounts:

1. Identify all bind mount entries in the current docker-compose.yml (provided below).
2. For each bind mount, create an equivalent named volume definition.
3. Replace the bind mount entries with references to the named volumes.
4. Note any bind mounts that map to source code directories — these should be kept as bind mounts and flagged with a comment.
5. Output the complete modified docker-compose.yml.

Current docker-compose.yml:
[paste file here]
```
> **Learning tip:** Match your structure to the complexity of your task. Single-step tasks: plain prose is fine. Tasks with distinct inputs: use XML tags to label them. Tasks with ordered steps: use numbered lists. Tasks requiring machine-readable output: specify a schema. Mixing all structures into every prompt adds noise without benefit.
## Writing Constraints and Acceptance Criteria in Prompts
Constraints are often the highest-leverage part of a prompt. They define the solution space. Without them, the model optimizes for plausibility — which is not the same as correctness for your specific situation.
Effective constraints fall into several categories:
**Scope constraints** prevent over-engineering. "Modify only the `UserRepository` class. Do not change the service layer or controller." Without this, the model may helpfully refactor half your codebase.

**Technology constraints** prevent fantasy solutions. "Use only the libraries already in `package.json`. Do not introduce new dependencies." The model is very good at recommending new libraries. That is rarely what you want during a targeted fix.

**Behavioral constraints** define what the output must and must not do. Think of these as your acceptance criteria:
```prompt
You are a senior backend engineer. Refactor the following authentication middleware to use JWT verification instead of session cookies.

Constraints:
- Do not change the function signature or return type.
- The refactored function must pass all existing tests (listed below).
- Use the `jsonwebtoken` package already in the codebase — do not use any other JWT library.
- Handle token expiration explicitly: return a 401 with the message "Token expired" — not a generic 401.
- Do not log the token value at any point.

Acceptance criteria:
- Valid JWT → middleware calls next()
- Expired JWT → returns 401 with body { error: "Token expired" }
- Missing Authorization header → returns 401 with body { error: "No token provided" }
- Malformed token → returns 401 with body { error: "Invalid token" }

Existing tests:
[paste tests here]

Current middleware:
[paste code here]
```
This is a prompt you can hand to an AI agent and use its output as a first-pass implementation with confidence. The acceptance criteria map directly to verifiable test cases. The constraints prevent regressions.
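To see how directly those criteria become tests, here is a sketch in pytest — table-driven, one row per criterion. The endpoint path, base URL, and token placeholders are stand-ins for the project's real test harness, not part of the prompt above:

```python
import httpx
import pytest

BASE_URL = "http://localhost:3000"  # assumed locally running instance of the service

@pytest.mark.parametrize("headers, expected_error", [
    ({"Authorization": "Bearer <expired-jwt>"}, "Token expired"),
    ({}, "No token provided"),
    ({"Authorization": "Bearer not-a-jwt"}, "Invalid token"),
])
def test_rejects_invalid_tokens(headers, expected_error):
    # Each parametrized case maps one-to-one onto an acceptance criterion.
    resp = httpx.get(f"{BASE_URL}/protected", headers=headers)
    assert resp.status_code == 401
    assert resp.json() == {"error": expected_error}
```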
> **Learning tip:** Write your acceptance criteria before you write the rest of the prompt, the same way you would write tests before implementation. If you cannot state what "done" looks like in measurable terms, the prompt is not ready yet.
## Common Prompt Anti-Patterns Engineers Fall Into
Recognizing these patterns in your own prompts will save you hours of iteration.
**Anti-pattern 1: The vague ask.** "Clean up this code" or "Make this better" gives the model no signal about what dimension matters. Better or cleaner according to whom? Specify the axis: performance, readability for junior engineers, reduction of cyclomatic complexity, compatibility with a specific runtime version.

**Anti-pattern 2: The missing context.** Asking the model to fix a bug without providing the error message, stack trace, or the relevant surrounding code. The model will guess — and its guesses are often plausible but wrong. You would not debug without the stack trace; do not ask the model to either.

**Anti-pattern 3: The over-prescribed solution.** "Implement this feature by creating a new class called UserEventHandler that extends BaseEventHandler and overrides the process method with these exact parameter names..." If you have already designed the entire solution, you are using the model as a typist, not an engineer. Specify the problem and the constraints; let the model propose the implementation.

**Anti-pattern 4: The implicit audience.** Not specifying who will consume the output. "Write documentation for this API" could produce a README for junior developers, an OpenAPI spec, or a Confluence page for product managers. State the audience and the format.

**Anti-pattern 5: The single-shot complex task.** Asking the model to design a system, implement it, write tests, and document it all in one prompt. Complex tasks drift — the model loses the thread of the earlier constraints by the time it reaches step four. Break the task into phases, verify each phase, then proceed.
> **Learning tip:** After writing a prompt, read it as if you are a new contractor with no prior context. If you would need to ask a clarifying question, the prompt is missing that information. Add it before sending.
## Hands-On: Composing Multi-Turn Prompts for a Complex Task
Multi-turn prompting is the most powerful technique in this module — and the most underused by engineers who treat each AI interaction as a one-shot transaction. Complex engineering tasks are inherently iterative. The prompt architecture should reflect that.
The pattern is: establish context once → decompose into phases → verify before proceeding → accumulate state across turns.
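In API terms, "accumulate state across turns" means replaying the growing message history on every call while the standing context rides in the system prompt. A minimal sketch with the Anthropic Python SDK — the helper, model ID, and prompt strings are illustrative:

```python
import anthropic

client = anthropic.Anthropic()
SYSTEM = "You are a senior backend engineer... (the Step 1 briefing below goes here)"
history: list[dict] = []  # grows turn by turn; the model sees all of it on every call

def turn(user_prompt: str) -> str:
    history.append({"role": "user", "content": user_prompt})
    response = client.messages.create(
        model="claude-sonnet-4-5",  # example model ID
        max_tokens=4096,
        system=SYSTEM,              # established once, resent with every call
        messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

plan = turn("Produce a phased refactoring plan. Do not write any code yet.")
# ...review the plan yourself, then advance one verifiable phase at a time:
phase1 = turn("Implement Phase 1 only. Do not touch any other methods.")
```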
### Exercise: Refactoring a Data Access Layer with Multi-Turn Prompts
**Scenario:** You have a legacy UserRepository class with raw SQL queries scattered through it. You want to refactor it to use a query builder, add input validation, and improve error handling — without breaking the existing API surface.
**Step 1: Establish the system context** (send this first, confirm understanding before proceeding)

```prompt
You are a senior backend engineer helping me refactor a legacy data access layer. Here is the context you need to hold throughout this session:

<project_context>
- Language: TypeScript, Node.js 20
- Database: PostgreSQL 15, currently accessed via the `pg` npm package
- Target: Migrate to Kysely as the query builder (already installed, version 0.27)
- Constraint: The public method signatures of UserRepository must not change — callers must not require updates
- Constraint: All changes must be backward-compatible with existing integration tests
- Test runner: Vitest
</project_context>

Here is the current UserRepository class:
[paste class here]

Here are the existing integration tests:
[paste tests here]

Confirm you understand the context and constraints. Do not make any changes yet.
```
**Step 2: Request a refactoring plan before any implementation**

```prompt
Now produce a phased refactoring plan. For each phase:

- State what changes will be made
- Identify which methods are affected
- Flag any risks or edge cases I should verify manually
- Confirm which existing tests cover that phase

Do not write any code yet.
```
**Step 3: Execute phase by phase, verifying after each**

```prompt
Implement Phase 1 only: migrate the `findById` and `findByEmail` methods to Kysely.

For each method:
1. Show the before and after implementation side by side.
2. Confirm the method signature is unchanged.
3. Show how the existing test for that method still passes without modification.

Do not touch any other methods.
```
**Step 4: Handle edge cases discovered during review**

```prompt
In the migrated `findById` method, I notice you're not handling the case where Kysely throws a connection error differently from a "not found" result. The current implementation returns null for both. Update the method so that:

- Record not found → returns null (unchanged behavior)
- Database connection error → throws a DatabaseError with the original error wrapped as the cause

Show only the updated method. Do not re-show the full class.
```
**Step 5: Verify the accumulated state** (after repeating Steps 3 and 4 for Phase 2)

```prompt
We've now completed Phases 1 and 2. Before we proceed to Phase 3, list every method in UserRepository and its current status:

- Migrated to Kysely ✓
- Pending migration
- Not in scope

Flag any inconsistencies between the migrated methods and the original API surface contract.
```
This multi-turn approach gives you checkpoints at each phase, prevents runaway changes, and keeps the model's attention focused on one concern at a time. The verification steps at turns 2 and 5 are not overhead — they catch drift before it compounds.
> **Learning tip:** Treat multi-turn prompting like a code review process. Each turn is a PR. Before approving (accepting the output and moving forward), verify that the output meets the criteria you set. If it doesn't, iterate in the same turn before advancing to the next phase.
## Key Takeaways
- System prompts establish standing context; user prompts issue specific tasks. In tools like Claude Code, `CLAUDE.md` and project instructions function as your system prompt — maintain them with the same discipline you give critical configuration.
- Role framing is a precision tool, not a politeness convention. Frame the model's role as a specific job with specific responsibilities, and the output narrows toward production-relevant concerns rather than generic average responses.
- Structure your prompt to match your task complexity. XML tags for multi-part inputs, JSON schemas for machine-readable output, numbered lists for ordered procedures — choose the right structure for the job, not the most impressive-looking one.
- Constraints and acceptance criteria are the highest-leverage elements of a prompt. They define the solution space. Prompts without explicit constraints produce locally plausible but globally incompatible output.
- Multi-turn prompting with verification phases beats single-shot complex prompts every time. Establish context, decompose into verifiable phases, and check before advancing. Complexity that drifts in a single prompt is caught and corrected across turns.