This is a practical hands-on session. You will work through 8 real-world prompt rewrites drawn from common software engineering, QA, and product management workflows. For each prompt, you'll analyze the source of verbosity, apply the techniques from this module, and measure the result. Token counts are approximate and based on GPT-4 tokenization (tiktoken cl100k_base).
The goal is to internalize a repeatable compression process — not just to produce efficient prompts in this session, but to build the habit of applying these techniques to every prompt you write going forward.
The Compression Process
Before working through the examples, internalize this five-step process. Apply it to every prompt in this session and in your production work:
1. Measure first. Count the tokens in the original prompt.
2. Categorize each sentence. Label every sentence: essential instruction (E), format constraint (F), context (C), example (X), or filler (N for noise).
3. Remove all N sentences. Cut anything with no instruction signal.
4. Compress E and F sentences. Rewrite in imperative voice, use labels and structure, eliminate hedging and redundancy.
5. Measure again and verify quality. Count the tokens in the rewritten prompt. Confirm the compressed prompt produces equivalent quality on test inputs.
A successful rewrite achieves:
- 40–60% token reduction (good)
- 60–80% token reduction (excellent)
- Same or better output quality
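Steps 1 and 5 are mechanical enough to script. A minimal sketch using tiktoken, the tokenizer the counts in this session are based on (the helper names here are ours, not a standard API):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 tokenizer

def count_tokens(prompt: str) -> int:
    """Approximate token count for a prompt string."""
    return len(enc.encode(prompt))

def reduction(original: str, rewritten: str) -> float:
    """Percentage of tokens the rewrite removed."""
    before, after = count_tokens(original), count_tokens(rewritten)
    return 100 * (before - after) / before
```

A `reduction()` result of 40-60 lands in the "good" band above; 60-80 is "excellent."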
Exercise 1: Customer Support Ticket Classifier
Original prompt — 147 tokens:
You are a helpful AI assistant that works with a software company's customer support team. Your role is to help the support team by reading incoming customer support tickets and classifying them so that they can be routed to the right team.
When you receive a ticket, please carefully read through its entire content to understand what the customer is experiencing or asking about. Then, based on your understanding, classify the ticket into one of the following categories: billing issues, technical problems, account management, feature requests, or general inquiries.
Please provide your classification along with a brief explanation of why you chose that category, so that the support agents can understand your reasoning.
Analysis:
| Sentence | Category | Reason |
|---|---|---|
| "You are a helpful AI assistant that works with..." | N | Generic role setup; adds no task signal |
| "Your role is to help the support team..." | N | Obvious from context |
| "When you receive a ticket, please carefully read..." | N | Model always reads the input |
| "Then, based on your understanding, classify..." | E | First essential instruction |
| "into one of the following categories..." | E | Valid label set |
| "Please provide your classification along with a brief explanation..." | F | Format instruction — but do we need the explanation? |
The explanation requirement is worth challenging: if this runs at scale in an automated routing pipeline, no human reads the explanation. The category label is all that's needed.
Rewritten prompt — 32 tokens:
Classify the support ticket into one of these categories. Respond with the category name only.
Categories: billing, technical, account, feature-request, general
Ticket: {ticket_text}
Token reduction: 147 → 32 = 78% reduction
Quality impact: The category label is equally accurate. The explanation is eliminated because it serves no downstream purpose in an automated pipeline. If an explanation is needed (e.g., for an agent handoff), add one line ("Append a 10-word max reason after a colon"), still far more compact than the original.
Tip: Before writing an explanation requirement into your prompt, ask who reads the explanation and what they do with it. If the answer is "nothing" or "code parses the label only," the explanation is pure waste.
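To make the "who reads it" test concrete: in an automated router, the only consumer of the model's reply is code like the sketch below (the fallback behavior is an illustrative assumption):

```python
VALID_CATEGORIES = {"billing", "technical", "account", "feature-request", "general"}

def route_ticket(model_reply: str) -> str:
    """Route on the bare category label; any explanation text would be discarded."""
    label = model_reply.strip().lower()
    return label if label in VALID_CATEGORIES else "general"  # fallback queue
```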
Exercise 2: Code Documentation Generator
Original prompt — 198 tokens:
I need you to help me write documentation for a Python function. You are an expert Python developer with extensive experience writing clear, helpful documentation that other developers will find useful and easy to understand.
Please look at the Python function I'm going to provide you and write a comprehensive docstring for it. The docstring should follow Google style docstring conventions, which means it should include a brief one-line summary at the top, followed by a more detailed description if needed, then sections for Args where you list each parameter with its name, type, and description, a Returns section describing what the function returns including the type, and a Raises section if the function raises any exceptions.
Make sure the documentation is accurate, complete, and helpful to any developer who might need to use this function in the future.
Analysis:
Sentence-by-sentence audit:
- "I need you to help me write documentation..." — N (filler opener)
- "You are an expert Python developer with extensive experience..." — N (adjective stack with no signal)
- "Please look at the Python function..." — N (model will read the input regardless)
- "write a comprehensive docstring for it" — E (core task)
- "The docstring should follow Google style docstring conventions" — E (critical constraint)
- "which means it should include a brief one-line summary..." — E (format details — but these are standard Google docstring format that the model knows)
- "Make sure the documentation is accurate, complete, and helpful..." — N (the model always aims for this)
The format section is the legitimate content. But "Google style docstring" is a phrase the model understands perfectly — it doesn't need the Google format described to it.
Rewritten prompt — 28 tokens:
Write a Google-style docstring for the following Python function. Include Args, Returns, and Raises sections where applicable.
{function_code}
Token reduction: 198 → 28 = 86% reduction
Quality impact: The model knows Google docstring format precisely. The original prompt's description of Google format was teaching the model something it already knows. The rewritten prompt produces equivalent docstrings.
Tip: Avoid describing well-known standards or conventions in your prompts. "Google-style docstring," "PEP 8 compliant," "OpenAPI 3.0 format," "Markdown," "RFC 2822 date format" — these are all recognized by frontier models. Describing them wastes tokens and patronizes the model.
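For reference, here is the output shape the rewritten prompt elicits; the function itself is a made-up example:

```python
def retry(func, attempts=3):
    """Call a zero-argument function, retrying on failure.

    Args:
        func: Callable to invoke.
        attempts: Maximum number of calls before giving up.

    Returns:
        The return value of the first successful call.

    Raises:
        Exception: Re-raises the last exception if every attempt fails.
    """
    for i in range(attempts):
        try:
            return func()
        except Exception:
            if i == attempts - 1:
                raise
```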
Exercise 3: Sprint Planning Assistant for Product Managers
Original prompt — 232 tokens:
You are an experienced Agile coach and product management expert who has helped dozens of software teams run effective sprint planning sessions. I'm a product manager who needs help prioritizing a backlog of user stories for our upcoming sprint.
I'm going to share with you a list of user stories that are currently in our backlog. Each story has a title, a rough estimate in story points, and some notes about it. What I need you to do is help me figure out which stories should be included in our next sprint.
To do this, please consider the following factors: the business value of each story, the technical complexity as reflected by the story points, any dependencies between stories that might affect the order they need to be completed in, and the overall sprint goal which is to improve the checkout experience for our users.
Please rank the stories in order of priority for inclusion in the sprint, and for each story explain your reasoning so I can understand your recommendations and discuss them with my team.
Analysis:
Essential content:
- Rank backlog stories for sprint inclusion (core task)
- Sprint goal: improve checkout experience
- Consider: business value, story points, dependencies
Non-essential content:
- Role/credentials introduction (48 tokens)
- "I'm going to share with you a list..." (narrative transition)
- "To do this, please consider..." (narrative transition to what are just requirements)
- "explain your reasoning so I can understand" (explanation overhead — worth keeping only if needed for PM to review)
The explanation requirement is legitimate for a PM workflow (unlike the automated classifier). But it can be constrained to be more compact.
Rewritten prompt — 62 tokens:
Rank the following user stories for sprint inclusion. Sprint goal: improve checkout experience.
Ranking criteria (in order of weight):
1. Business value
2. Story points (lower = preferred for tight sprints)
3. Dependencies (unblocked stories first)
For each story: rank number, one-sentence rationale.
Stories:
{backlog_stories}
Token reduction: 232 → 62 = 73% reduction
Quality impact: Maintained. The criteria are now weighted (which the original didn't specify), making the output more actionable. The one-sentence rationale constraint prevents verbose explanations.
Tip: When converting narrative requirements into prompt instructions, extract the actual requirements and reformat them as a structured list with weights or priorities. Narrative form adds connective tissue tokens (transitions, hedges, courtesies) that structure eliminates.
Exercise 4: Automated Test Case Generator (QA)
Original prompt — 276 tokens:
You are a senior quality assurance engineer with many years of experience writing comprehensive test cases for web applications. I need your help generating test cases for a feature that our development team has just finished building.
The feature is described in the acceptance criteria that I will provide to you below. Please read through the acceptance criteria carefully and use them as the basis for the test cases you generate.
For each test case, I would like you to provide the following information: a clear and descriptive test case title that explains what is being tested, detailed step-by-step instructions for how to execute the test, the test data that should be used including specific values for any input fields, the expected result that a tester should see if the feature is working correctly, and any important notes or preconditions that the tester needs to be aware of before running the test.
Please make sure to cover both happy path scenarios (where the user does everything correctly) and negative scenarios (where the user provides invalid input or tries to do something the system should prevent). Also consider edge cases that might not be immediately obvious but could reveal important bugs.
I want these test cases to be written in a way that any QA team member, regardless of their familiarity with this feature, can execute them without needing additional context.
Analysis:
Essential information:
- Task: generate test cases from acceptance criteria
- Required fields: title, steps, test data, expected result, notes/preconditions
- Coverage: happy path, negative, edge cases
- Audience: any QA team member
Non-essential (can be cut):
- Role/credentials sentence
- "I need your help..." opener
- "Please read through the acceptance criteria carefully..." (model reads all input)
- "Please make sure to..." (redundant with explicit coverage list)
- "I want these test cases to be written in a way..." (output quality aspiration, not instruction)
- Multiple instances of "I would like you to" and "I need"
The output format can be expressed as a schema rather than described in prose.
Rewritten prompt — 68 tokens:
Generate test cases from the acceptance criteria below.
For each test case, output:
- Title: <descriptive>
- Steps: numbered list
- Test data: specific values
- Expected result: <exact outcome>
- Preconditions: <if any>
Cover: happy path, negative scenarios, edge cases.
Acceptance Criteria:
{acceptance_criteria}
Token reduction: 276 → 68 = 75% reduction
Quality impact: Identical — all required fields are preserved. The "any QA team member can execute" aspiration is achieved by "numbered list" and "exact outcome" in the format spec, not by describing the audience.
Tip: When your prompt contains field-by-field descriptions of an output format, replace the prose description with a structured template or schema. Bullet lists or JSON schemas communicate format requirements in a fraction of the tokens that prose descriptions require.
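The same field spec also packs into a JSON shape when you want machine-parseable output. A minimal sketch (the shape notation and the placeholder name are our conventions, not a standard):

```python
import json

# Same field spec as the bullet template, expressed as the JSON shape
# the model should emit. Field names mirror the rewritten prompt.
TEST_CASE_SHAPE = {
    "title": "string",
    "steps": ["string"],           # numbered execution steps
    "test_data": "object",         # specific input values
    "expected_result": "string",
    "preconditions": "string or null",
}

prompt = (
    "Generate test cases from the acceptance criteria below.\n"
    "Output a JSON array of objects with this shape:\n"
    + json.dumps(TEST_CASE_SHAPE, indent=2) + "\n"
    "Cover: happy path, negative scenarios, edge cases.\n"
    "Acceptance Criteria:\n"
    "{acceptance_criteria}"  # placeholder filled in by the calling code
)
```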
Exercise 5: Security Review Prompt (Engineering)
Original prompt — 189 tokens:
Please act as a security expert and carefully review the following code snippet for any potential security vulnerabilities or issues. I need you to be very thorough in your analysis and look for things like SQL injection vulnerabilities, cross-site scripting (XSS) vulnerabilities, authentication and authorization issues, insecure direct object references, sensitive data exposure, security misconfiguration, use of components with known vulnerabilities, and any other security issues you might identify.
For each issue you find, please describe the vulnerability, explain why it's a security risk, describe the potential impact if it were exploited, and provide a specific recommendation for how to fix it. If there are no issues found, please say so clearly.
Please also rate the overall security of the code as low risk, medium risk, or high risk.
Analysis:
Legitimate content:
- OWASP-style vulnerability scan (the list of vulnerability types is legitimate — it guides the model's focus)
- Output format: per-issue findings with specific fields
- Overall risk rating
Reducible content:
- "Please act as a security expert and carefully review..." — weak role framing
- "I need you to be very thorough..." — redundant aspiration
- The vulnerability list is valuable but uses full names where abbreviations work
- "For each issue you find, please describe..." — prose format description
Rewritten prompt — 72 tokens:
Security-review the code below. Check for: SQLi, XSS, auth/authz flaws, IDOR, sensitive data exposure, security misconfiguration, known-vulnerable dependencies.
Output:
- Overall risk: low|medium|high
- Issues (if any): [{vulnerability, risk, impact, fix}]
Code:
{code}
Token reduction: 189 → 72 = 62% reduction
Quality impact: The abbreviated vulnerability names (SQLi, XSS, IDOR) are standard security terminology that frontier models recognize. The JSON-style output schema in the bullet list is more compact and equally clear.
Tip: Security, legal, and compliance domains have well-established abbreviations and shorthand that LLMs understand (SQLi, XSS, GDPR, PII, RBAC, MFA). Use them. You're writing for a machine, not for a non-technical reader. Domain shorthand consistently reduces prompt tokens by 15–25% in specialized professional contexts.
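To make the output schema concrete, here is the shape a reply should take and how downstream code might gate on it (the finding is illustrative, not a real vulnerability report):

```python
# Illustrative example of the shape the rewritten prompt requests.
example_review = {
    "overall_risk": "medium",
    "issues": [
        {
            "vulnerability": "SQLi",
            "risk": "User input is concatenated into a SQL query string",
            "impact": "Attacker can read or modify arbitrary rows",
            "fix": "Use parameterized queries",
        }
    ],
}

# A CI gate can consume the rating without any human-readable prose.
assert example_review["overall_risk"] in {"low", "medium", "high"}
```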
Exercise 6: Product Requirements Summarizer
Original prompt — 244 tokens:
I have a product requirements document (PRD) that I need to summarize for different stakeholders. The PRD is quite long and detailed, and different stakeholders need different levels of detail and different perspectives on the content.
First, I need you to create an executive summary of no more than three sentences that captures the most important business objective, the key user problem being solved, and the expected business impact. This executive summary will be used by C-level executives who need a quick overview.
Second, I need a technical summary for the engineering team. This should be approximately one paragraph (4-6 sentences) and should focus on the key technical requirements, integration points, data flows, and any important constraints or assumptions that will affect the technical implementation.
Third, I need a QA/testing summary that focuses on the key acceptance criteria, the main test scenarios that QA will need to cover, and any important edge cases or non-functional requirements (like performance or security) that will need to be tested.
Please present each summary clearly labeled with its audience.
Analysis:
This prompt contains a legitimate multi-output structure (3 summaries for 3 audiences), but it wraps each requirement in narrative explanation. The essential content is:
- 3-sentence executive summary (business objective, user problem, business impact)
- Engineering paragraph (technical requirements, integrations, constraints)
- QA summary (acceptance criteria, test scenarios, edge cases, NFRs)
The audience context ("C-level executives who need a quick overview") is implied by the label "Executive Summary" and doesn't need explicit explanation.
Rewritten prompt — 65 tokens:
Summarize the PRD in three labeled sections:
**Executive Summary** (3 sentences max): business objective, user problem, expected impact.
**Engineering Summary** (4-6 sentences): technical requirements, integrations, constraints.
**QA Summary** (bullet list): acceptance criteria, key test scenarios, edge cases, NFRs.
PRD:
{prd_text}
Token reduction: 244 → 65 = 73% reduction
Quality impact: The structured label format is clearer to the model than the narrative description. Explicit sentence/item counts constrain output length more effectively than "approximately one paragraph."
Tip: Multi-output prompts (generate several different things) benefit significantly from the header+constraint structure: **Section Name** (constraint): what to cover. This pattern is highly token-efficient, consistently followed by frontier models, and produces predictably structured outputs.
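If your application assembles such prompts in code, the pattern collapses to a one-line formatter. A trivial sketch (function and variable names are ours):

```python
def section(name: str, constraint: str, coverage: str) -> str:
    """One header+constraint line: **Section Name** (constraint): what to cover."""
    return f"**{name}** ({constraint}): {coverage}."

prompt = "\n".join([
    "Summarize the PRD in three labeled sections:",
    section("Executive Summary", "3 sentences max",
            "business objective, user problem, expected impact"),
    section("Engineering Summary", "4-6 sentences",
            "technical requirements, integrations, constraints"),
    section("QA Summary", "bullet list",
            "acceptance criteria, key test scenarios, edge cases, NFRs"),
    "PRD:\n{prd_text}",
])
```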
Exercise 7: Agentic Tool Selection Prompt (Engineering)
Original prompt — 312 tokens:
You are an intelligent AI agent that has been given access to a set of tools to help you complete tasks. When you receive a task from a user, your job is to figure out which tools you need to use and in what order in order to complete the task successfully.
Before you start using any tools, I want you to think carefully about what the task requires and make a plan. Consider all of the available tools and think about which ones are relevant to this task and which ones are not. Then decide on the sequence of tool calls that will be most efficient for completing the task.
It's very important that you don't call tools unnecessarily, because each tool call has a cost. Only call a tool when you genuinely need the information or capability it provides. If you already have all the information you need, don't make additional tool calls.
When you execute your plan, please tell me which tool you are calling and why before each tool call. This helps me understand your reasoning and verify that you are using the tools appropriately.
After completing all necessary tool calls and gathering the results, synthesize the information and provide a comprehensive answer to the user's original request.
Remember: only use tools when necessary, be efficient, and always explain your reasoning.
Analysis:
This is a system prompt for an agentic workflow. It has legitimate behavioral constraints (don't over-call tools, plan before acting, explain tool calls) but expresses them in 312 tokens when a fraction would do.
Core behavioral rules:
1. Plan before acting
2. Minimize tool calls (only call when needed)
3. Explain each tool call briefly
4. Synthesize results into a final answer
Rewritten prompt — 58 tokens:
You are an agent with access to tools. For each task:
1. Plan: identify which tools are needed before calling any.
2. Execute: call only necessary tools. State the tool name and one-sentence reason before each call.
3. Synthesize: after all tool calls, provide the final answer.
Do not make redundant tool calls. If you already have the needed information, do not call additional tools.
Token reduction: 312 → 58 = 81% reduction
Quality impact: The numbered plan-execute-synthesize structure is more actionable than the narrative description. The anti-redundancy rule is stated once, concisely.
Tip: Agentic system prompts are paid on every turn of every conversation — they are among the highest-leverage targets for prompt compression. A system prompt used by an agent that handles 10,000 conversations per day with an average of 5 turns each incurs 50,000 system prompt injections daily. Every 100 tokens you remove from the system prompt saves 5 million tokens per day at that scale.
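The arithmetic generalizes to any traffic profile; a quick sketch using the hypothetical figures from the tip:

```python
def daily_token_savings(conversations: int, avg_turns: int, tokens_removed: int) -> int:
    """Tokens saved per day when the system prompt is resent on every turn."""
    return conversations * avg_turns * tokens_removed

# Figures from the tip: 10,000 conversations/day, 5 turns each, 100 tokens cut.
assert daily_token_savings(10_000, 5, 100) == 5_000_000
```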
Exercise 8: Full Rewrite Challenge — Combined Prompt
This final exercise presents a prompt that combines multiple anti-patterns. Apply the full compression process yourself before reading the rewrite.
Original prompt — 387 tokens:
You are a helpful, knowledgeable, and experienced software development assistant. You have deep expertise in software architecture, code quality, and engineering best practices across many programming languages and frameworks.
I'm working on a software project and I need your help analyzing a piece of code that one of my team members has written. This code is part of our backend service that handles user authentication.
I would like you to please carefully read through the code I'm going to share with you and perform a comprehensive analysis. Specifically, I need you to do several things:
First, please look at the overall structure and architecture of the code and tell me whether you think it follows good software engineering principles. Does it have good separation of concerns? Is it maintainable? Is it testable?
Second, please look for any bugs or logical errors in the code. These are things that would cause the code to behave incorrectly or produce wrong results.
Third, please identify any performance issues or inefficiencies. Are there any places where the code could be made faster or use fewer resources?
Fourth, please look for security vulnerabilities, particularly those that would be relevant to authentication code, such as improper session handling, weak token generation, insecure password storage, or missing rate limiting.
Finally, please organize all of your findings into a clear structured report that includes separate sections for each of these areas. For each issue you identify, please include a brief description of the issue and a recommendation for how to fix it.
Apply the compression process:
- Role sentence: N (generic excellence descriptor)
- "I'm working on a software project..." — N (context filler)
- "I would like you to please carefully read through..." — N (model reads all input)
- Four analysis dimensions: E (architecture, bugs, performance, security) — legitimate requirements
- Output format: F (structured report, per-issue description + recommendation) — legitimate
Rewritten prompt — 72 tokens:
Analyze this authentication code across 4 dimensions. Output a structured report:
**Architecture**: separation of concerns, maintainability, testability — issues and recommendations.
**Bugs**: logical errors — what they are and how to fix.
**Performance**: inefficiencies — what and how to optimize.
**Security**: auth-specific vulnerabilities (session handling, token generation, password storage, rate limiting) — findings and fixes.
Code:
{code}
Token reduction: 387 → 72 = 81% reduction
Quality impact: The four dimensions are preserved exactly. The auth-specific security checklist is retained because it's domain-specific guidance, not generic instructions. The output format (four labeled sections, each with findings and recommendations) is clearer as a structured template than as a narrative list of instructions.
Tip: When you reach 80%+ token reduction on a prompt rewrite, you've likely found all the meaningful compression. The remaining 20% is dense, essential signal. Attempting to compress further risks quality degradation. Know when to stop — the goal is optimal efficiency, not minimal character count.
Summary: Compression Techniques Applied
Across these 8 exercises, the average token reduction was 76%. Here is a summary of the techniques applied:
| Technique | Applied in exercises | Token savings contribution |
|---|---|---|
| Remove role/credential preamble | All 8 | High |
| Remove "please read carefully" + narrative transitions | All 8 | Medium |
| Replace prose format description with structured template | 1, 3, 4, 6, 8 | High |
| Use imperative voice | All 8 | Medium |
| Use domain abbreviations | 5 | Medium |
| Constrain explanation to one sentence | 3, 4, 7 | Medium |
| Use JSON-style field spec instead of prose | 4, 5 | High |
| Remove aspiration/quality-assurance statements | 2, 4, 6 | Low-Medium |
Your practice assignment: Select 5 prompts from your current project or workflow. Apply the five-step compression process to each. Target at least 50% reduction on each. Verify quality on a test set of 10–15 representative inputs. Document the before/after token counts and quality scores. This exercise, done once on real production prompts, produces more lasting learning than any number of theoretical examples.
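For the verification step, a small harness that runs both prompt versions over the same test inputs keeps the comparison honest. A minimal sketch assuming the official openai Python client; the model name, placeholder name, and equality check are stand-ins to adapt to your task:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def run(prompt_template: str, test_input: str) -> str:
    """Fill the template's placeholder and return the model's reply."""
    prompt = prompt_template.replace("{input}", test_input)  # adjust to your placeholder
    resp = client.chat.completions.create(
        model="gpt-4o",  # stand-in; use your production model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def compare(original: str, rewritten: str, test_inputs: list[str]) -> None:
    """Print a per-input verdict; swap the check for a task-specific scorer."""
    for text in test_inputs:
        same = run(original, text).strip() == run(rewritten, text).strip()
        print("match" if same else "DIFFERS", repr(text[:40]))
```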
Tip: Establish a team ritual: every new prompt that enters production must go through one peer review specifically focused on token efficiency — not code review, not quality review, but token review. Ask: "Is every sentence earning its tokens?" This review takes 10 minutes and consistently catches the patterns that individual authors overlook because familiarity blinds them to verbosity in their own writing.