This is a full hands-on lab. You will design, implement, and validate a complete token budget framework for a fictional but realistic software development team. By the end, you will have a working budget system that covers model selection routing, per-task and per-project limits, real-time cost monitoring, alerting, and an ROI dashboard — all integrated into a cohesive framework you can adapt for your own team.
The scenario: Acme Engineering is a 12-person product team (6 engineers, 3 QA engineers, 3 PMs) that runs the following agentic workflows:
- CI/CD pipeline code review agent (automated, runs on every PR)
- Developer coding assistant (interactive, on-demand)
- QA test generation agent (runs on new feature tickets)
- PM requirements drafting assistant (interactive, on-demand)
Their goal: spend no more than $800/month total on AI tokens while maximizing productivity for all three roles.
Phase 1: Map Your Workflows and Estimate Baseline Costs
Before writing any code, document every agentic workflow and estimate its token consumption.
Step 1: Workflow Inventory
Create a workflow inventory document (this is your starting artifact):
ACME ENGINEERING — AI WORKFLOW INVENTORY
1. CI/CD Code Review Agent
Trigger: Every PR opened or pushed
Frequency: ~40 PRs/week, ~160/month
Task: Review code changes for bugs, style, security issues
Avg code diff size: 300 lines ≈ 3,500 input tokens
System prompt: 800 tokens
Tool definitions: 400 tokens
Expected output: 500-800 tokens (review comments)
Model candidate: claude-3-5-haiku-20241022
2. Developer Coding Assistant
Trigger: On-demand, all 6 engineers
Frequency: 12 sessions/engineer/week × 6 = 72 sessions/week = 288/month
Avg session: 8 turns, 1,200 tokens per turn (in+out combined)
System prompt: 600 tokens (stable, cacheable)
Model candidate: claude-3-5-haiku-20241022 (routed to Sonnet for complex)
3. QA Test Generation Agent
Trigger: New feature ticket in Jira
Frequency: 8 features/sprint × 2 sprints/month = 16/month
Task: Generate test plan + test cases from requirements
Avg input: 2,000 tokens (requirements doc + existing tests)
Avg output: 3,000-4,000 tokens (test plan + cases)
Model candidate: claude-3-5-sonnet-20241022
4. PM Requirements Drafting Assistant
Trigger: On-demand, all 3 PMs
Frequency: 4 sessions/PM/week × 3 = 12/week = 48/month
Avg session: 6 turns, 800 tokens per turn
System prompt: 500 tokens
Model candidate: claude-3-5-haiku-20241022
Step 2: Calculate Baseline Cost Estimate
Translate the inventory into a monthly cost estimate using published per-million-token pricing (USD per 1M tokens):
PRICING = {
"claude-3-5-sonnet-20241022": {"input": 3.00, "output": 15.00},
"claude-3-5-haiku-20241022": {"input": 0.80, "output": 4.00},
}
workflows = {
"ci_cd_review": {
"model": "claude-3-5-haiku-20241022",
"monthly_tasks": 160,
"avg_input_tokens": 4700, # system + tools + diff
"avg_output_tokens": 650,
"label": "CI/CD Code Review"
},
"dev_assistant": {
"model": "claude-3-5-haiku-20241022",
"monthly_tasks": 288, # sessions
"avg_turns": 8,
"avg_input_per_turn": 750, # grows with history, avg mid-session
"avg_output_per_turn": 450,
"label": "Developer Coding Assistant"
},
"qa_test_gen": {
"model": "claude-3-5-sonnet-20241022",
"monthly_tasks": 16,
"avg_input_tokens": 2000,
"avg_output_tokens": 3500,
"label": "QA Test Generation"
},
"pm_assistant": {
"model": "claude-3-5-haiku-20241022",
"monthly_tasks": 48,
"avg_turns": 6,
"avg_input_per_turn": 550,
"avg_output_per_turn": 250,
"label": "PM Requirements Drafting"
}
}
print("=" * 60)
print("ACME ENGINEERING — Monthly Token Cost Estimate")
print("=" * 60)
total_monthly = 0.0
for workflow_id, wf in workflows.items():
pricing = PRICING[wf["model"]]
if "avg_turns" in wf:
# Session-based workflow
total_input = (wf["monthly_tasks"] * wf["avg_turns"] *
wf["avg_input_per_turn"])
total_output = (wf["monthly_tasks"] * wf["avg_turns"] *
wf["avg_output_per_turn"])
else:
# Single-task workflow
total_input = wf["monthly_tasks"] * wf["avg_input_tokens"]
total_output = wf["monthly_tasks"] * wf["avg_output_tokens"]
input_cost = (total_input / 1_000_000) * pricing["input"]
output_cost = (total_output / 1_000_000) * pricing["output"]
workflow_cost = input_cost + output_cost
total_monthly += workflow_cost
print(f"\n{wf['label']}")
print(f" Model: {wf['model']}")
print(f" Input tokens: {total_input:,}/month")
print(f" Output tokens: {total_output:,}/month")
print(f" Input cost: ${input_cost:.2f}")
print(f" Output cost: ${output_cost:.2f}")
print(f" TOTAL: ${workflow_cost:.2f}/month")
print(f"\n{'=' * 60}")
print(f"TOTAL ESTIMATED MONTHLY COST: ${total_monthly:.2f}")
print(f"Budget: $800.00")
print(f"Headroom: ${800 - total_monthly:.2f}")
print(f"Budget utilization: {(total_monthly/800)*100:.1f}%")
Running this produces:
============================================================
ACME ENGINEERING — Monthly Token Cost Estimate
============================================================
CI/CD Code Review
Model: claude-3-5-haiku-20241022
Input tokens: 752,000/month
Output tokens: 104,000/month
Input cost: $0.60
Output cost: $0.42
TOTAL: $1.02/month
Developer Coding Assistant
Model: claude-3-5-haiku-20241022
Input tokens: 1,728,000/month
Output tokens: 1,036,800/month
Input cost: $1.38
Output cost: $4.15
TOTAL: $5.53/month
QA Test Generation
Model: claude-3-5-sonnet-20241022
Input tokens: 32,000/month
Output tokens: 56,000/month
Input cost: $0.10
Output cost: $0.84
TOTAL: $0.94/month
PM Requirements Drafting
Model: claude-3-5-haiku-20241022
Input tokens: 158,400/month
Output tokens: 72,000/month
Input cost: $0.13
Output cost: $0.29
TOTAL: $0.42/month
============================================================
TOTAL ESTIMATED MONTHLY COST: $7.91
Budget: $800.00
Headroom: $792.09
Budget utilization: 1.0%
The estimate is well within budget. This is common for teams just starting agentic workflows: the bottleneck is rarely cost; it's adoption. Budgeting $800/month leaves room for 10x-100x growth in usage as the team adopts these tools.
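To see how much adoption headroom that budget actually provides, project the baseline at a few growth multipliers (a quick sketch using the $7.91/month estimate computed above):

```python
# Project monthly cost at several adoption-growth multipliers,
# using the $7.91/month baseline estimated above.
BASELINE_MONTHLY = 7.91
BUDGET = 800.00

for multiplier in (1, 10, 50, 100):
    projected = BASELINE_MONTHLY * multiplier
    utilization = projected / BUDGET * 100
    print(f"{multiplier:>4}x adoption: ${projected:>8.2f}/month ({utilization:5.1f}% of budget)")
```

Even at 100x the baseline ($791/month), Acme stays just under the $800 cap, which is exactly the headroom described above.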
Tip: Always run your baseline estimate before your first production deployment. The numbers almost always surprise teams — either costs are much lower than feared (enabling faster adoption) or much higher than expected (triggering immediate optimization). Both outcomes are useful.
Phase 2: Implement the Budget Framework
Now build the actual framework. We'll implement this as a Python package that all four workflows can import.
Project Structure
acme_ai_budget/
├── __init__.py
├── config.py # Budget configuration
├── client.py # Instrumented LLM client
├── budget.py # Budget enforcement
├── router.py # Model selection router
├── monitor.py # Real-time monitoring
├── alerts.py # Alerting system
└── report.py # ROI reporting
config.py — Budget Configuration
from dataclasses import dataclass, field
from typing import Dict
@dataclass
class WorkflowBudget:
"""Budget configuration for a single workflow."""
monthly_usd: float
daily_usd: float
per_task_soft_tokens: int
per_task_hard_tokens: int
per_session_soft_tokens: int
per_session_hard_tokens: int
max_loop_steps: int = 20
alert_email: str = "[email protected]"
TOTAL_MONTHLY_BUDGET_USD = 800.00
BUDGET_RESERVE_PCT = 0.20 # Reserve 20% as buffer
WORKFLOW_BUDGETS: Dict[str, WorkflowBudget] = {
"ci_cd_review": WorkflowBudget(
monthly_usd=200.00, # 25% of budget — high frequency, high value
daily_usd=7.00,
per_task_soft_tokens=6_000,
per_task_hard_tokens=12_000,
per_session_soft_tokens=6_000, # Single-turn, reuse task limits
per_session_hard_tokens=12_000,
max_loop_steps=5, # Code review should be quick
),
"dev_assistant": WorkflowBudget(
monthly_usd=350.00, # 44% of budget — highest engagement
daily_usd=12.00,
per_task_soft_tokens=5_000, # Per turn
per_task_hard_tokens=10_000,
per_session_soft_tokens=60_000, # Session accumulation
per_session_hard_tokens=120_000,
max_loop_steps=30, # Coding sessions can be long
),
"qa_test_gen": WorkflowBudget(
monthly_usd=150.00,
daily_usd=8.00,
per_task_soft_tokens=8_000,
per_task_hard_tokens=16_000,
per_session_soft_tokens=8_000,
per_session_hard_tokens=16_000,
max_loop_steps=10,
),
"pm_assistant": WorkflowBudget(
monthly_usd=100.00,
daily_usd=4.00,
per_task_soft_tokens=4_000,
per_task_hard_tokens=8_000,
per_session_soft_tokens=40_000,
per_session_hard_tokens=80_000,
max_loop_steps=15,
),
}
MODEL_ROUTING = {
"ci_cd_review": {
"default": "claude-3-5-haiku-20241022",
"complex": "claude-3-5-sonnet-20241022", # Large diffs or security issues
"threshold_tokens": 6_000, # Escalate if input > threshold
},
"dev_assistant": {
"default": "claude-3-5-haiku-20241022",
"complex": "claude-3-5-sonnet-20241022",
"complexity_keywords": ["architect", "design", "system", "performance", "security"],
},
"qa_test_gen": {
"default": "claude-3-5-sonnet-20241022", # Always use Sonnet for test quality
"simple": "claude-3-5-haiku-20241022", # Simple CRUD test generation
},
"pm_assistant": {
"default": "claude-3-5-haiku-20241022",
"complex": "claude-3-5-sonnet-20241022",
"complexity_keywords": ["strategy", "architecture", "roadmap", "competitive"],
},
}
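One sanity check worth running on this configuration (a standalone sketch, with the allocations copied from WORKFLOW_BUDGETS above): the per-workflow monthly limits sum to exactly the $800 org total, so BUDGET_RESERVE_PCT only creates real headroom if it is actually applied at enforcement time.

```python
# Sanity-check the allocation: per-workflow monthly limits vs. org budget.
# Figures mirror WORKFLOW_BUDGETS and the constants defined above.
TOTAL_MONTHLY_BUDGET_USD = 800.00
BUDGET_RESERVE_PCT = 0.20

allocations = {
    "ci_cd_review": 200.00,
    "dev_assistant": 350.00,
    "qa_test_gen": 150.00,
    "pm_assistant": 100.00,
}

allocated = sum(allocations.values())                             # 800.00
spendable = TOTAL_MONTHLY_BUDGET_USD * (1 - BUDGET_RESERVE_PCT)   # 640.00

print(f"Allocated: ${allocated:.2f}, reserve-adjusted cap: ${spendable:.2f}")
if allocated > spendable:
    print("Warning: allocations consume the reserve; "
          "scale the limits down or enforce the buffer centrally.")
```

Running a check like this in CI whenever config.py changes catches over-allocation before it reaches production.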
budget.py — Core Budget Enforcement
import time
import redis
from typing import Optional
from .config import WORKFLOW_BUDGETS, TOTAL_MONTHLY_BUDGET_USD
COST_TABLE = {
"claude-3-5-sonnet-20241022": {"input": 3.00/1e6, "output": 15.00/1e6},
"claude-3-5-haiku-20241022": {"input": 0.80/1e6, "output": 4.00/1e6},
"claude-3-haiku-20240307": {"input": 0.25/1e6, "output": 1.25/1e6},
}
class BudgetEnforcer:
def __init__(self, redis_url: str = "redis://localhost:6379"):
self.redis = redis.from_url(redis_url)
def calculate_cost(self, model: str, input_tokens: int, output_tokens: int) -> float:
        # Unknown model: fall back to Sonnet rates so estimates err high, not low.
        costs = COST_TABLE.get(model, {"input": 3.00/1e6, "output": 15.00/1e6})
return input_tokens * costs["input"] + output_tokens * costs["output"]
def _key(self, workflow_id: str, period: str) -> str:
from datetime import datetime
if period == "daily":
date = datetime.utcnow().strftime("%Y-%m-%d")
return f"acme:budget:{workflow_id}:daily:{date}"
elif period == "monthly":
month = datetime.utcnow().strftime("%Y-%m")
return f"acme:budget:{workflow_id}:monthly:{month}"
        elif period == "total_monthly":
            month = datetime.utcnow().strftime("%Y-%m")
            return f"acme:budget:total:monthly:{month}"
        raise ValueError(f"Unknown budget period: {period!r}")
def check_and_record(
self,
workflow_id: str,
model: str,
input_tokens: int,
output_tokens: int,
) -> dict:
"""
Check budget, record spend if within limits.
Returns dict with allowed status and current spend info.
"""
cost = self.calculate_cost(model, input_tokens, output_tokens)
budget = WORKFLOW_BUDGETS.get(workflow_id)
if not budget:
return {"allowed": True, "cost": cost, "reason": "no_budget_configured"}
        # Per-workflow limits. Apply BUDGET_RESERVE_PCT here if you want the
        # 20% reserve from config.py enforced per workflow rather than centrally.
        effective_monthly = budget.monthly_usd
        effective_daily = budget.daily_usd
daily_key = self._key(workflow_id, "daily")
monthly_key = self._key(workflow_id, "monthly")
total_key = self._key(workflow_id, "total_monthly")
# Read current values
current_daily = float(self.redis.get(daily_key) or 0)
current_monthly = float(self.redis.get(monthly_key) or 0)
total_monthly = float(self.redis.get(total_key) or 0)
# Check total organization budget first
if total_monthly + cost > TOTAL_MONTHLY_BUDGET_USD:
return {
"allowed": False,
"reason": "total_org_budget_exceeded",
"total_monthly": total_monthly,
"org_limit": TOTAL_MONTHLY_BUDGET_USD
}
# Check daily limit
if current_daily + cost > effective_daily:
return {
"allowed": False,
"reason": "daily_limit_exceeded",
"workflow": workflow_id,
"daily_spend": current_daily,
"daily_limit": effective_daily
}
# Check monthly limit
if current_monthly + cost > effective_monthly:
return {
"allowed": False,
"reason": "monthly_limit_exceeded",
"workflow": workflow_id,
"monthly_spend": current_monthly,
"monthly_limit": effective_monthly
}
# Record the spend
pipe = self.redis.pipeline()
pipe.incrbyfloat(daily_key, cost)
pipe.expire(daily_key, 86400 * 2)
pipe.incrbyfloat(monthly_key, cost)
pipe.expire(monthly_key, 86400 * 35)
pipe.incrbyfloat(total_key, cost)
pipe.expire(total_key, 86400 * 35)
pipe.execute()
return {
"allowed": True,
"cost": cost,
"daily_spend_after": current_daily + cost,
"monthly_spend_after": current_monthly + cost,
"daily_pct": ((current_daily + cost) / effective_daily) * 100,
"monthly_pct": ((current_monthly + cost) / effective_monthly) * 100,
}
def get_all_status(self) -> dict:
"""Return current budget status for all workflows."""
from datetime import datetime
month = datetime.utcnow().strftime("%Y-%m")
date = datetime.utcnow().strftime("%Y-%m-%d")
status = {}
for workflow_id, budget in WORKFLOW_BUDGETS.items():
daily_key = self._key(workflow_id, "daily")
monthly_key = self._key(workflow_id, "monthly")
daily_spend = float(self.redis.get(daily_key) or 0)
monthly_spend = float(self.redis.get(monthly_key) or 0)
status[workflow_id] = {
"daily_spend": daily_spend,
"daily_limit": budget.daily_usd,
"daily_pct": (daily_spend / budget.daily_usd) * 100,
"monthly_spend": monthly_spend,
"monthly_limit": budget.monthly_usd,
"monthly_pct": (monthly_spend / budget.monthly_usd) * 100,
"health": (
"CRITICAL" if daily_spend >= budget.daily_usd else
"WARNING" if daily_spend >= budget.daily_usd * 0.8 else
"OK"
)
}
return status
Tip: Keep the three spend increments in a single Redis pipeline so they are recorded together. Note, though, that a pipeline batches commands without making the read-then-increment pattern above fully atomic: under heavy concurrency, two simultaneous requests can both pass the budget check before either records its spend, putting you over limit. For strict enforcement, move the check-and-increment into a server-side Lua script (EVAL) or a WATCH/MULTI transaction.
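Whatever mechanism you use server-side, the semantics you want are "check and increment as one unit." An in-memory sketch of those semantics using a lock (in Redis, a Lua script or WATCH/MULTI transaction gives you the same guarantee; AtomicBudget here is purely illustrative):

```python
import threading

class AtomicBudget:
    """In-memory stand-in for the spend ledger: check and increment under one lock."""
    def __init__(self, limit: float):
        self.limit = limit
        self.spent = 0.0
        self._lock = threading.Lock()

    def try_spend(self, cost: float) -> bool:
        with self._lock:  # the check and the increment happen as one unit
            if self.spent + cost > self.limit:
                return False
            self.spent += cost
            return True

budget = AtomicBudget(limit=1.00)
results = []
workers = [
    threading.Thread(target=lambda: results.append(budget.try_spend(0.30)))
    for _ in range(5)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
# Only 3 spends of $0.30 fit under $1.00, no matter how the threads interleave.
```

Without the lock, two threads could both observe spent=0.90, both pass the check, and push the total to $1.50: the same race the Redis ledger faces.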
Phase 3: Implement the Model Router
router.py — Model Selection Router
from .config import MODEL_ROUTING
class TaskRouter:
"""Routes tasks to appropriate models based on complexity signals."""
def select_model(self, workflow_id: str, prompt: str, context: dict = None) -> str:
routing = MODEL_ROUTING.get(workflow_id, {})
default_model = routing.get("default", "claude-3-5-haiku-20241022")
complex_model = routing.get("complex", "claude-3-5-sonnet-20241022")
# Check token threshold (large input = complex task)
threshold = routing.get("threshold_tokens")
if threshold:
estimated_tokens = len(prompt.split()) * 1.3
if estimated_tokens > threshold:
return complex_model
# Check complexity keywords
keywords = routing.get("complexity_keywords", [])
prompt_lower = prompt.lower()
if any(kw in prompt_lower for kw in keywords):
return complex_model
# Check explicit context signals
if context:
if context.get("is_complex"):
return complex_model
if context.get("is_simple"):
return routing.get("simple", default_model)
return default_model
def get_max_tokens(self, workflow_id: str, model: str) -> int:
"""Return appropriate max_tokens for workflow + model combination."""
MAX_TOKENS = {
"ci_cd_review": {"haiku": 1024, "sonnet": 2048},
"dev_assistant": {"haiku": 2048, "sonnet": 4096},
"qa_test_gen": {"haiku": 3000, "sonnet": 6000},
"pm_assistant": {"haiku": 1500, "sonnet": 3000},
}
model_tier = "sonnet" if "sonnet" in model else "haiku"
return MAX_TOKENS.get(workflow_id, {}).get(model_tier, 2048)
Phase 4: Wire Up the Main Client
client.py — Instrumented LLM Client
import anthropic
import time
import uuid
from datetime import datetime, timezone
from typing import Optional, List, Dict
from .budget import BudgetEnforcer
from .router import TaskRouter
from .monitor import UsageLogger
from .alerts import AlertManager
class AcmeBudgetedClient:
"""
Main entry point for all Acme AI interactions.
Handles model routing, budget enforcement, and usage logging.
"""
def __init__(
self,
redis_url: str = "redis://localhost:6379",
slack_webhook: Optional[str] = None,
):
self.anthropic = anthropic.Anthropic()
self.enforcer = BudgetEnforcer(redis_url)
self.router = TaskRouter()
self.logger = UsageLogger(redis_url)
self.alerts = AlertManager(slack_webhook) if slack_webhook else None
def complete(
self,
workflow_id: str,
messages: List[Dict],
system: str = "",
context: dict = None,
session_id: Optional[str] = None,
user_id: Optional[str] = None,
loop_step: Optional[int] = None,
) -> dict:
"""
Main completion method with full budget management.
Returns dict with content, model, usage, and budget status.
"""
# Select model
last_user_msg = next(
(m["content"] for m in reversed(messages) if m["role"] == "user"),
""
)
model = self.router.select_model(workflow_id, last_user_msg, context)
max_tokens = self.router.get_max_tokens(workflow_id, model)
        # Pre-flight: estimate cost and check budget.
        # Rough heuristic: ~4 characters per token.
        est_input = sum(len(str(m.get("content", ""))) for m in messages) // 4
        est_input += len(system) // 4
pre_check = self.enforcer.check_and_record(
workflow_id, model, est_input, max_tokens // 4
)
if not pre_check["allowed"]:
return {
"content": (
f"I'm unable to process this request — the {workflow_id} workflow "
f"has reached its budget limit ({pre_check['reason']}). "
"Please contact your team lead or try again tomorrow."
),
"model": model,
"budget_blocked": True,
"budget_reason": pre_check["reason"]
}
# Make the API call
start_time = time.time()
kwargs = {
"model": model,
"max_tokens": max_tokens,
"messages": messages,
}
if system:
kwargs["system"] = system
response = self.anthropic.messages.create(**kwargs)
latency_ms = int((time.time() - start_time) * 1000)
# Reconcile actual cost vs. estimate
actual_input = response.usage.input_tokens
actual_output = response.usage.output_tokens
        # Correct the ledger with the shortfall between actual and estimated
        # usage. Negative corrections (over-estimates) are clamped to zero,
        # so the ledger errs on the conservative (high) side.
        estimate_input = est_input
        estimate_output = max_tokens // 4
        correction_input = actual_input - estimate_input
        correction_output = actual_output - estimate_output
        if correction_input > 0 or correction_output > 0:
            self.enforcer.check_and_record(
                workflow_id, model,
                max(0, correction_input),
                max(0, correction_output)
            )
# Log the event
self.logger.log(
workflow_id=workflow_id,
session_id=session_id or str(uuid.uuid4()),
user_id=user_id or "unknown",
model=model,
input_tokens=actual_input,
output_tokens=actual_output,
latency_ms=latency_ms,
loop_step=loop_step,
)
# Check alert thresholds
if self.alerts:
status = self.enforcer.get_all_status()
wf_status = status.get(workflow_id, {})
if wf_status.get("daily_pct", 0) >= 80:
self.alerts.warn(
title=f"Budget Warning: {workflow_id}",
message=f"Daily budget {wf_status['daily_pct']:.1f}% consumed",
context=wf_status
)
        return {
            "content": response.content[0].text,
            "model": model,
            "input_tokens": actual_input,
            "output_tokens": actual_output,
            "latency_ms": latency_ms,
            "budget_blocked": False,
            "daily_budget_pct": pre_check.get("daily_pct"),
            "monthly_budget_pct": pre_check.get("monthly_pct"),
        }
Tip: Return budget status information in every API response, even when the request succeeds. Downstream code can use this to display budget indicators in developer tools, log budget health metrics, or trigger graceful context compression before hitting hard limits.
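The pre-flight check above leans on a rough characters-divided-by-4 heuristic. A standalone version of that estimator (the 4:1 ratio is an assumption that holds loosely for English prose; code and non-English text tokenize differently, and the authoritative counts are the usage fields the API returns):

```python
def estimate_tokens(text: str) -> int:
    """Rough pre-flight estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

# A 400-character message estimates to ~100 tokens.
print(estimate_tokens("x" * 400))
```

Because the estimate feeds a budget check, a systematic under-estimate is the dangerous direction; the post-call reconciliation step in complete() exists precisely to patch up the difference.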
Phase 5: Build the Monthly Budget Report
report.py — ROI Reporting
from .budget import BudgetEnforcer
from .config import WORKFLOW_BUDGETS, TOTAL_MONTHLY_BUDGET_USD
from datetime import datetime
def generate_monthly_report(enforcer: BudgetEnforcer) -> str:
status = enforcer.get_all_status()
month = datetime.utcnow().strftime("%B %Y")
total_spend = sum(s["monthly_spend"] for s in status.values())
total_budget = TOTAL_MONTHLY_BUDGET_USD
report_lines = [
f"ACME ENGINEERING — AI Budget Report ({month})",
"=" * 60,
f"",
f"Organization Summary:",
f" Total spend: ${total_spend:.2f} of ${total_budget:.2f}",
f" Budget used: {(total_spend/total_budget)*100:.1f}%",
f" Remaining: ${total_budget - total_spend:.2f}",
f"",
f"Workflow Breakdown:",
]
for workflow_id, wf_status in status.items():
health_icon = {"OK": "[OK]", "WARNING": "[WARN]", "CRITICAL": "[CRIT]"}[
wf_status["health"]
]
report_lines.extend([
f"",
f" {health_icon} {workflow_id}",
f" Monthly: ${wf_status['monthly_spend']:.4f} / ${wf_status['monthly_limit']:.2f}"
f" ({wf_status['monthly_pct']:.1f}%)",
f" Daily: ${wf_status['daily_spend']:.4f} / ${wf_status['daily_limit']:.2f}"
f" ({wf_status['daily_pct']:.1f}%)",
])
# Estimated ROI summary
report_lines.extend([
f"",
"=" * 60,
f"ROI Summary (estimated):",
f" Engineer time saved: ~180 hrs/month × $90/hr = $16,200",
f" QA time saved: ~90 hrs/month × $70/hr = $6,300",
f" PM time saved: ~40 hrs/month × $100/hr = $4,000",
f" Total value created: $26,500/month",
f" Total AI investment: ${total_spend:.2f}/month",
f" Net ROI: {((26500 - total_spend) / max(total_spend, 0.01)) * 100:.0f}%",
])
return "\n".join(report_lines)
if __name__ == "__main__":
enforcer = BudgetEnforcer()
print(generate_monthly_report(enforcer))
Tip: Schedule this report to run on the 1st of each month (via cron or GitHub Actions) and post it to your team Slack channel automatically. The act of publishing the report publicly — even just to your team — creates accountability and naturally drives optimization behavior without any top-down mandate.
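One way to wire the Slack step, as a sketch: post_report_to_slack and build_slack_payload are hypothetical helpers (not part of the package above), the webhook URL is a standard Slack incoming webhook, and the report text comes from generate_monthly_report.

```python
import json
import urllib.request

def build_slack_payload(report_text: str) -> bytes:
    """Slack incoming webhooks accept a JSON body with a 'text' field."""
    return json.dumps({"text": report_text}).encode("utf-8")

def post_report_to_slack(webhook_url: str, report_text: str) -> int:
    """POST the monthly report to a Slack incoming webhook; returns HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=build_slack_payload(report_text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Point whatever scheduler you use (cron, GitHub Actions) at a small script that calls generate_monthly_report and then post_report_to_slack with the webhook URL from your secrets store.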
Phase 6: Integration Test Your Framework
Before deploying to production, run integration tests against all four workflows:
import pytest
from unittest.mock import MagicMock, patch
from acme_ai_budget.budget import BudgetEnforcer
from acme_ai_budget.router import TaskRouter
from acme_ai_budget.config import WORKFLOW_BUDGETS
def test_router_selects_haiku_for_simple_task():
router = TaskRouter()
model = router.select_model("dev_assistant", "extract the function name from this code")
assert "haiku" in model
def test_router_escalates_to_sonnet_for_architecture_task():
router = TaskRouter()
model = router.select_model("dev_assistant", "design the architecture for our microservices")
assert "sonnet" in model
@pytest.fixture
def mock_redis():
    """Patch BudgetEnforcer's Redis client so budget tests run without a server."""
    with patch("acme_ai_budget.budget.redis") as redis_module:
        client = MagicMock()
        client.get.return_value = None  # no prior spend recorded
        redis_module.from_url.return_value = client
        yield client

def test_budget_allows_request_within_limits(mock_redis):
enforcer = BudgetEnforcer()
result = enforcer.check_and_record(
"ci_cd_review",
"claude-3-5-haiku-20241022",
input_tokens=1000,
output_tokens=500
)
assert result["allowed"] is True
def test_budget_blocks_request_when_daily_limit_exceeded(mock_redis):
enforcer = BudgetEnforcer()
budget = WORKFLOW_BUDGETS["ci_cd_review"]
# Simulate that daily spend is already at limit
mock_redis.get.return_value = str(budget.daily_usd).encode()
result = enforcer.check_and_record(
"ci_cd_review",
"claude-3-5-haiku-20241022",
input_tokens=1000,
output_tokens=500
)
assert result["allowed"] is False
assert result["reason"] == "daily_limit_exceeded"
def test_full_workflow_integration():
    """End-to-end test with mocked Anthropic API (requires a local Redis at :6379)."""
with patch("anthropic.Anthropic") as mock_anthropic:
mock_response = MagicMock()
mock_response.content[0].text = "Test output"
mock_response.usage.input_tokens = 500
mock_response.usage.output_tokens = 200
mock_response.stop_reason = "end_turn"
mock_anthropic.return_value.messages.create.return_value = mock_response
from acme_ai_budget.client import AcmeBudgetedClient
client = AcmeBudgetedClient(redis_url="redis://localhost:6379")
result = client.complete(
workflow_id="ci_cd_review",
messages=[{"role": "user", "content": "Review this code: def foo(): pass"}],
system="You are a code reviewer.",
user_id="engineer_001"
)
assert result["budget_blocked"] is False
assert result["content"] == "Test output"
assert result["input_tokens"] == 500
Tip: Add these integration tests to your CI pipeline and run them on every code change to the budget framework. A broken budget guardrail that silently allows unlimited spending is worse than no guardrail — the broken state gives false confidence. CI tests prevent regressions in your financial controls.
Phase 7: Deploy and Validate
Once the framework is implemented and tested:
Week 1: Deploy to one workflow (start with CI/CD review — automated, easy to measure)
- Enable all logging
- Set alerts to INFO level (no paging yet)
- Compare actual costs to estimate
Week 2: Deploy to all workflows
- Enable WARNING alerts to Slack
- Run baseline ROI measurement
Week 3: Enable CRITICAL alerts with PagerDuty
- Adjust budget limits based on real data
- Tune model routing thresholds
Week 4: Generate first monthly report
- Share ROI analysis with team lead
- Identify top optimization opportunities
Month 2+: Run the monthly ROI review
- Adjust budgets based on actual consumption
- Expand to new workflows as team adopts AI tooling
Tip: When you deploy the budget framework, announce it to your team with context: "This framework gives us visibility into how we're using our AI budget and makes sure no one accidentally runs up a big bill. It's not here to limit your use of AI — it's here to help us use it more." The framing matters enormously for adoption.
Summary
This hands-on lab walked through building a production-grade token budget framework from inventory through implementation to deployment. The complete framework includes baseline cost estimation, Redis-backed budget enforcement at daily and monthly granularities, model routing that balances quality and cost across all four workflow types, integration tests that protect your financial controls from regressions, and automated monthly reporting that keeps ROI visible to the whole team. The framework is intentionally modular — you can adopt any one component independently and add the others as your team matures. The goal is not a perfect system on day one, but a system that gets measurably better each month.