
Token costs are the "fuel bill" of agentic AI systems. Unlike traditional software where compute costs are relatively predictable and tied to server time, LLM-powered workflows are billed by the token — and the pricing structures across providers are surprisingly varied. Understanding those structures deeply is the prerequisite to every other cost-management decision your team makes.

This topic breaks down how each major provider prices tokens, what hidden cost multipliers exist in agentic workflows, and how to build mental models that let you estimate spend before you write a single line of code.


How Token Pricing Works: The Fundamentals

Every major LLM provider charges separately for input tokens (what you send to the model) and output tokens (what the model generates back). Output tokens are almost always more expensive — typically 3x to 5x the cost of input tokens — because generation requires sequential forward passes through the model, while reading the context is parallelized.

A token is roughly 0.75 words in English, or about 4 characters. In practice:
- A 1,000-word document ≈ 1,333 tokens
- A 100-line Python file ≈ 500–800 tokens (code is more token-dense than prose)
- A full system prompt with tool definitions ≈ 500–2,000 tokens depending on complexity
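These rules of thumb can be wrapped in a quick helper for back-of-envelope estimates. This is a sketch only; real tokenizers (e.g. tiktoken for OpenAI models) give exact counts:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4-characters-per-token heuristic.

    For budgeting only; use the provider's tokenizer for exact counts.
    """
    return max(1, len(text) // 4)

# A 1,000-word document at ~5 characters per word (including spaces)
doc = "word " * 1000
print(estimate_tokens(doc))  # 1250 — same ballpark as the 1,333 figure above
```

The two heuristics (0.75 words per token, 4 characters per token) disagree slightly; either is fine for order-of-magnitude budgeting.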

In an agentic loop, each iteration re-sends the accumulated conversation history as input, so input cost grows roughly quadratically with the number of steps. A 20-step agent run can consume anywhere from 10x to nearly 80x more input tokens than a single-shot prompt of the same final length, depending on how quickly the history grows. This compounding effect is the core cost risk in agentic systems.

Tip: Always estimate token cost by multiplying your single-turn prompt cost by the expected number of agentic steps, then add 30% for error recovery, retries, and context expansion. This gives you a realistic "per-task budget" rather than a best-case figure.
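The tip above as a minimal sketch; the single-turn cost, step count, and 30% buffer are inputs you supply:

```python
def per_task_budget(single_turn_cost: float, expected_steps: int,
                    buffer: float = 0.30) -> float:
    """Per-task budget: single-turn cost x expected agentic steps,
    plus a buffer for retries, error recovery, and context expansion."""
    return single_turn_cost * expected_steps * (1 + buffer)

# e.g. a $0.01 single-turn prompt run as a 12-step agent
print(round(per_task_budget(0.01, 12), 4))  # 0.156
```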


Provider Pricing Breakdown (as of mid-2025)

Prices change frequently. Always verify against current provider documentation before finalizing budgets. The tables below reflect pricing as of early-to-mid 2025 and are structured to make model-tier comparisons easy.

OpenAI

Model         | Input (per 1M tokens) | Output (per 1M tokens) | Notes
GPT-4o        | $2.50                 | $10.00                 | Flagship multimodal
GPT-4o mini   | $0.15                 | $0.60                  | Strong cost/quality ratio
GPT-4 Turbo   | $10.00                | $30.00                 | Legacy, being superseded
GPT-3.5 Turbo | $0.50                 | $1.50                  | Fast, cheap, lower reasoning
o1            | $15.00                | $60.00                 | Extended thinking, complex tasks
o1-mini       | $3.00                 | $12.00                 | Reasoning at reduced cost
o3-mini       | $1.10                 | $4.40                  | Latest efficient reasoning model

OpenAI also offers cached input tokens at a 50% discount when the prefix of your prompt matches a recently used context (applied automatically for prompts over 1,024 tokens). This feature, called Prompt Caching, is highly relevant for agentic workflows with stable system prompts.
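A quick sketch of what that discount means over a multi-step agent run, assuming a stable 2,000-token prefix (system prompt plus tool definitions) that is a cache hit on every step after the first; the prefix size and step count are illustrative:

```python
PRICE_PER_M = 2.50   # GPT-4o input, $ per 1M tokens
prefix_tokens = 2_000
steps = 10

# Every step re-sends the prefix at full price without caching.
uncached = prefix_tokens * steps * PRICE_PER_M / 1e6

# With caching: full price once, then 50% off on the 9 subsequent hits.
cached = (prefix_tokens * PRICE_PER_M / 1e6
          + prefix_tokens * (steps - 1) * PRICE_PER_M * 0.5 / 1e6)

print(f"uncached ${uncached:.4f} vs cached ${cached:.4f}")  # $0.0500 vs $0.0275
```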

Anthropic (Claude)

Model             | Input (per 1M tokens) | Output (per 1M tokens) | Notes
Claude 3.5 Sonnet | $3.00                 | $15.00                 | Best-in-class coding/reasoning
Claude 3.5 Haiku  | $0.80                 | $4.00                  | Fast, cheap, solid quality
Claude 3 Opus     | $15.00                | $75.00                 | Highest capability, high cost
Claude 3 Haiku    | $0.25                 | $1.25                  | Ultra-cheap, simpler tasks

Anthropic offers Prompt Caching explicitly: cache writes cost $3.75/1M tokens and cache reads cost $0.30/1M tokens (vs. $3.00/1M standard input for Claude 3.5 Sonnet). For long system prompts or document contexts that persist across turns, this yields roughly 90% savings on the cached portion. The cache TTL is 5 minutes, refreshed each time the cached prefix is read.
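The break-even arithmetic for those rates can be sketched directly: the cache write costs slightly more than a standard request, and caching pays for itself from the first reuse onward:

```python
# Claude 3.5 Sonnet rates quoted above, $ per 1M tokens of the cached portion
STANDARD, WRITE, READ = 3.00, 3.75, 0.30

def cost_per_m(turns: int, cached: bool) -> float:
    """Cost per 1M cached-portion tokens across `turns` requests."""
    if not cached:
        return STANDARD * turns
    return WRITE + READ * (turns - 1)  # write once, read on later turns

for turns in (1, 2, 5, 20):
    print(turns, cost_per_m(turns, cached=False), cost_per_m(turns, cached=True))
# 1 turn: caching costs more ($3.75 vs $3.00); from 2 turns on it wins.
```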

Google Gemini

Model            | Input (per 1M tokens)         | Output (per 1M tokens)         | Notes
Gemini 1.5 Pro   | $3.50 (≤128K), $7.00 (>128K)  | $10.50 (≤128K), $21.00 (>128K) | Context-length tiered pricing
Gemini 1.5 Flash | $0.075 (≤128K), $0.15 (>128K) | $0.30 (≤128K), $0.60 (>128K)   | Extremely low cost
Gemini 1.0 Pro   | $0.50                         | $1.50                          | Legacy model
Gemini 2.0 Flash | $0.10                         | $0.40                          | Latest efficient model

Google's tiered pricing by context length is unique among major providers. If your agentic workflow regularly exceeds 128K tokens in context (common with codebase analysis or long document workflows), Gemini 1.5 Pro costs double. Plan your context windowing accordingly.
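A sketch of the tiering, assuming (per the table above) that a request whose prompt exceeds 128K tokens is billed at the higher rate for the entire prompt:

```python
def gemini_15_pro_input_cost(tokens: int) -> float:
    """Input cost in dollars under Gemini 1.5 Pro's two-tier pricing.

    Assumes the >128K rate applies to the whole prompt once the
    threshold is crossed, which makes the threshold a cost cliff.
    """
    rate = 3.50 if tokens <= 128_000 else 7.00  # $ per 1M input tokens
    return tokens * rate / 1e6

print(gemini_15_pro_input_cost(100_000))  # 0.35
print(gemini_15_pro_input_cost(200_000))  # 1.4
```

Note the discontinuity at the threshold: a prompt one token over 128K costs roughly double per token, which is exactly why context windowing around that boundary matters.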

Google also offers a Context Caching feature (currently in preview) that charges for cache storage at $1.00/1M tokens/hour, while cached reads are 75% cheaper than standard input rates.

AWS Bedrock

AWS Bedrock provides access to multiple model families with on-demand and provisioned throughput pricing:

Model                        | Input (per 1M tokens) | Output (per 1M tokens)
Anthropic Claude 3.5 Sonnet  | $3.00                 | $15.00
Anthropic Claude 3.5 Haiku   | $0.80                 | $4.00
Meta Llama 3.1 70B Instruct  | $0.72                 | $0.72
Meta Llama 3.1 405B Instruct | $2.40                 | $2.40
Amazon Titan Text Premier    | $0.50                 | $1.50
Mistral Large                | $4.00                 | $12.00
AI21 Jamba 1.5 Large         | $2.00                 | $8.00

Bedrock's Provisioned Throughput model is important for production agentic systems: you reserve model units (MUs) for a committed term (1 month or 6 months), which eliminates rate limits and can reduce per-token costs by 30–60% at scale. However, you pay for the reservation whether you use it or not.
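A hypothetical break-even sketch for that reservation decision. The dollar figures in the example call are placeholders, not actual AWS quotes; substitute the numbers from your own pricing agreement:

```python
def breakeven_tokens_per_month(reservation_monthly_usd: float,
                               on_demand_per_m_usd: float) -> float:
    """Monthly token volume above which a flat reservation beats paying
    on-demand per-token rates for the same traffic."""
    return reservation_monthly_usd / on_demand_per_m_usd * 1e6

# e.g. a hypothetical $5,000/month commitment vs a $3.00/1M on-demand rate:
# below ~1.67B tokens/month the reservation is idle capacity you still pay for.
print(f"{breakeven_tokens_per_month(5_000, 3.00):,.0f} tokens/month")
```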

Azure OpenAI

Azure OpenAI mirrors OpenAI's models but with enterprise billing, regional availability, and commitment discounts:

Model         | Input (per 1M tokens) | Output (per 1M tokens)
GPT-4o        | $2.50                 | $10.00
GPT-4o mini   | $0.165                | $0.66
GPT-4 Turbo   | $10.00                | $30.00
GPT-3.5 Turbo | $0.50                 | $1.50

Azure adds Provisioned Throughput Units (PTUs) — a commitment-based model that provides guaranteed capacity and can drop effective per-token costs significantly for high-volume workloads. PTUs are priced per hour, not per token.

Tip: For enterprise teams on Azure with predictable agentic workloads (e.g., nightly CI pipelines, daily document processing), PTU commitments can reduce token costs by 40–60% compared to pay-as-you-go. Run a 2-week baseline first to establish actual usage patterns before committing.


The Hidden Cost Multipliers in Agentic Workflows

Standard pricing tables show per-token rates, but agentic systems have structural patterns that multiply those base costs significantly.

Tool Call Overhead

Every tool definition you include in the model's context costs tokens. A well-defined tool schema with description, parameters, and examples can consume 200–500 tokens. If you have 10 tools defined and run 100 tasks per day:

Tool schema tokens: 10 tools × 350 avg tokens = 3,500 tokens per request
Daily overhead: 3,500 × 100 requests = 350,000 tokens
Monthly: 10.5M tokens
At GPT-4o pricing ($2.50/1M input): $26.25/month just in tool definitions

This is significant but manageable — and it's why selective tool loading matters (covered in Module 7).
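The overhead arithmetic above, as a script you can adapt to your own tool count and traffic:

```python
# Monthly cost of tool-schema tokens alone, using the figures from the
# worked example above (10 tools, 350 tokens each, 100 requests/day).
tools, avg_schema_tokens = 10, 350
requests_per_day, days = 100, 30
price_per_m_input = 2.50  # GPT-4o input, $ per 1M tokens

monthly_tokens = tools * avg_schema_tokens * requests_per_day * days
monthly_cost = monthly_tokens / 1e6 * price_per_m_input
print(monthly_tokens, monthly_cost)  # 10500000 26.25
```

Note this counts tool schemas once per request; in a multi-step loop they are re-sent on every step, so the real figure is this number multiplied by your average step count.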

Context Accumulation in Multi-Step Agents

In a ReAct-style agent loop, the full conversation history is included in each request. A 20-step agent that builds up 500 tokens per step:

Step 1 input: 1,000 tokens (system + initial query)
Step 2 input: 1,500 tokens
Step 3 input: 2,000 tokens
...
Step 20 input: 10,500 tokens

Total input tokens: sum(1000 + 500n for n in 0..19) = 1000×20 + 500×(0+1+...+19)
= 20,000 + 500×190 = 20,000 + 95,000 = 115,000 tokens

vs. single-shot equivalent: ~1,500 tokens
Multiplier: ~77x

This is the core reason why agentic cost modeling must account for loop length, not just prompt length.
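The sum above can be checked directly:

```python
# Step n (0-indexed) re-sends the 1,000 base tokens plus 500 tokens of
# history for each prior step, matching the worked example above.
steps, base, growth = 20, 1_000, 500
total_input = sum(base + growth * n for n in range(steps))
print(total_input)               # 115000
print(total_input / 1_500)       # ~77x a 1,500-token single-shot prompt
```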

Retry and Error Recovery

Production agentic systems retry failed tool calls, handle API errors, and sometimes re-prompt after bad outputs. A 15% retry rate adds 15% to your token budget. Build this into estimates.

Streaming vs. Batch

Streaming (receiving tokens as they generate) has identical token costs to non-streaming, but affects your application architecture. Batch APIs (where available) can offer 50% discounts: OpenAI's Batch API delivers results within 24 hours at half price, useful for non-real-time workflows like nightly report generation.

Tip: Audit your agent loop for "unnecessary verbosity" — places where the model is instructed to explain its reasoning at length before acting. Shifting from chain-of-thought prompting to direct action prompting can reduce output tokens by 40–60% with minimal quality loss for well-defined tasks.


Pricing Models: Per-Token vs. Subscription vs. API Credits

Beyond per-token pricing, providers offer alternative billing structures relevant to teams:

OpenAI API Credits

OpenAI sells prepaid credits. Buying in bulk does not currently yield discounts, and prepaid credits typically expire 12 months after purchase, so verify current terms and size purchases to expected usage before using bulk credits as a budget-allocation mechanism across teams.

Anthropic API Credits

A similar prepaid model with no bulk discount at standard tiers; check current expiry terms before committing large amounts. Enterprise agreements with Anthropic can unlock volume pricing.

Google AI Studio (Free Tier)

Gemini models via AI Studio offer a free tier with rate limits suitable for development and prototyping. Production use requires billing enablement via Google Cloud.

AWS Bedrock Free Tier

Bedrock offers limited free monthly quota for select models. This is useful for evaluation but insufficient for any real agentic workload.

Azure OpenAI Enterprise Agreements

For large enterprises, Microsoft offers custom pricing through EA agreements, often bundling Azure credits that can offset OpenAI model costs.

Tip: If your team runs a mix of development, staging, and production workloads, separate them into different API keys or projects from day one. This makes it trivial to attribute costs and avoid situations where a runaway development experiment burns through a production budget.


Building a Cost Estimation Model

Before deploying any agentic workflow, build a cost estimate using this template:

AGENTIC WORKFLOW COST ESTIMATION

Task definition: [describe the task]
Model selected: [model name]

Per-task estimate:
  System prompt tokens:      _______
  Tool definitions (total):  _______
  Average user input:        _______
  Expected output tokens:    _______
  Expected agentic steps:    _______

Per-step input cost:
  (system + tools + accumulated history) × input_price/1M

Per-step output cost:
  (avg output per step) × output_price/1M

Total per task = sum over all steps + retry buffer (15%)

Scale:
  Tasks per day:             _______
  Days per month:            _______
  Monthly token estimate:    _______
  Monthly cost estimate:     $ _______

Comparison point:
  Same task with cheaper model: $ _______
  Time saved vs. manual:     _______ hours
  Engineer hourly rate:      $ _______
  Manual cost:               $ _______
  ROI break-even:            _______

Running this calculation before every new agentic workflow prevents "cost surprises" that undermine confidence in AI tooling.
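A minimal executable version of the template, assuming the linear per-step history growth modeled earlier in this topic and the 15% retry buffer; every value in the example call is illustrative, not a recommendation:

```python
def estimate_monthly_cost(system_tokens: int, tool_tokens: int,
                          user_tokens: int, output_per_step: int,
                          steps: int, history_growth: int,
                          input_per_m: float, output_per_m: float,
                          tasks_per_day: int, days: int = 30,
                          retry_buffer: float = 0.15) -> float:
    """Fill-in-the-blanks version of the estimation template above."""
    base = system_tokens + tool_tokens + user_tokens
    # Each step re-sends the base context plus linearly growing history.
    input_tokens = sum(base + history_growth * n for n in range(steps))
    output_tokens = output_per_step * steps
    per_task = (input_tokens * input_per_m + output_tokens * output_per_m) / 1e6
    return per_task * (1 + retry_buffer) * tasks_per_day * days

# e.g. a 10-step GPT-4o agent ($2.50/$10.00 per 1M), 50 tasks/day
print(round(estimate_monthly_cost(800, 3_500, 200, 300,
                                  10, 500, 2.50, 10.00, 50), 2))  # 342.84
```

Swapping only the two price arguments lets you fill in the "same task with cheaper model" comparison line of the template.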

Tip: Build a shared cost estimation spreadsheet or Notion template that your whole team uses before any new agentic feature goes into production. Standardizing the estimation process catches 80% of budget issues at design time, not after the invoice arrives.


Summary

Token pricing across providers follows the same basic structure — pay for input, pay more for output — but the details diverge significantly: context-length tiers (Gemini), prompt caching (OpenAI, Anthropic), provisioned throughput (Bedrock, Azure), and batch discounts (OpenAI) all create optimization opportunities. The biggest cost risk in agentic systems is not the base model price but the multiplicative effect of multi-step loops, tool overhead, and retry logic. Accurate cost management starts with a disciplined estimation model applied before any workflow reaches production.