AI Agent Pricing Models Compared — Per-Token vs Per-Call vs Outcome-Based


You built an AI agent that works. Now you need to charge for it. The pricing model you choose determines your revenue ceiling, your churn rate, and whether customers feel like they're getting a fair deal. Most builders default to whatever's easiest to implement — and leave money on the table for years. This guide breaks down the three dominant pricing models, when each one wins, and how to implement them with working code.

The pricing problem for AI agent builders

AI agents aren't SaaS. They don't consume fixed resources per user per month. An agent that drafts emails might use 200 tokens on a simple reply and 15,000 tokens on a complex thread. An agent that qualifies leads might run 50 times and succeed twice. Charging a flat monthly fee for either of these ignores the economics entirely.

The core tension: customers want to pay for value, not consumption. But value is hard to measure in real time, while consumption (tokens, API calls) is trivially measurable. Every pricing model is a different compromise between these two forces.

Three models have emerged as viable for production AI agents. Each optimizes for a different variable.

Model 1: Per-token pricing

How it works: You meter input and output tokens on every agent execution. The customer pays a rate per token — typically per 1,000 tokens (1K) or per million tokens (1M). This mirrors how OpenAI and Anthropic price their APIs.

Example: You charge $0.003 per 1K tokens. An agent run consuming 2,000 input + 800 output tokens (2,800 total) costs $0.0084.
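
The arithmetic in the example above can be checked with a few lines of Python. This is a minimal sketch, assuming a single blended rate for input and output tokens; the function name `per_token_price` is hypothetical and is not part of any API shown in this post.

```python
from decimal import Decimal


def per_token_price(tokens_input: int, tokens_output: int,
                    rate_per_1k: Decimal) -> Decimal:
    """Price one agent run at a single blended rate per 1,000 tokens."""
    total_tokens = tokens_input + tokens_output
    return Decimal(total_tokens) / Decimal(1000) * rate_per_1k


# The worked example: 2,000 input + 800 output tokens at $0.003 per 1K
price = per_token_price(2000, 800, Decimal("0.003"))
print(price)  # 0.0084
```

`Decimal` is used instead of `float` so that fractions of a cent do not pick up binary rounding noise when summed over many runs.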

Per-token pricing javascript

// Per-token pricing: meter tokens and compute price
const response = await fetch('https://rev.polsia.app/v1/meter', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    agent_id: 'research-agent-v1',
    action: 'research.complete',
    tokens_input: 3200,
    tokens_output: 1100,
    // No outcome field — pure token-based
    metadata: { query: 'competitor analysis Q2' }
  }),
});

const { price_charged } = await response.json();
// price_charged computed from your per-token rate config

When per-token works

Internal tooling. Your own team uses the agent — you want cost transparency, not margin optimization.
Developer-facing APIs. Your customers are technical and understand tokens. They'll audit their own usage patterns.
Predictable workloads. Every call consumes roughly the same number of tokens (e.g., a classification agent).

When per-token fails

Non-technical buyers. "You used 847,000 tokens this month" means nothing to a sales VP. They'll churn because they can't connect cost to value.
Efficiency is punished. If you optimize your agent to use fewer tokens, your revenue drops — even though the customer is getting the same value. This creates a perverse incentive to not improve your product.
Variable task complexity. A simple lookup costs pennies; a deep research task costs dollars. The customer experiences the same product but pays 100x more for hard queries. That feels unfair, even when it's technically accurate.

Bottom line: Per-token pricing is easy to implement and hard to sell. It's a cost-plus model masquerading as usage-based pricing. If your customer sees tokens on their invoice, you're exposing infrastructure details they shouldn't need to think about.

Model 2: Per-call pricing

How it works: Every agent execution is one "call." You charge a flat rate per call regardless of token consumption, execution time, or outcome. API platforms like Twilio and Clearbit popularized this model.

Example: You charge $0.05 per call. Ten calls cost $0.50 whether each call used 500 tokens or 50,000.

Per-call pricing javascript

// Per-call pricing: flat fee per execution
const response = await fetch('https://rev.polsia.app/v1/meter', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    agent_id: 'classify-agent',
    action: 'ticket.classified',
    tokens_input: 600,   // Still tracked for analytics
    tokens_output: 50,   // but price is per-call flat rate
    metadata: { ticket_id: 'TK-4821' }
  }),
});

const { price_charged } = await response.json();
// price_charged = your flat per-call rate (e.g., $0.05)

When per-call works

Simple, uniform operations. Classification, tagging, routing — where every call does roughly the same thing.
Non-technical buyers. "You ran 500 agent calls this month at $0.05 each = $25" is a sentence anyone can understand.
High volume, low variance. When token consumption per call is consistent, per-call pricing is effectively per-token pricing with better packaging.

When per-call fails

Value variance is high. A call that books a $50,000 meeting and a call that finds nothing both cost $0.05. You're underpricing your wins and overpricing your misses.
Light calls subsidize heavy ones. If some calls use 10x the compute, a single flat rate either overcharges simple calls (losing volume) or undercharges complex ones (losing margin).
No outcome alignment. The customer pays the same whether the agent succeeded or failed. At scale, this creates trust erosion — "I paid for 10,000 calls and only 200 actually worked."
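
To make the cross-subsidy concrete, here is a rough sketch of margin per call under a flat rate. The $0.05 price comes from the example above; the compute cost of $0.002 per 1K tokens is a made-up figure chosen only for illustration, and `margin_per_call` is a hypothetical helper.

```python
from decimal import Decimal

FLAT_PRICE = Decimal("0.05")           # per-call price from the example above
COST_PER_1K_TOKENS = Decimal("0.002")  # hypothetical compute cost, illustration only


def margin_per_call(tokens_used: int) -> Decimal:
    """Flat price minus the assumed compute cost of one call."""
    compute_cost = Decimal(tokens_used) / Decimal(1000) * COST_PER_1K_TOKENS
    return FLAT_PRICE - compute_cost


light = margin_per_call(500)     # cost $0.001, so the call earns $0.049
heavy = margin_per_call(50_000)  # cost $0.10, so the call loses $0.05
```

Under these assumed numbers the 500-token call is almost pure margin while the 50,000-token call is sold at a loss, which is the mispricing the list above describes.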

Bottom line: Per-call pricing is the SaaS instinct — simple, predictable, easy to invoice. But it doesn't reflect value. It works when every call is worth roughly the same. The moment that's not true, you're mispricing every transaction.

Model 3: Outcome-based pricing

How it works: You charge when the agent delivers a measurable result. A meeting booked, a lead qualified, a support ticket resolved, an order placed. The customer pays for outcomes, not activity. Failed or inconclusive runs cost nothing (or a minimal base fee).

Example: You charge $0.01 base per call + $5.00 per verified meeting booked. At 1,000 calls with 15 successful bookings, the customer pays $10 base + $75 outcome fees = $85 total.
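
The invoice in this example can be reproduced with a short calculation. This is a sketch under the stated rates only; `outcome_bill` and its dictionary keys are hypothetical names, not part of the Rev API.

```python
from decimal import Decimal


def outcome_bill(calls: int, successes: int,
                 base_fee: Decimal, outcome_fee: Decimal) -> dict:
    """Split a bill into the per-call base and the per-success outcome fees."""
    base_total = calls * base_fee
    outcome_total = successes * outcome_fee
    return {
        "base": base_total,
        "outcome": outcome_total,
        "total": base_total + outcome_total,
    }


# 1,000 calls, 15 verified meetings, $0.01 base, $5.00 per meeting
bill = outcome_bill(1000, 15, Decimal("0.01"), Decimal("5.00"))
print(bill["total"])  # 85.00
```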

Outcome-based pricing with deferred resolution javascript

// Step 1: Agent sends an outreach email — outcome unknown yet
const { outcome_id } = await fetch('https://rev.polsia.app/v1/meter', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    agent_id: 'outreach-agent-v2',
    action: 'email.sent',
    tokens_input: 1400,
    tokens_output: 380,
    outcome: 'pending',  // Charged base fee only
    expires_at: new Date(Date.now() + 7 * 86400000).toISOString(),
    metadata: { prospect: 'vp-eng@acme.com' }
  }),
}).then(r => r.json());

// Step 2: Three days later, prospect replies
await fetch(`https://rev.polsia.app/v1/outcomes/${outcome_id}/resolve`, {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    outcome: 'success',
    metadata: { reply_type: 'positive', meeting_booked: true }
  }),
});
// NOW the outcome bonus ($5.00) is charged
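
The pending-then-resolve flow above can be modeled locally to reason about what gets billed and when. The sketch below is a hypothetical in-memory model, not Rev's implementation: the `DeferredOutcome` class, its field names, and the rule that a resolution arriving after `expires_at` earns no bonus are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from decimal import Decimal


@dataclass
class DeferredOutcome:
    """Hypothetical local model of a pending outcome (not Rev's API)."""
    outcome_fee: Decimal
    expires_at: datetime
    status: str = "pending"

    def resolve(self, outcome: str, now: datetime) -> Decimal:
        """Record the result and return the extra charge it triggers."""
        if self.status != "pending":
            raise ValueError(f"outcome already {self.status}")
        if now >= self.expires_at:
            # Assumed rule: late resolutions expire and earn no bonus.
            self.status = "expired"
            return Decimal("0")
        self.status = outcome
        return self.outcome_fee if outcome == "success" else Decimal("0")


sent_at = datetime(2025, 1, 1, tzinfo=timezone.utc)
pending = DeferredOutcome(Decimal("5.00"), expires_at=sent_at + timedelta(days=7))

# Prospect replies three days later, inside the seven-day window.
bonus = pending.resolve("success", now=sent_at + timedelta(days=3))
```

Whatever the real expiry behavior is, it is worth deciding explicitly: an outcome that resolves after the window closes is a common source of billing disputes.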

When outcome-based works

Revenue-generating agents. Sales outreach, lead qualification, appointment setting — where each success has a clear dollar value to the customer.
Customer-facing automation. Support resolution, order processing, content moderation — outcomes are binary and measurable.
High-value, low-frequency actions. When each successful call is worth $5-$500 to the customer, they'll gladly pay an outcome fee that's a fraction of that value.

When outcome-based fails

Outcomes are subjective. "Was this research report good?" isn't a binary. Disputes will eat your support team alive.
Detection is unreliable. If you can't programmatically verify the outcome (e.g., "did the user find this helpful?"), you'll either over-charge or under-charge based on faulty signals.
Cold-start problem. New agents with no track record have unknown success rates. Pricing outcome bonuses before you know your baseline is guesswork.
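
The cold-start risk is easy to see numerically. The sketch below reuses the $0.01 base and $5.00 outcome fee from the earlier example and varies only the success rate; the three rates are illustrative guesses, and `expected_revenue` is a hypothetical helper.

```python
from decimal import Decimal


def expected_revenue(calls: int, success_rate: Decimal,
                     base_fee: Decimal, outcome_fee: Decimal) -> Decimal:
    """Expected revenue when each call succeeds with probability success_rate."""
    return calls * (base_fee + success_rate * outcome_fee)


for rate in (Decimal("0.005"), Decimal("0.015"), Decimal("0.03")):
    revenue = expected_revenue(1000, rate, Decimal("0.01"), Decimal("5.00"))
    print(f"{rate:.1%} success rate -> ${revenue:.2f} per 1,000 calls")
```

Moving the success rate from 0.5% to 3% takes revenue per 1,000 calls from $35 to $160, so an outcome fee set before the baseline is known can miss by a wide margin in either direction.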

Bottom line: Outcome-based pricing aligns incentives more closely than the other two models, provided you can define and detect outcomes reliably. It has the highest margin potential and is the hardest to implement correctly. Rev's deferred outcome API handles the detection and resolution workflow, but you still need to define what "success" means for your specific agent.

Comparison table: which model fits your agent?

Factor	Per-Token	Per-Call	Outcome-Based
Simplicity	Medium (need to track tokens)	Highest (count calls)	Lowest (define + detect outcomes)
Revenue per call	Low, variable	Fixed	High on successes, low on failures
Customer transparency	Low (tokens are opaque)	High (simple count)	Highest (pay for results)
Incentive alignment	Misaligned (penalizes efficiency)	Neutral (no outcome link)	Aligned (succeed together)
Best for	Dev tools, internal agents, predictable workloads	Simple automations, high-volume uniform tasks	Sales agents, lead gen, support bots, high-value actions
Churn risk	High (cost spikes surprise customers)	Medium (no value signal)	Low (customers see ROI directly)
Implementation effort	Low	Lowest	Medium-High (outcome detection logic)

The hybrid approach: why most production agents use all three

In practice, the best pricing isn't a single model — it's a stack. Most production agents combine a base per-call fee (covers fixed overhead), a per-token component (covers variable compute), and an outcome bonus (captures value when the agent succeeds).

Real-world example: An AI SDR agent charges $0.01/call base + $0.0001 per 1K tokens + $3.00 per qualified meeting booked. At 5,000 calls/month averaging 2,500 tokens each (12.5M tokens total), with a 2% meeting rate (100 meetings), the customer pays: $50 base + $1.25 tokens + $300 outcomes = $351.25/month. That's $3.51 per meeting, a fraction of what a human SDR costs. The customer sees clear ROI, you cover your costs on every call, and your revenue scales with the customer's success.
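
The three components of this invoice can be recomputed as follows. The sketch assumes the token rate is quoted per 1K tokens, which is the reading that reproduces the $1.25 token subtotal; read literally as a rate per single token, the same volume would produce a $1,250 token line instead. `hybrid_bill` is a hypothetical helper, not a Rev function.

```python
from decimal import Decimal


def hybrid_bill(calls: int, avg_tokens_per_call: int, meetings: int,
                base_per_call: Decimal, rate_per_1k_tokens: Decimal,
                fee_per_meeting: Decimal) -> dict:
    """Compute the base, token, and outcome lines of a hybrid invoice."""
    base = calls * base_per_call
    total_tokens = calls * avg_tokens_per_call
    tokens = Decimal(total_tokens) / Decimal(1000) * rate_per_1k_tokens
    outcomes = meetings * fee_per_meeting
    total = base + tokens + outcomes
    return {
        "base": base,
        "tokens": tokens,
        "outcomes": outcomes,
        "total": total,
        "per_meeting": total / meetings,
    }


bill = hybrid_bill(
    calls=5000, avg_tokens_per_call=2500, meetings=100,
    base_per_call=Decimal("0.01"),
    rate_per_1k_tokens=Decimal("0.0001"),  # assumed to be per 1K tokens
    fee_per_meeting=Decimal("3.00"),
)
print(f"${bill['total']:.2f} total, ${bill['per_meeting']:.2f} per meeting")
```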

This is the model Rev was built around. The pricing engine lets you configure all three components independently per pricing tier, and the /v1/meter endpoint computes the final price on every call using your config.

How to choose: a decision framework

Answer three questions about your agent:

Can you define a discrete, measurable outcome? If yes, include an outcome component. If the outcome is subjective or undetectable, skip it.
Does token consumption vary more than 5x between calls? If yes, include a per-token component. If consumption is uniform, a per-call flat rate is simpler and more predictable.
Who is your buyer? Technical buyers tolerate token-based pricing. Business buyers need per-call or outcome-based invoices they can understand without a calculator.

If you answered "yes, yes, business buyer" — use the full hybrid (base + token + outcome). If you answered "no, no, developer" — per-call is fine. Every other combination falls somewhere in between. Start simple, add components as your success rate data matures.
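
One possible way to encode the three questions is a small function that returns the components to enable. This is a sketch of one reading of the framework, not a definitive rule set: `AgentProfile`, `pricing_components`, and `show_tokens_on_invoice` are hypothetical names, and treating the buyer type as an invoice-presentation choice is an interpretation of the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    has_measurable_outcome: bool   # question 1
    token_variance_ratio: float    # question 2: heaviest call / lightest call
    buyer: str                     # question 3: "technical" or "business"


def pricing_components(profile: AgentProfile) -> list[str]:
    """Start from a per-call base and add components per the framework."""
    components = ["per_call_base"]
    if profile.token_variance_ratio > 5:
        components.append("per_token")
    if profile.has_measurable_outcome:
        components.append("outcome_bonus")
    return components


def show_tokens_on_invoice(profile: AgentProfile) -> bool:
    """Assumption: only technical buyers see a token line item."""
    return "per_token" in pricing_components(profile) and profile.buyer == "technical"


full_hybrid = pricing_components(AgentProfile(True, 10.0, "business"))
per_call_only = pricing_components(AgentProfile(False, 2.0, "technical"))
```

Here `full_hybrid` contains all three components and `per_call_only` contains just the base fee, matching the two named cases; the mixed cases fall out of the same two `if` checks.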

Implementing all three models with Rev

Rev's metering API supports all three pricing models through a single endpoint. You configure your pricing tiers in the dashboard, then call /v1/meter with the relevant data — the engine computes the right price based on your config.

Full hybrid metering call python

import requests

# Single API call — Rev applies your pricing tier rules
response = requests.post(
    'https://rev.polsia.app/v1/meter',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json',
    },
    json={
        'agent_id': 'sales-agent-v3',
        'action': 'lead.qualified',
        'tokens_input': 2800,
        'tokens_output': 650,
        'outcome': 'success',
        'metadata': {
            'lead_score': 87,
            'company': 'Acme Corp',
            'deal_size': 45000
        }
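    },
    timeout=10,  # requests has no default timeout; avoid hanging indefinitely
)
response.raise_for_status()  # surface HTTP errors before parsing the body

data = response.json()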
# data['price_charged'] = base_fee + token_fee + outcome_bonus
# data['price_breakdown'] = { base: 0.01, tokens: 0.00035, outcome: 5.00 }
print(f"Total charged: ${data['price_charged']}")

The key insight: you don't need to pick one model. Configure your pricing tier with all three components, set any component to zero if you don't want it, and Rev handles the math. Start with per-call only, add token tracking when you have usage data, add outcome bonuses when you can measure success reliably.

What to do next

If you're building an AI agent and haven't chosen a pricing model yet:

Read the first post in this series — The Complete Guide to Billing AI Agents covers metering infrastructure, deferred outcomes, and build vs buy in depth.
Run the pricing simulator — the interactive calculator models your expected revenue under each pricing model. Plug in your call volume, average tokens, and expected success rate.
Try the live demo — the homepage demo fires real API calls with real price computations. See all three pricing components in action before you commit.
Get your API key — sign up and start metering in under 5 minutes. No sales call, no approval process.
