You built an AI agent that works. Now you need to charge for it. The pricing model you choose determines your revenue ceiling, your churn rate, and whether customers feel like they're getting a fair deal. Most builders default to whatever's easiest to implement — and leave money on the table for years. This guide breaks down the three dominant pricing models, when each one wins, and how to implement them with working code.
The pricing problem for AI agent builders
AI agents don't behave like traditional SaaS: they don't consume fixed resources per user per month. An agent that drafts emails might use 200 tokens on a simple reply and 15,000 tokens on a complex thread. An agent that qualifies leads might run 50 times and succeed twice. Charging a flat monthly fee for either of these ignores the underlying economics.
The core tension: customers want to pay for value, not consumption. But value is hard to measure in real time, while consumption (tokens, API calls) is trivially measurable. Every pricing model is a different compromise between these two forces.
Three models have emerged as viable for production AI agents. Each optimizes for a different variable.
Model 1: Per-token pricing
How it works: You meter input and output tokens on every agent execution. The customer pays a rate per token — typically per 1,000 tokens (1K) or per million tokens (1M). This mirrors how OpenAI and Anthropic price their APIs.
Example: You charge $0.003 per 1K tokens. An agent run consuming 2,000 input + 800 output tokens (2,800 total) costs $0.0084.
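The arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration, assuming the $0.003 per 1K rate above and a single blended rate for input and output tokens; the function name `per_token_charge` is hypothetical and is not part of any API.

```python
from decimal import Decimal

# Assumed rate from the example above: $0.003 per 1,000 tokens.
RATE_PER_1K_TOKENS = Decimal("0.003")


def per_token_charge(tokens_input: int, tokens_output: int,
                     rate_per_1k: Decimal = RATE_PER_1K_TOKENS) -> Decimal:
    """Price one agent run from its combined input and output token count."""
    total_tokens = tokens_input + tokens_output
    return rate_per_1k * total_tokens / 1000


print(per_token_charge(2000, 800))  # 0.0084
```

Using `Decimal` rather than floats keeps fractions of a cent exact, which matters once thousands of small charges are summed into an invoice.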
// Per-token pricing: meter tokens and compute price
const response = await fetch('https://rev.polsia.app/v1/meter', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    agent_id: 'research-agent-v1',
    action: 'research.complete',
    tokens_input: 3200,
    tokens_output: 1100,
    // No outcome field: pure token-based
    metadata: { query: 'competitor analysis Q2' }
  }),
});
const { price_charged } = await response.json();
// price_charged computed from your per-token rate config
When per-token works
- Internal tooling. Your own team uses the agent — you want cost transparency, not margin optimization.
- Developer-facing APIs. Your customers are technical and understand tokens. They'll audit their own usage patterns.
- Predictable workloads. Every call consumes roughly the same number of tokens (e.g., a classification agent).
When per-token fails
- Non-technical buyers. "You used 847,000 tokens this month" means nothing to a sales VP. They'll churn because they can't connect cost to value.
- Efficiency is punished. If you optimize your agent to use fewer tokens, your revenue drops — even though the customer is getting the same value. This creates a perverse incentive to not improve your product.
- Variable task complexity. A simple lookup costs pennies; a deep research task costs dollars. The customer experiences the same product but can pay 100x more for hard queries. That feels unfair, even when it's technically accurate.
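Two of these failure modes are easy to put numbers on. The sketch below assumes the $0.003 per 1K rate from the earlier example and the 200-token and 15,000-token email runs from the introduction. The 40% token reduction is an invented figure, used only to illustrate the efficiency penalty.

```python
from decimal import Decimal

RATE_PER_1K = Decimal("0.003")  # assumed rate, from the per-token example


def charge(tokens: int) -> Decimal:
    """Per-token charge for a run that used `tokens` tokens in total."""
    return RATE_PER_1K * tokens / 1000


# Variable complexity: same product, very different bills.
simple_run = charge(200)       # short email reply
complex_run = charge(15_000)   # long, complex thread
spread = complex_run / simple_run

# Efficiency penalty: hypothetical optimization cuts tokens from 10,000 to 6,000.
before = charge(10_000)
after = charge(6_000)
revenue_drop = (before - after) / before

print(spread)        # 75
print(revenue_drop)  # 0.4
```

The customer's experience is identical in both optimization cases, yet revenue per run falls by the same 40% as the token count.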
Model 2: Per-call pricing
How it works: Every agent execution is one "call." You charge a flat rate per call regardless of token consumption, execution time, or outcome. API platforms like Twilio and Clearbit popularized this model.
Example: You charge $0.05 per call. Ten calls cost $0.50 whether each call used 500 tokens or 50,000.
// Per-call pricing: flat fee per execution
const response = await fetch('https://rev.polsia.app/v1/meter', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    agent_id: 'classify-agent',
    action: 'ticket.classified',
    tokens_input: 600,  // Still tracked for analytics
    tokens_output: 50,  // but price is per-call flat rate
    metadata: { ticket_id: 'TK-4821' }
  }),
});
const { price_charged } = await response.json();
// price_charged = your flat per-call rate (e.g., $0.05)
When per-call works
- Simple, uniform operations. Classification, tagging, routing — where every call does roughly the same thing.
- Non-technical buyers. "You ran 500 agent calls this month at $0.05 each = $25" is a sentence anyone can understand.
- High volume, low variance. When token consumption per call is consistent, per-call pricing is effectively per-token pricing with better packaging.
When per-call fails
- Value variance is high. A call that books a $50,000 meeting and a call that finds nothing both cost $0.05. You're underpricing your wins and overpricing your misses.
- Heavy calls subsidize light ones. If some calls use 10x the compute, you're either overcharging simple calls (losing volume) or undercharging complex ones (losing margin).
- No outcome alignment. The customer pays the same whether the agent succeeded or failed. At scale, this erodes trust: "I paid for 10,000 calls and only 200 actually worked."
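The subsidy and alignment problems above can be quantified with the figures already used in this section. This is a rough sketch assuming the $0.05 flat rate, the 500-token and 50,000-token calls from the example, and the 10,000-call scenario with 200 successes; both helper names are hypothetical.

```python
from decimal import Decimal

PRICE_PER_CALL = Decimal("0.05")  # flat rate from the per-call example


def effective_rate_per_1k(tokens_used: int) -> Decimal:
    """What a flat-rate call works out to per 1,000 tokens consumed."""
    return PRICE_PER_CALL * 1000 / tokens_used


def cost_per_success(total_calls: int, successful_calls: int) -> Decimal:
    """Total spend divided by the number of calls that actually worked."""
    return PRICE_PER_CALL * total_calls / successful_calls


print(effective_rate_per_1k(500))     # 0.1
print(effective_rate_per_1k(50_000))  # 0.001
print(cost_per_success(10_000, 200))  # 2.5
```

A light call pays 100 times the effective token rate of a heavy one, and a customer with a 2% success rate is really paying $2.50 per result rather than $0.05.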
Model 3: Outcome-based pricing
How it works: You charge when the agent delivers a measurable result. A meeting booked, a lead qualified, a support ticket resolved, an order placed. The customer pays for outcomes, not activity. Failed or inconclusive runs cost nothing (or a minimal base fee).
Example: You charge $0.01 base per call + $5.00 per verified meeting booked. At 1,000 calls with 15 successful bookings, the customer pays $10 base + $75 outcome fees = $85 total.
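The invoice in this example breaks down as follows. A minimal sketch, assuming the $0.01 base fee and $5.00 outcome fee above; `outcome_invoice` is a hypothetical helper, not a Rev function.

```python
from decimal import Decimal

BASE_FEE_PER_CALL = Decimal("0.01")  # charged on every call
OUTCOME_FEE = Decimal("5.00")        # charged per verified meeting booked


def outcome_invoice(total_calls: int, successes: int) -> dict:
    """Split a billing period into base fees and outcome fees."""
    base = BASE_FEE_PER_CALL * total_calls
    outcome = OUTCOME_FEE * successes
    return {"base": base, "outcome": outcome, "total": base + outcome}


invoice = outcome_invoice(total_calls=1000, successes=15)
print(invoice["base"], invoice["outcome"], invoice["total"])  # 10.00 75.00 85.00
```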
// Step 1: Agent sends an outreach email; outcome unknown yet
const { outcome_id } = await fetch('https://rev.polsia.app/v1/meter', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    agent_id: 'outreach-agent-v2',
    action: 'email.sent',
    tokens_input: 1400,
    tokens_output: 380,
    outcome: 'pending',  // Charged base fee only
    expires_at: new Date(Date.now() + 7 * 86400000).toISOString(),
    metadata: { prospect: 'vp-eng@acme.com' }
  }),
}).then(r => r.json());

// Step 2: Three days later, prospect replies
await fetch(`https://rev.polsia.app/v1/outcomes/${outcome_id}/resolve`, {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    outcome: 'success',
    metadata: { reply_type: 'positive', meeting_booked: true }
  }),
});
// NOW the outcome bonus ($5.00) is charged
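To reason about the deferred flow above without calling the API, it can help to model the lifecycle locally. The sketch below is a hypothetical in-memory model, not Rev's implementation: the `PendingOutcome` class, its status names, and the rule that a result arriving after `expires_at` earns no bonus are all assumptions made for illustration. The fee amounts come from the earlier example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from decimal import Decimal

BASE_FEE = Decimal("0.01")       # assumed, from the example above
OUTCOME_BONUS = Decimal("5.00")  # assumed, from the example above


@dataclass
class PendingOutcome:
    """Hypothetical local record of one metered call awaiting its result."""
    expires_at: datetime
    status: str = "pending"      # pending -> success | failure | expired
    charged: Decimal = BASE_FEE  # the base fee is charged when the call is metered

    def resolve(self, outcome: str, now: datetime) -> Decimal:
        """Settle the record and return the extra amount charged."""
        if outcome not in ("success", "failure"):
            raise ValueError(f"unknown outcome: {outcome}")
        if self.status != "pending":
            raise ValueError(f"already settled as {self.status}")
        if now >= self.expires_at:
            self.status = "expired"  # assumption: late results earn no bonus
            return Decimal("0")
        self.status = outcome
        bonus = OUTCOME_BONUS if outcome == "success" else Decimal("0")
        self.charged += bonus
        return bonus


sent_at = datetime(2025, 1, 6, tzinfo=timezone.utc)
record = PendingOutcome(expires_at=sent_at + timedelta(days=7))
bonus = record.resolve("success", now=sent_at + timedelta(days=3))
print(record.status, bonus, record.charged)  # success 5.00 5.01
```

Rejecting a second `resolve` call is a deliberate choice in this sketch: it keeps a retried webhook or duplicate request from charging the bonus twice.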
When outcome-based works
- Revenue-generating agents. Sales outreach, lead qualification, appointment setting — where each success has a clear dollar value to the customer.
- Customer-facing automation. Support resolution, order processing, content moderation — outcomes are binary and measurable.
- High-value, low-frequency actions. When each successful call is worth $5-$500 to the customer, they'll gladly pay an outcome fee that's a fraction of that value.
When outcome-based fails
- Outcomes are subjective. "Was this research report good?" isn't a binary question. Disputes will eat your support team alive.
- Detection is unreliable. If you can't programmatically verify the outcome (e.g., "did the user find this helpful?"), you'll either overcharge or undercharge based on faulty signals.
- Cold-start problem. New agents with no track record have unknown success rates. Pricing outcome bonuses before you know your baseline is guesswork.
Comparison table: which model fits your agent?
| Factor | Per-Token | Per-Call | Outcome-Based |
|---|---|---|---|
| Simplicity | Medium (need to track tokens) | Highest (count calls) | Lowest (define + detect outcomes) |
| Revenue per call | Low, variable | Fixed | High on successes, low on failures |
| Customer transparency | Low (tokens are opaque) | High (simple count) | Highest (pay for results) |
| Incentive alignment | Misaligned (penalizes efficiency) | Neutral (no outcome link) | Aligned (succeed together) |
| Best for | Dev tools, internal agents, predictable workloads | Simple automations, high-volume uniform tasks | Sales agents, lead gen, support bots, high-value actions |
| Churn risk | High (cost spikes surprise customers) | Medium (no value signal) | Low (customers see ROI directly) |
| Implementation effort | Low | Lowest | Medium-High (outcome detection logic) |
The hybrid approach: why production agents often use all three
In practice, the best pricing is rarely a single model; it's a stack. A typical hybrid combines a base per-call fee (covers fixed overhead), a per-token component (covers variable compute), and an outcome bonus (captures value when the agent succeeds).
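A hybrid price is simply the sum of the three components. The following is a minimal sketch of that calculation, not Rev's pricing engine: the function name and the specific rates ($0.01 base, $0.003 per 1K tokens, $5.00 bonus) are assumptions borrowed from the earlier examples.

```python
from decimal import Decimal

ZERO = Decimal("0")


def hybrid_price(tokens_input: int, tokens_output: int, succeeded: bool, *,
                 base_fee: Decimal, rate_per_1k: Decimal,
                 outcome_bonus: Decimal) -> dict:
    """Combine a per-call base, a per-token fee, and an outcome bonus."""
    token_fee = rate_per_1k * (tokens_input + tokens_output) / 1000
    bonus = outcome_bonus if succeeded else ZERO
    return {
        "base": base_fee,
        "tokens": token_fee,
        "outcome": bonus,
        "total": base_fee + token_fee + bonus,
    }


price = hybrid_price(2800, 650, succeeded=True,
                     base_fee=Decimal("0.01"),
                     rate_per_1k=Decimal("0.003"),
                     outcome_bonus=Decimal("5.00"))
print(price["total"])  # 5.02035
```

Setting any component to zero collapses the hybrid back into one of the simpler models. For example, `rate_per_1k=Decimal("0")` and `outcome_bonus=Decimal("0")` gives pure per-call pricing.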
This is the model Rev was built around. The pricing engine lets you configure all three components independently per pricing tier, and the /v1/meter endpoint computes the final price on every call using your config.
How to choose: a decision framework
Answer three questions about your agent:
- Can you define a discrete, measurable outcome? If yes, include an outcome component. If the outcome is subjective or undetectable, skip it.
- Does token consumption vary more than 5x between calls? If yes, include a per-token component. If consumption is uniform, a per-call flat rate is simpler and more predictable.
- Who is your buyer? Technical buyers tolerate token-based pricing. Business buyers need per-call or outcome-based invoices they can understand without a calculator.
If you answered "yes, yes, business buyer", use the full hybrid (base + token + outcome). If you answered "no, no, developer", per-call is fine. Every other combination falls somewhere in between: include only the components your answers call for. Start simple, and add components as your success-rate data matures.
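The three questions can be expressed as a small function. This is one possible reading of the framework, not a definitive rule: the function name and return shape are hypothetical, and treating the buyer type as a question of how the invoice is presented (rather than which components are charged) is an interpretation.

```python
def recommend_pricing(measurable_outcome: bool,
                      token_variance_over_5x: bool,
                      technical_buyer: bool) -> dict:
    """Map the three framework answers to pricing components."""
    components = ["per-call base"]  # every combination starts from a flat base
    if token_variance_over_5x:
        components.append("per-token")
    if measurable_outcome:
        components.append("outcome bonus")
    # Assumption: only technical buyers are shown token-level line items.
    show_tokens = technical_buyer and token_variance_over_5x
    invoice_units = "tokens" if show_tokens else "calls and outcomes"
    return {"components": components, "invoice_units": invoice_units}


# "yes, yes, business buyer" -> full hybrid, invoiced in calls and outcomes
print(recommend_pricing(True, True, technical_buyer=False))
# "no, no, developer" -> per-call only
print(recommend_pricing(False, False, technical_buyer=True))
```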
Implementing all three models with Rev
Rev's metering API supports all three pricing models through a single endpoint. You configure your pricing tiers in the dashboard, then call /v1/meter with the relevant data — the engine computes the right price based on your config.
import requests

# Single API call: Rev applies your pricing tier rules
response = requests.post(
    'https://rev.polsia.app/v1/meter',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json',
    },
    json={
        'agent_id': 'sales-agent-v3',
        'action': 'lead.qualified',
        'tokens_input': 2800,
        'tokens_output': 650,
        'outcome': 'success',
        'metadata': {
            'lead_score': 87,
            'company': 'Acme Corp',
            'deal_size': 45000
        }
    }
)
data = response.json()
# data['price_charged'] = base_fee + token_fee + outcome_bonus
# data['price_breakdown'] = { base: 0.01, tokens: 0.00035, outcome: 5.00 }
print(f"Total charged: ${data['price_charged']}")
The key insight: you don't need to pick one model. Configure your pricing tier with all three components, set any component to zero if you don't want it, and Rev handles the math. Start with per-call only, add a per-token component once you have usage data, and add outcome bonuses once you can measure success reliably.
What to do next
If you're building an AI agent and haven't chosen a pricing model yet:
- Read the first post in this series. The Complete Guide to Billing AI Agents covers metering infrastructure, deferred outcomes, and build vs. buy in depth.
- Run the pricing simulator — the interactive calculator models your expected revenue under each pricing model. Plug in your call volume, average tokens, and expected success rate.
- Try the live demo — the homepage demo fires real API calls with real price computations. See all three pricing components in action before you commit.
- Get your API key — sign up and start metering in under 5 minutes. No sales call, no approval process.