LLM API Cost Calculator
Estimate API cost, compare models, find your breakeven volume, and see which cloud platforms host each model. Covers 110+ frontier LLMs from OpenAI, Anthropic, Google, Meta, Mistral, xAI, NVIDIA, plus Chinese providers (DeepSeek, Qwen, GLM, Kimi, Yi, Hunyuan, Baichuan).
Pick a model + workload, then hit Calculate to see daily, monthly, and annual costs.
How to use
Pick the provider first — that filters the Model dropdown to only that lab's offerings. Estimate your average input tokens (the prompt) and average output tokens (the response) per request. Set your daily request volume. Optionally tick prompt-caching if you reuse large system prompts, or Batch API if you can tolerate ~24h latency for ~50% off. The result panel shows the cost per request, daily/monthly/annual projections, the cheapest equivalent frontier model, and a 5-model leaderboard so you can see whether you're overpaying.
Formula
Per-request cost = (input_tokens / 1,000,000) × input_price + (output_tokens / 1,000,000) × output_price. Cached input swaps in the cached_input_price for the cached portion. Batch API multiplies the whole thing by (1 − batch_discount). Daily = per-request × requests_per_day. Monthly = daily × 30. Annual = daily × 365.
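The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the `Pricing` container, the function names, and the default 50% batch discount are assumptions made for the example, and all prices are in USD per one million tokens.

```python
from dataclasses import dataclass
from typing import Optional

TOKENS_PER_MILLION = 1_000_000


@dataclass
class Pricing:
    """Hypothetical price record; all prices are USD per 1M tokens."""
    input_price: float
    output_price: float
    cached_input_price: Optional[float] = None  # None: no cached rate available
    batch_discount: float = 0.5                 # assumed ~50% off for batch


def per_request_cost(p, input_tokens, output_tokens,
                     cached_tokens=0, use_batch=False):
    """Cost of one request, following the formula in the text."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    # Fall back to the standard input rate if no cached rate is defined.
    cached_rate = (p.cached_input_price
                   if p.cached_input_price is not None else p.input_price)
    cost = (fresh_tokens * p.input_price
            + cached_tokens * cached_rate
            + output_tokens * p.output_price) / TOKENS_PER_MILLION
    if use_batch:
        cost *= 1 - p.batch_discount  # discount applies to the whole request
    return cost


def projections(cost_per_request, requests_per_day):
    """Daily, monthly (30 days) and annual (365 days) totals."""
    daily = cost_per_request * requests_per_day
    return {"daily": daily, "monthly": daily * 30, "annual": daily * 365}
```

With placeholder prices of 3.00 (input) and 15.00 (output), a request with 2,000 input tokens and 500 output tokens costs 0.006 + 0.0075 = 0.0135 USD; at 1,000 requests per day that projects to 13.50 daily, 405.00 monthly, and 4,927.50 annually.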
Assumptions
- Pricing is sourced from each provider's official pricing page on the date noted in the data refresh tag. Prices change often; verify against the source URL before committing to large workloads.
- A 30-day month and 365-day year are used. Real months run 28 to 31 days, so the actual monthly cost ranges from about 7% below the estimate (February) to about 3% above it (31-day months).
- Batch API discount is applied uniformly when ticked. Some providers charge separately for image / function-calling tokens; this calculator covers text-only pricing.
- Cached input pricing applies only to the cached portion of input tokens. Fresh tokens within the same request still use the standard input rate.
- The "cheapest frontier" comparison treats any model with MMLU ≥ 80 as frontier-class. That's a coarse proxy: for code or vision workloads, compare on HumanEval / MMMU instead (see the Comparison tab).
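As a rough sketch of how the "cheapest frontier" rule in the last assumption could be applied: filter by the MMLU threshold, then rank by per-request cost for your workload. The `Model` record, the function names, and the three catalogue entries are hypothetical placeholders, not real models or real prices.

```python
from typing import NamedTuple, Optional, Sequence


class Model(NamedTuple):
    name: str
    mmlu: float          # benchmark score, 0-100
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens


def request_cost(m: Model, input_tokens: int, output_tokens: int) -> float:
    """Text-only cost of one request, without caching or batch discounts."""
    return (input_tokens * m.input_price
            + output_tokens * m.output_price) / 1_000_000


def cheapest_frontier(models: Sequence[Model], input_tokens: int,
                      output_tokens: int,
                      min_mmlu: float = 80.0) -> Optional[Model]:
    """Cheapest model for this workload among those with MMLU >= min_mmlu."""
    eligible = [m for m in models if m.mmlu >= min_mmlu]
    if not eligible:
        return None
    return min(eligible,
               key=lambda m: request_cost(m, input_tokens, output_tokens))


# Placeholder catalogue: names, scores and prices are made up for illustration.
CATALOG = [
    Model("model-a", mmlu=88.0, input_price=3.00, output_price=15.00),
    Model("model-b", mmlu=82.0, input_price=0.50, output_price=1.50),
    Model("model-c", mmlu=70.0, input_price=0.10, output_price=0.40),
]
```

Here `cheapest_frontier(CATALOG, 2000, 500)` picks `model-b`: `model-c` is cheaper but falls below the threshold. Swapping the `mmlu` field for a HumanEval or MMMU score would give the code- or vision-oriented variant of the same filter.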
FAQ
How do I count tokens before I run the request?
Use the official tokeniser from each provider: tiktoken for OpenAI, the token-counting endpoint for Anthropic (messages.count_tokens in the current Python SDK), and the Vertex AI count_tokens endpoint for Gemini. Rough estimate: 1 token ≈ 4 English characters or ¾ of a word. Code tends to use more tokens per character than prose, so the rule of thumb usually underestimates it.
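When no tokeniser is at hand, the rule of thumb above can be turned into a quick estimator. This is a heuristic sketch only: the function name is made up, taking the larger of the two estimates is an assumption chosen to err on the high side, and real counts from the provider's tokeniser will differ, especially for code and non-English text.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters or ~0.75 words per token."""
    by_chars = len(text) / 4             # 1 token is about 4 characters
    by_words = len(text.split()) / 0.75  # 1 token is about 3/4 of a word
    # Take the larger estimate so budgets err on the high side (assumption).
    return math.ceil(max(by_chars, by_words))
```

Use it for budgeting only, and switch to the provider's own tokeniser before committing to a large workload.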
Why is output so much more expensive than input?
Output tokens require sequential decoding: the model generates them one at a time, and each new token needs its own forward pass on the GPU. Input tokens are processed in parallel during the prefill phase, which is much cheaper per token. Most frontier models price output 3–5× higher than input.
Should I use the Batch API?
Yes, if your workload tolerates ~24h latency. The ~50% discount applies to the whole request and is often the biggest single lever for reducing LLM costs. Common batch use cases: nightly content moderation, document summarisation pipelines, embeddings backfills.
How does prompt caching affect cost?
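Anthropic prompt caching and OpenAI cached input bill tokens that hit the cache at a fraction of the standard input rate, roughly 10–50% depending on the provider and model. Anthropic also bills cache writes at a premium over the standard rate. If you have a large stable system prompt (e.g. > 1,000 tokens) reused across requests, caching can drop your input bill by 50–80%.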
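The size of the saving follows from two numbers: the share of input tokens served from the cache and the cached price as a fraction of the standard rate. The sketch below works through that arithmetic; the function name and parameters are illustrative, and it assumes every request hits the cache and ignores any cache-write surcharge.

```python
def input_bill_reduction(cached_fraction: float,
                         cached_price_ratio: float) -> float:
    """Fractional reduction in the input bill from prompt caching.

    cached_fraction:    share of input tokens served from cache (0 to 1)
    cached_price_ratio: cached price / standard input price (0 to 1)
    """
    if not 0 <= cached_fraction <= 1:
        raise ValueError("cached_fraction must be between 0 and 1")
    if not 0 <= cached_price_ratio <= 1:
        raise ValueError("cached_price_ratio must be between 0 and 1")
    # Uncached tokens pay full price; cached tokens pay the reduced ratio.
    # New bill = (1 - f) + f * r, so the reduction is f * (1 - r).
    return cached_fraction * (1 - cached_price_ratio)
```

For example, a 4,000-token system prompt inside a 5,000-token input gives a cached fraction of 0.8. At a cached rate of 10% of standard the input bill falls by 0.8 × 0.9 = 72%; at 25% it falls by 0.8 × 0.75 = 60%. Output tokens are unaffected, so the reduction in the total bill is smaller.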