Compare AI & LLM API prices in one place
Side-by-side API pricing for every major model — GPT-5, Claude, Gemini, Llama, Mistral, DeepSeek and more. Sort by input or output cost per token, see the context window, and find the cheapest model that does the job.
LLM API pricing comparison
All prices are in USD per 1,000,000 tokens. The table is sorted by reference cost per call, cheapest first.
| Model | Notes | Provider | Input /1M | Output /1M | Cost / call* | Context | Type |
|---|---|---|---|---|---|---|---|
| Gemini 1.5 Flash-8B | Cheapest capable model | Google | $0.037 | $0.150 | $0.0002 | 1M | Fast / cheap |
| Command R7B | Cheapest Cohere tier | Cohere | $0.037 | $0.150 | $0.0002 | 128K | Fast / cheap |
| Qwen3 235B A22B | Open weights, very cheap | Alibaba (Qwen) | $0.090 | $0.100 | $0.0002 | 256K | Fast / cheap (open) |
| MiMo v2.5 | Open, cheap, 1M context | Xiaomi (MiMo) | $0.105 | $0.280 | $0.0004 | 1M | Fast / cheap (open) |
| Step 3.5 Flash | Cheapest Step tier | StepFun | $0.090 | $0.300 | $0.0004 | 256K | Fast / cheap |
| Grok 3 mini | | xAI (Grok) | $0.100 | $0.300 | $0.0004 | 128K | Fast / cheap |
| Llama 3.1 8B | Open weights | Meta (Llama) | $0.200 | $0.200 | $0.0004 | 128K | Open weights |
| DeepSeek-V4 Flash | Non-thinking mode, very cheap | DeepSeek | $0.140 | $0.280 | $0.0004 | 128K | Fast / cheap (open) |
| Llama 4 Scout | Open weights (Groq) | Meta (Llama) | $0.110 | $0.340 | $0.0005 | 128K | Open weights |
| Gemini 2.0 Flash | | Google | $0.100 | $0.400 | $0.0005 | 1M | Fast / cheap |
| Grok 4 Fast | 2M context | xAI (Grok) | $0.200 | $0.500 | $0.0007 | 2M | Fast / cheap |
| Mistral Small 4 | | Mistral | $0.150 | $0.600 | $0.0007 | 128K | Fast / cheap |
| Command R | | Cohere | $0.150 | $0.600 | $0.0007 | 128K | Balanced |
| GLM-4.5 Air | Cheap open tier | Zhipu AI (GLM) | $0.130 | $0.850 | $0.0010 | 128K | Fast / cheap (open) |
| Llama 4 Maverick | Open weights (Together AI) | Meta (Llama) | $0.270 | $0.850 | $0.0011 | 500K | Open weights |
| Codestral | Code-specialised | Mistral | $0.300 | $0.900 | $0.0012 | 32K | Balanced |
| MiniMax-M2 | Open weights | MiniMax | $0.260 | $1.00 | $0.0013 | 200K | Balanced (open) |
| DeepSeek-V4 Pro | Thinking mode | DeepSeek | $0.435 | $0.870 | $0.0013 | 128K | Reasoning (open) |
| MiMo v2.5 Pro | Open weights, 1M context | Xiaomi (MiMo) | $0.435 | $0.870 | $0.0013 | 1M | Balanced (open) |
| Step 3.7 Flash | Fast tier | StepFun | $0.200 | $1.15 | $0.0014 | 256K | Fast / cheap |
| GPT-5.4 nano | Cheapest OpenAI tier | OpenAI | $0.200 | $1.25 | $0.0015 | 400K | Fast / cheap |
| MiniMax-M3 | Open weights, 1M context | MiniMax | $0.300 | $1.20 | $0.0015 | 1M | Flagship (open) |
| Grok Code Fast 1 | Code-specialised | xAI (Grok) | $0.200 | $1.50 | $0.0017 | 256K | Balanced |
| Mistral Large 3 | Flagship | Mistral | $0.500 | $1.50 | $0.0020 | 256K | Flagship |
| Qwen3 Coder | Open, code-specialised, 1M context | Alibaba (Qwen) | $0.220 | $1.80 | $0.0020 | 1M | Balanced (open) |
| Llama 3.3 70B | Open weights | Meta (Llama) | $1.04 | $1.04 | $0.0021 | 128K | Open weights |
| GLM-4.6 | Popular open model | Zhipu AI (GLM) | $0.430 | $1.74 | $0.0022 | 200K | Balanced (open) |
| Gemini 2.5 Flash | | Google | $0.300 | $2.50 | $0.0028 | 1M | Balanced |
| Kimi K2 | Open weights | Moonshot AI (Kimi) | $0.570 | $2.30 | $0.0029 | 128K | Balanced (open) |
| Kimi K2 Thinking | Open-weight reasoning | Moonshot AI (Kimi) | $0.600 | $2.50 | $0.0031 | 256K | Reasoning (open) |
| GLM-5.2 | Open weights, 1M context | Zhipu AI (GLM) | $0.950 | $3.00 | $0.0040 | 1M | Flagship (open) |
| Qwen3 Max | Flagship | Alibaba (Qwen) | $0.780 | $3.90 | $0.0047 | 256K | Flagship |
| GPT-5.4 mini | | OpenAI | $0.750 | $4.50 | $0.0053 | 400K | Balanced |
| Claude Haiku 4.5 | Fastest, near-frontier | Anthropic | $1.00 | $5.00 | $0.0060 | 200K | Fast / cheap |
| Mistral Medium 3.5 | | Mistral | $1.50 | $7.50 | $0.0090 | 256K | Balanced |
| Gemini 2.5 Pro | 1M context (≤200k tier) | Google | $1.25 | $10.00 | $0.011 | 1M | Flagship |
| Command A | Flagship, RAG-tuned | Cohere | $2.50 | $10.00 | $0.013 | 256K | Flagship |
| GPT-5.4 | Balanced flagship | OpenAI | $2.50 | $15.00 | $0.017 | 400K | Flagship |
| Claude Sonnet 4.6 | Best speed/intelligence; caching cuts input ~90% | Anthropic | $3.00 | $15.00 | $0.018 | 1M | Balanced |
| Grok 4 | Flagship | xAI (Grok) | $3.00 | $15.00 | $0.018 | 256K | Flagship |
| Claude Opus 4.8 | Top Opus reasoning/agentic | Anthropic | $5.00 | $25.00 | $0.030 | 1M | Flagship |
| GPT-5.5 | Flagship | OpenAI | $5.00 | $30.00 | $0.035 | 400K | Flagship |
| Claude Fable 5 | Most capable widely released | Anthropic | $10.00 | $50.00 | $0.060 | 1M | Flagship |
*Reference cost of one call with 1,000 input + 1,000 output tokens, used as a neutral yardstick. Use the cost calculator for your real usage.
Go deeper
▣ Cost calculator
Enter your tokens and request volume to see the real monthly cost across every model, ranked cheapest first.
▤ Local VRAM calculator
Running models locally? Find out how much GPU VRAM a model needs at each quantization — and which card fits.
◇ Cheapest LLM API
The lowest-cost capable models for high-volume work, with the trade-offs that matter at scale.
Frequently asked questions
How is LLM API pricing calculated?
Almost every LLM API charges per token, split into an input (prompt) price and a usually higher output (completion) price, both quoted per 1,000,000 tokens. Your bill is (input tokens × input price) + (output tokens × output price), with each quoted price divided by 1,000,000 to get the per-token rate. A token is roughly ¾ of an English word.
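As a minimal sketch of that formula, the function below turns per-million list prices into a dollar cost for one call. The name `call_cost` and its parameters are illustrative and not part of any provider SDK; the prices used are the Claude Haiku 4.5 row from the table above.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD of one call, given list prices per 1,000,000 tokens."""
    tokens_per_price_unit = 1_000_000
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / tokens_per_price_unit


# Reference call from the table: 1,000 input + 1,000 output tokens
# at $1.00 input / $5.00 output per 1M tokens (Claude Haiku 4.5 row).
reference = call_cost(1_000, 1_000, 1.00, 5.00)
print(f"${reference:.4f}")  # $0.0060
```

Multiplying the per-call figure by your expected number of requests gives a rough monthly estimate; real bills can differ where a provider applies caching discounts or tiered rates.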
Which LLM API is cheapest?
For high-volume work, low-cost capable options include Google Gemini 1.5 Flash-8B and 2.0 Flash, OpenAI GPT-5.4 nano, and DeepSeek-V4 Flash; the table above lists several others at a similar reference cost. The lowest sticker price is not always the cheapest in practice: a weaker model that needs retries or escalation can cost more overall.
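To illustrate how sticker price and effective price can diverge, here is a rough sketch of the expected cost per request when a cheap model's answer sometimes has to be redone on a stronger model. The 40% escalation rate is a made-up number for illustration, not a measurement, and the assumption that the mid-tier model never needs escalating is equally hypothetical. The per-call costs are the reference costs from the table above.

```python
def effective_cost(cheap_cost: float, strong_cost: float,
                   escalation_rate: float) -> float:
    """Expected cost per request when every request first goes to the cheap
    model and a fraction `escalation_rate` is then re-run on the strong one."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return cheap_cost + escalation_rate * strong_cost


# Reference costs per call from the table (1,000 input + 1,000 output tokens).
cheap = 0.0004     # DeepSeek-V4 Flash
strong = 0.017     # GPT-5.4
mid_tier = 0.0060  # Claude Haiku 4.5

# Hypothetical: 40% of the cheap model's answers are escalated.
blended = effective_cost(cheap, strong, escalation_rate=0.40)
print(f"cheap + escalation: ${blended:.4f} per request")
print(f"mid-tier only:      ${mid_tier:.4f} per request")
```

Under these assumed numbers the cheap-first route comes out at about $0.0072 per request, more than sending everything to the mid-tier model. With a lower escalation rate the comparison flips, so the rate is worth measuring on your own workload.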
What is the difference between input and output token pricing?
Input tokens are what you send (the prompt, system message, context, documents). Output tokens are what the model generates. Output is typically 2–5× more expensive than input (in the table above the ratio runs from 1× to about 8×), so capping the maximum output length is often the fastest way to cut cost.
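A quick sketch of why the output cap matters, assuming the Claude Sonnet 4.6 list prices from the table above ($3.00 input, $15.00 output per 1M tokens) and a hypothetical workload of 2,000 input tokens per call. The helper name and token counts are illustrative only.

```python
INPUT_PRICE = 3.00    # USD per 1M input tokens (Claude Sonnet 4.6 row)
OUTPUT_PRICE = 15.00  # USD per 1M output tokens


def cost_per_call(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one call at the list prices above."""
    return (input_tokens * INPUT_PRICE
            + output_tokens * OUTPUT_PRICE) / 1_000_000


# Hypothetical workload: same 2,000-token prompt, two output lengths.
uncapped = cost_per_call(2_000, 2_000)  # 0.006 input + 0.030 output
capped = cost_per_call(2_000, 500)      # 0.006 input + 0.0075 output

saving = 1 - capped / uncapped
print(f"uncapped ${uncapped:.4f}, capped ${capped:.4f}, saving {saving:.1%}")
```

In this example the input cost is unchanged, yet cutting the output from 2,000 to 500 tokens lowers the per-call cost by more than half. Most APIs expose the cap as a maximum-output-tokens request parameter, though its exact name varies by provider, and a hard cap truncates the response rather than making the model more concise.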
Are these prices up to date?
Prices are list prices last verified on 2026-06-26 and link to each provider's official pricing page. The AI market moves fast — always confirm the current price with the provider before committing to volume.