The 10 Cheapest LLM APIs in 2026 — Real Prices From 97 Models
We pulled the live per-token price of every model listed on our gateway (all 97 of them, June 2026 snapshot). The spread is wild: the cheapest usable model costs $0.075 per million blended tokens, about 150× less than GPT-5.5. Below are the 10 cheapest, the best budget pick for each job, and the honest part: when cheap quietly gets expensive.
TL;DR — the cheapest picks by job
- Cheapest overall: gpt-oss-120b — $0.04 in / $0.18 out per million ($0.075 blended).
- Cheapest reasoning model: Qwen3 235B Thinking — a flat $0.10 in and out.
- Cheapest 1M-token context: Gemini 2.5 Flash Lite and GPT-4.1 Nano, tied at ~$0.175 blended.
- Cheapest current-gen small model: GPT-5.4 Nano — $0.20 / $1.25, with a 400K window.
- Flagships for contrast: Claude Opus 4.7 ($10.00 blended) and GPT-5.5 ($11.25 blended).
How we built this list (real data, reproducible)
Nothing here is copied from vendor marketing pages. On June 12, 2026 we exported the live price list from our own gateway — the same table you can see on the public pricing page — covering 97 models across OpenAI, Anthropic, Google, Qwen, DeepSeek, MoonshotAI, xAI and more. We then ranked every model by blended price = 75% input rate + 25% output rate, the 3:1 input-to-output ratio we see in typical production traffic. Change the ratio and the order shifts slightly, but the tiers below are stable.
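Here is the blending rule as a worked equation. This is a sketch: $w$ is our own shorthand for the input share of traffic (0.75 at the 3:1 mix), and $p_{\text{in}}$ and $p_{\text{out}}$ are the per-million list rates. Checked against the #1 row of the table:

```latex
\begin{aligned}
\text{blended} &= w\,p_{\text{in}} + (1 - w)\,p_{\text{out}}, \qquad w = 0.75 \\
\text{gpt-oss-120b:}\quad \text{blended} &= 0.75 \times 0.04 + 0.25 \times 0.18 = 0.030 + 0.045 = 0.075
\end{aligned}
```

Setting $w = 0.5$ gives the 1:1 weighting that matters for output-heavy agents.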
The 10 cheapest LLM APIs, ranked
| # | Model | Input $/M | Output $/M | Blended $/M | Context |
|---|---|---|---|---|---|
| 1 | gpt-oss-120b | $0.04 | $0.18 | $0.075 | 131K |
| 2 | Qwen3 30B A3B Instruct | $0.05 | $0.19 | $0.085 | 131K |
| 3 | Qwen3 235B Thinking | $0.10 | $0.10 | $0.10 | 262K |
| 4 | Qwen3 32B | $0.08 | $0.28 | $0.13 | 131K |
| 5 | Qwen3 14B | $0.10 | $0.24 | $0.135 | 132K |
| 6 | GPT-5 Nano | $0.05 | $0.40 | $0.138 | 400K |
| 7 | Gemini 2.5 Flash Lite | $0.10 | $0.40 | $0.175 | 1.0M |
| 8 | GPT-4.1 Nano | $0.10 | $0.40 | $0.175 | 1.0M |
| 9 | DeepSeek V3.2 | $0.23 | $0.34 | $0.258 | 131K |
| 10 | GPT-4o-mini | $0.15 | $0.60 | $0.263 | 128K |
For perspective, the flagships on the same list: Gemini 3.1 Pro blends to $4.50, Claude Sonnet 4.6 to $6.00, Claude Opus 4.7 to $10.00 and GPT-5.5 to $11.25. Everything in the top-10 table is 17–150× cheaper than that band. Cross-reference quality before you commit — LMArena's community leaderboard and Artificial Analysis both track these models — and note that several top-10 entries are previous-generation models: that's largely why they're cheap, not a defect.
What $1 actually buys you
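Price-per-million numbers are abstract, so flip them into blended tokens per dollar (1,000,000 ÷ blended price). $1 buys roughly 13.3M tokens on gpt-oss-120b, 11.8M on Qwen3 30B A3B, 10M on Qwen3 235B Thinking and 5.7M on Gemini 2.5 Flash Lite or GPT-4.1 Nano. The same dollar buys only about 100K tokens on Claude Opus 4.7 and 89K on GPT-5.5.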
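If you want to redo this conversion for other rows, here is a minimal sketch. It assumes the article's 3:1 blend and copies a handful of list prices from the table. The `PRICES` dict and both helper names are illustrative, not part of any SDK.

```python
# Blended tokens per $1 for a few rows of the ranking (rates in $ per 1M tokens).
# Blended = 0.75 * input + 0.25 * output, the article's 3:1 weighting.
PRICES = {
    "gpt-oss-120b": (0.04, 0.18),
    "Qwen3 30B A3B Instruct": (0.05, 0.19),
    "Qwen3 235B Thinking": (0.10, 0.10),
    "Gemini 2.5 Flash Lite": (0.10, 0.40),
    "GPT-4o-mini": (0.15, 0.60),
    "GPT-5.5": (5.00, 30.00),
}


def blended(input_rate: float, output_rate: float, input_share: float = 0.75) -> float:
    """Weighted $/1M tokens for a given input share of traffic."""
    return input_share * input_rate + (1 - input_share) * output_rate


def tokens_per_dollar(input_rate: float, output_rate: float) -> float:
    """How many blended tokens $1 buys: 1,000,000 tokens / ($ per 1M tokens)."""
    return 1_000_000 / blended(input_rate, output_rate)


for model, (inp, out) in PRICES.items():
    print(f"{model:<24} {tokens_per_dollar(inp, out):>14,.0f} tokens per $1")
```

To see how sensitive the result is to your own traffic mix, pass a different `input_share` to `blended`.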
A worked example: the support-bot bill
Take a customer-support assistant doing 1M requests/month at ~2K input + 0.5K output tokens each (2B input + 500M output tokens monthly):
- On GPT-5.5: 2,000M × $5/M + 500M × $30/M = $25,000 / month
- On gpt-oss-120b: 2,000M × $0.04/M + 500M × $0.18/M = $170 / month
Same traffic, $24,830/month difference. Even if the budget model needs a flagship fallback for the hardest 10% of tickets, the bill only rises to about $2,670 ($170 plus $2,500 for the escalated tickets). That is still roughly 89% below the all-GPT-5.5 bill. That fallback pattern is exactly what routing is for; more on that below.
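The same arithmetic as a small script, so you can plug in your own volumes. This is a sketch. It assumes the fallback works by sending every ticket to the budget model first and retrying the hardest 10% on GPT-5.5, which means those tickets are billed twice. Routing them straight to the flagship would be slightly cheaper, at about $2,653. The constant and function names are illustrative.

```python
# Support-bot bill from the worked example: 1M requests/month,
# ~2,000 input + ~500 output tokens per request. Rates are $ per 1M tokens.
REQUESTS_PER_MONTH = 1_000_000
INPUT_TOKENS = 2_000
OUTPUT_TOKENS = 500

GPT_55 = (5.00, 30.00)       # GPT-5.5 (input, output) rates used in the example
GPT_OSS_120B = (0.04, 0.18)  # gpt-oss-120b (input, output) rates


def monthly_cost(rates: tuple[float, float], requests: int) -> float:
    """Dollars for `requests` calls at the given (input, output) $/1M rates."""
    input_rate, output_rate = rates
    input_millions = requests * INPUT_TOKENS / 1_000_000
    output_millions = requests * OUTPUT_TOKENS / 1_000_000
    return input_millions * input_rate + output_millions * output_rate


flagship = monthly_cost(GPT_55, REQUESTS_PER_MONTH)
budget = monthly_cost(GPT_OSS_120B, REQUESTS_PER_MONTH)

# Assumed fallback pattern: every ticket tries the budget model first and the
# hardest 10% are retried on the flagship, so those tickets are billed twice.
escalated = REQUESTS_PER_MONTH // 10
with_fallback = budget + monthly_cost(GPT_55, escalated)

print(f"GPT-5.5 only:          ${flagship:,.0f}/month")
print(f"gpt-oss-120b only:     ${budget:,.0f}/month")
print(f"budget + 10% fallback: ${with_fallback:,.0f}/month "
      f"({1 - with_fallback / flagship:.1%} below GPT-5.5 only)")
```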
The right cheap model for each job
Bulk & classification
- gpt-oss-120b — cheapest ticket in town
- Qwen3 30B A3B — strong multilingual
- Tagging, routing, dedupe, sentiment — quality ceiling barely matters
Long documents
- Gemini 2.5 Flash Lite — 1M context at $0.175
- GPT-4.1 Nano — same price, same window
- ~64× cheaper than GPT-5.5 for long-context summarization
Budget reasoning
- Qwen3 235B Thinking — flat $0.10/$0.10
- Unusual pricing: output costs the same as input, so long chains of thought carry no output premium
- Good for math, planning, eval pipelines
Current-gen small
- GPT-5.4 Nano — $0.20/$1.25, 400K window
- GPT-5.4 Mini — $0.75/$4.50 when you need more
- Current-gen behavior and tool use: Nano blends to ~$0.46/M (about 4% of GPT-5.5), Mini to ~$1.69/M (about 15%)
When cheap quietly gets expensive
The honest section. Three ways a $0.08 model costs you more than a $5 one:
- Retry tax. If a budget model fails your task 30% of the time and you retry with a flagship, you pay for both calls and absorb the added latency. Measure the end-to-end cost per solved task, not per call.
- Long-generation drift. Small models degrade faster on multi-thousand-token outputs (reports, multi-file code). Our GPT-5.5 vs Claude Opus 4.7 comparison covers what the flagship tier buys you there.
- Output-heavy workloads flip the math. Blended price assumes 3:1 input:output. A chatty agent at 1:1 makes output rates dominate — recheck the ranking with your own ratio on the pricing page.
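To recheck the ranking at your own ratio, you can re-sort the top-10 rows offline. The sketch below copies the rates from the table and assumes the same linear blend. `TOP10` and `rank` are illustrative names. With these prices, the 3:1 mix reproduces the table order. At 1:1, Qwen3 235B Thinking moves to #1 ($0.10 vs $0.11 for gpt-oss-120b), because its flat pricing carries no output premium.

```python
# Re-rank the top-10 table at a different input:output mix.
# Rates are $ per 1M tokens, copied from the ranking table.
TOP10 = [
    ("gpt-oss-120b", 0.04, 0.18),
    ("Qwen3 30B A3B Instruct", 0.05, 0.19),
    ("Qwen3 235B Thinking", 0.10, 0.10),
    ("Qwen3 32B", 0.08, 0.28),
    ("Qwen3 14B", 0.10, 0.24),
    ("GPT-5 Nano", 0.05, 0.40),
    ("Gemini 2.5 Flash Lite", 0.10, 0.40),
    ("GPT-4.1 Nano", 0.10, 0.40),
    ("DeepSeek V3.2", 0.23, 0.34),
    ("GPT-4o-mini", 0.15, 0.60),
]


def rank(models, input_tokens: int, output_tokens: int):
    """Sort models by blended $/1M for the given input:output token ratio."""
    w = input_tokens / (input_tokens + output_tokens)  # input share of traffic
    scored = [(name, w * inp + (1 - w) * out) for name, inp, out in models]
    return sorted(scored, key=lambda pair: pair[1])  # stable: ties keep table order


for label, ratio in [("3:1 (article default)", (3, 1)), ("1:1 (chatty agent)", (1, 1))]:
    print(label)
    for i, (name, price) in enumerate(rank(TOP10, *ratio)[:3], start=1):
        print(f"  {i}. {name:<24} ${price:.3f}/M")
```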
Rule of thumb we see across gateway traffic: route the easy 80% to a top-10 model from this list, keep a flagship behind it for the hard 20%, and you capture most of the savings with none of the quality cliff. Official rate cards if you want to verify upstream: OpenAI pricing, Anthropic pricing, Google's Gemini API rates.
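One way to wire up the cheap-first pattern is shown below. It is an illustrative sketch, not DataLLM Lab's routing logic. It assumes you can check an answer automatically. `call_model` and `is_acceptable` are hypothetical hooks you would supply (for example, an HTTP call and your own validator). The model IDs follow the curl example in the next section.

```python
# Minimal cheap-first router with flagship escalation (illustrative sketch).
# `call_model` and `is_acceptable` are hypothetical hooks supplied by the caller.
from typing import Callable

CHEAP_MODEL = "openai/gpt-oss-120b"
FLAGSHIP_MODEL = "openai/gpt-5.5"


def route(
    prompt: str,
    call_model: Callable[[str, str], str],
    is_acceptable: Callable[[str], bool],
) -> tuple[str, str]:
    """Try the cheap model first; escalate only if its answer fails the check.

    Returns (model_used, answer).
    """
    answer = call_model(CHEAP_MODEL, prompt)
    if is_acceptable(answer):
        return CHEAP_MODEL, answer
    # Escalation: this request is now paid for twice (the "retry tax").
    return FLAGSHIP_MODEL, call_model(FLAGSHIP_MODEL, prompt)


if __name__ == "__main__":
    # Fake backend for a dry run: the cheap model returns nothing for "hard" prompts.
    def fake_call(model: str, prompt: str) -> str:
        if model == CHEAP_MODEL and "hard" in prompt:
            return ""
        return f"[{model}] answer to: {prompt}"

    for p in ["tag this ticket", "hard multi-step refund dispute"]:
        used, _ = route(p, fake_call, is_acceptable=lambda a: bool(a.strip()))
        print(f"{p!r:40} -> {used}")
```

The quality of `is_acceptable` decides how much you save. A check that is too strict escalates easy tickets, and one that is too lenient lets bad answers through.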
One endpoint, cheap by default
Every model in this article is callable through the same OpenAI-compatible endpoint on DataLLM Lab, so switching from an $11 flagship to a $0.08 workhorse is a one-string change:
```bash
# Same request: only the model string changes (was "openai/gpt-5.5"; the bill drops ~150×)
curl https://api.datallmlab.com/v1/chat/completions \
  -H "Authorization: Bearer $DATALLMLAB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Classify this ticket..."}]
  }'
```
Model IDs for all 97 priced models are in the model directory; live rates on the pricing page.
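The same call from Python, using only the standard library, is sketched below. It assumes the endpoint accepts the standard OpenAI-compatible JSON body shown in the curl example and returns the usual `choices[0].message.content` shape. `build_request` is an illustrative helper name. A live request is sent only when `DATALLMLAB_API_KEY` is set.

```python
# Same chat-completions call as the curl example, standard library only.
# Assumes the endpoint accepts the OpenAI-compatible JSON body shown above.
import json
import os
import urllib.request

API_URL = "https://api.datallmlab.com/v1/chat/completions"


def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build a POST request; switching models is just a different `model` string."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    key = os.environ.get("DATALLMLAB_API_KEY")
    if key:
        req = build_request("openai/gpt-oss-120b", "Classify this ticket...", key)
        with urllib.request.urlopen(req, timeout=30) as resp:
            reply = json.load(resp)
        # OpenAI-compatible responses carry the text in choices[0].message.content
        print(reply["choices"][0]["message"]["content"])
    else:
        print("Set DATALLMLAB_API_KEY to send a live request.")
```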
Stop overpaying for easy tokens
Get one API key, send your bulk traffic to one of the 10 cheapest models, and keep a flagship on standby; DataLLM Lab compares prices and routes automatically.
FAQ
What is the cheapest LLM API in 2026?
On our June 2026 list: gpt-oss-120b at $0.04/M input and $0.18/M output — about $0.075 blended, roughly 150× cheaper than GPT-5.5. Qwen3 30B A3B ($0.085) and Qwen3 235B Thinking ($0.10) are right behind.
How is the "blended" price calculated?
Blended = 75% × input rate + 25% × output rate, matching the ~3:1 input-to-output token ratio of typical production traffic. Output-heavy workloads should re-weight toward the output column.
Are cheap models good enough for production?
For bounded tasks (classification, tagging, extraction, short replies), usually yes, often with no difference your users would notice. For long agentic coding or hard reasoning, failure-and-retry costs can erase the savings; evaluate on your own prompts first (you can do it in Chat without writing code).
What's the cheapest model with a 1M context window?
Gemini 2.5 Flash Lite and GPT-4.1 Nano, tied at ~$0.175 blended with 1.0M-token windows — about 64× cheaper than GPT-5.5 for long-context work.
Do these prices include prompt-caching discounts?
No — these are standard list rates. Cache reads can cut input costs much further on repeated prompts (e.g. GPT-5.5 cache reads are $0.50/M vs $5.00 standard). If your traffic reuses long system prompts, factor caching in; rates are on each model's page in the directory.
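To see roughly how much caching matters, you can blend the two input rates quoted above by the share of input tokens served from cache. This is a simplified sketch. It ignores provider-specific details such as cache-write pricing or minimum cacheable prompt lengths, and the cached shares in the loop are example values, not measurements.

```python
# Effective GPT-5.5 input rate when a share of input tokens are cache reads.
# Rates from the FAQ: $5.00/M standard input, $0.50/M cache reads.
# Simplification: ignores cache-write pricing and other provider-specific rules.
STANDARD_INPUT = 5.00
CACHE_READ = 0.50


def effective_input_rate(cached_share: float) -> float:
    """Blend of cache-read and standard input rates, in $ per 1M input tokens."""
    if not 0.0 <= cached_share <= 1.0:
        raise ValueError("cached_share must be between 0 and 1")
    return cached_share * CACHE_READ + (1 - cached_share) * STANDARD_INPUT


for share in (0.0, 0.5, 0.8):  # example cache-hit shares, not measured data
    print(f"{share:.0%} cached -> ${effective_input_rate(share):.2f}/M input")
```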
DataLLM Lab