Pricing Deep-Dive

The 10 Cheapest LLM APIs in 2026 — Real Prices From 97 Models

We pulled the live per-token price of every model listed on our gateway — all 97 of them, June 2026 snapshot. The spread is wild: the cheapest usable model costs $0.075 per million blended tokens, about 150× less than GPT-5.5. Here is the full ranked list, the best budget pick for each job, and the honest part — when cheap quietly gets expensive.

TL;DR — the cheapest picks by job

How we built this list (real data, reproducible)

Nothing here is copied from vendor marketing pages. On June 12, 2026 we exported the live price list from our own gateway — the same table you can see on the public pricing page — covering 97 models across OpenAI, Anthropic, Google, Qwen, DeepSeek, MoonshotAI, xAI and more. We then ranked every model by blended price = 75% input rate + 25% output rate, the 3:1 input-to-output ratio we see in typical production traffic. Change the ratio and the order shifts slightly, but the tiers below are stable.
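To make the ranking method concrete, here is a minimal Python sketch of the blended-price formula. The rates are copied from four rows of the top-10 table in the next section; the names `RATES` and `blended` are illustrative and not part of any DataLLM Lab SDK.

```python
# Blended price = 75% input rate + 25% output rate (the 3:1 ratio described above).
# Rates are (input, output) in USD per million tokens, from this article's table.
RATES = {
    "gpt-oss-120b": (0.04, 0.18),
    "Qwen3 30B A3B Instruct": (0.05, 0.19),
    "Qwen3 235B Thinking": (0.10, 0.10),
    "GPT-4o-mini": (0.15, 0.60),
}


def blended(input_rate: float, output_rate: float, input_share: float = 0.75) -> float:
    """Weighted average of the input and output rates, in USD per million tokens."""
    return input_share * input_rate + (1.0 - input_share) * output_rate


# Cheapest first.
ranked = sorted(RATES, key=lambda name: blended(*RATES[name]))

for name in ranked:
    print(f"{name}: ${blended(*RATES[name]):.4f}/M blended")
```

Passing a different `input_share` is the quickest way to check how sensitive the order is to your own traffic mix.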

Disclosure: DataLLM Lab resells all of these models at list prices and earns the same margin whichever you choose — we have no stake in steering you cheap or expensive. Every number below is on the pricing page right now; if it moved since June, trust the live page.

The 10 cheapest LLM APIs, ranked

#    Model                     Input $/M   Output $/M   Blended $/M   Context
1    gpt-oss-120b              $0.04       $0.18        $0.075        131K
2    Qwen3 30B A3B Instruct    $0.05       $0.19        $0.085        131K
3    Qwen3 235B Thinking       $0.10       $0.10        $0.10         262K
4    Qwen3 32B                 $0.08       $0.28        $0.13         131K
5    Qwen3 14B                 $0.10       $0.24        $0.135        132K
6    GPT-5 Nano                $0.05       $0.40        $0.138        400K
7    Gemini 2.5 Flash Lite     $0.10       $0.40        $0.175        1.0M
8    GPT-4.1 Nano              $0.10       $0.40        $0.175        1.0M
9    DeepSeek V3.2             $0.23       $0.34        $0.258        131K
10   GPT-4o-mini               $0.15       $0.60        $0.262        128K

For perspective, the flagships on the same list: Gemini 3.1 Pro blends to $4.50, Claude Sonnet 4.6 to $6.00, Claude Opus 4.7 to $10.00 and GPT-5.5 to $11.25. Everything in the top-10 table is 17–150× cheaper than that band. Cross-reference quality before you commit — LMArena's community leaderboard and Artificial Analysis both track these models — and note that several top-10 entries are previous-generation models: that's largely why they're cheap, not a defect.

What $1 actually buys you

Price-per-million numbers are abstract; here's the same data flipped into blended tokens per dollar:

Blended tokens per $1 (75% input + 25% output, June 2026 list prices)

Model                     Tokens per $1
gpt-oss-120b              13.3M
Qwen3 30B A3B             11.8M
Qwen3 235B Thinking       10.0M
Gemini 2.5 Flash Lite     5.7M
DeepSeek V3.2             3.9M
Gemini 3.1 Pro            0.22M
Claude Opus 4.7           0.10M
GPT-5.5                   0.09M

Tokens per dollar, budget tier vs flagship tier. A dollar buys 13.3M blended tokens on gpt-oss-120b and 0.09M on GPT-5.5 — a 150× spread. Chart: DataLLM Lab, June 12, 2026; computed from our live price list.
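The tokens-per-dollar figures are simply the reciprocal of the blended rate. A small Python sketch, using blended prices quoted in this article (the function name is illustrative):

```python
# Blended USD per million tokens, as quoted in this article (June 2026).
BLENDED = {
    "gpt-oss-120b": 0.075,
    "DeepSeek V3.2": 0.258,
    "Gemini 3.1 Pro": 4.50,
    "GPT-5.5": 11.25,
}


def tokens_per_dollar_millions(blended_rate: float) -> float:
    """Millions of blended tokens that one dollar buys at the given rate."""
    return 1.0 / blended_rate


per_dollar = {name: tokens_per_dollar_millions(rate) for name, rate in BLENDED.items()}
spread = per_dollar["gpt-oss-120b"] / per_dollar["GPT-5.5"]

for name, millions in per_dollar.items():
    print(f"{name}: {millions:.2f}M tokens per $1")
print(f"gpt-oss-120b vs GPT-5.5: {spread:.0f}x")
```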

A worked example: the support-bot bill

Take a customer-support assistant doing 1M requests/month at ~2K input + 0.5K output tokens each (2B input + 500M output tokens monthly):

At list prices, that traffic costs about $170/month on gpt-oss-120b and about $25,000/month on GPT-5.5. Same traffic, $24,830/month difference. Even if the budget model needs a flagship fallback for the hardest 10% of tickets, you're still cutting the bill by roughly 85-90%. That fallback pattern is exactly what routing is for; more on that below.
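The arithmetic behind the support-bot bill can be reproduced with a short Python sketch. Two assumptions are worth flagging: the GPT-5.5 output rate of $30/M is not quoted directly in this article but is inferred from its $5.00 input rate and $11.25 blended price, and the fallback scenario assumes every ticket hits the budget model first, with the hardest 10% then re-run on the flagship.

```python
REQUESTS_PER_MONTH = 1_000_000
INPUT_TOKENS_PER_REQUEST = 2_000
OUTPUT_TOKENS_PER_REQUEST = 500

# (input, output) in USD per million tokens.
BUDGET = (0.04, 0.18)     # gpt-oss-120b, from the top-10 table
FLAGSHIP = (5.00, 30.00)  # GPT-5.5; output rate inferred from the $11.25 blend


def monthly_cost(rates, requests=REQUESTS_PER_MONTH):
    """Monthly bill in USD for the support-bot traffic at the given rates."""
    input_rate, output_rate = rates
    input_millions = requests * INPUT_TOKENS_PER_REQUEST / 1_000_000
    output_millions = requests * OUTPUT_TOKENS_PER_REQUEST / 1_000_000
    return input_millions * input_rate + output_millions * output_rate


budget_bill = monthly_cost(BUDGET)
flagship_bill = monthly_cost(FLAGSHIP)

# Assumed fallback: all tickets go to the budget model, 10% are retried on the flagship.
fallback_bill = budget_bill + 0.10 * flagship_bill
savings = 1 - fallback_bill / flagship_bill

print(f"budget: ${budget_bill:,.0f}  flagship: ${flagship_bill:,.0f}")
print(f"with 10% fallback: ${fallback_bill:,.0f} ({savings:.0%} saved)")
```

Under these assumptions the budget bill comes to $170, the flagship bill to $25,000, and the 10% fallback scenario still saves about 89%.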

The right cheap model for each job

Bulk & classification

  • gpt-oss-120b — cheapest ticket in town
  • Qwen3 30B A3B — strong multilingual
  • Tagging, routing, dedupe, sentiment — quality ceiling barely matters

Long documents

Budget reasoning

  • Qwen3 235B Thinking — flat $0.10/$0.10
  • Unusual pricing: output tokens cost the same as input, so long chains of thought carry no output premium
  • Good for math, planning, eval pipelines
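To see what flat pricing means for reasoning-heavy traffic, the sketch below compares how one request's cost grows as the output gets longer. It assumes reasoning tokens are billed as output tokens, and the 2,000-token prompt and 500 vs 8,000 output tokens are hypothetical figures chosen for illustration. The rates come from the top-10 table.

```python
def request_cost(rates, input_tokens, output_tokens):
    """USD cost of one request; rates are (input, output) in USD per million tokens."""
    input_rate, output_rate = rates
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


FLAT = (0.10, 0.10)        # Qwen3 235B Thinking
ASYMMETRIC = (0.05, 0.40)  # GPT-5 Nano

for label, rates in (("flat", FLAT), ("asymmetric", ASYMMETRIC)):
    short_cost = request_cost(rates, 2_000, 500)    # brief answer
    long_cost = request_cost(rates, 2_000, 8_000)   # long chain of thought
    print(f"{label}: ${short_cost:.5f} -> ${long_cost:.5f} "
          f"({long_cost / short_cost:.1f}x)")
```

With these numbers the flat-priced request grows 4x while the asymmetric one grows 11x, so long outputs still cost more on flat pricing, just without the output multiplier.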

Current-gen small

  • GPT-5.4 Nano — $0.20/$1.25, 400K window
  • GPT-5.4 Mini — $0.75/$4.50 when you need more
  • Newest behavior and tool use at roughly 4% (Nano) to 15% (Mini) of GPT-5.5's blended price

When cheap quietly gets expensive

The honest section. Three ways a $0.08 model costs you more than a $5 one:

Rule of thumb we see across gateway traffic: route the easy 80% to a top-10 model from this list, keep a flagship behind it for the hard 20%, and you capture most of the savings with none of the quality cliff. Official rate cards if you want to verify upstream: OpenAI pricing, Anthropic pricing, Google's Gemini API rates.
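One way to implement that rule on the client side is a try-cheap-then-escalate wrapper. This is a minimal sketch, not a description of how DataLLM Lab routes internally: `call_model` and `is_good_enough` are hypothetical callables you would supply (an API client and a quality check of your choosing), and the stubs exist only so the example runs offline.

```python
from typing import Callable

CHEAP_MODEL = "openai/gpt-oss-120b"  # ID used in this article's curl example
FLAGSHIP_MODEL = "openai/gpt-5.5"    # ID mentioned in that example's comment


def answer_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    is_good_enough: Callable[[str], bool],
) -> tuple[str, str]:
    """Try the cheap model first; escalate only when its answer fails the check.

    Returns (model_used, answer). Both callables are supplied by the caller.
    """
    answer = call_model(CHEAP_MODEL, prompt)
    if is_good_enough(answer):
        return CHEAP_MODEL, answer
    return FLAGSHIP_MODEL, call_model(FLAGSHIP_MODEL, prompt)


# Stand-in implementations so the sketch runs without network access.
def fake_call_model(model: str, prompt: str) -> str:
    if model == CHEAP_MODEL and "legal" in prompt:
        return "UNSURE"
    return f"answer from {model}"


def fake_check(answer: str) -> bool:
    return answer != "UNSURE"


print(answer_with_fallback("Where is my order?", fake_call_model, fake_check))
print(answer_with_fallback("Is this a legal threat?", fake_call_model, fake_check))
```

The design choice that matters is the check: a cheap, deterministic test (schema validation, a confidence marker, a length bound) keeps the escalation decision from costing more than the tokens it saves.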

One endpoint, cheap by default

Every model in this article is callable through the same OpenAI-compatible endpoint on DataLLM Lab, so switching from an $11 flagship to a $0.08 workhorse is a one-string change:

# Same request; only the model string changes.
# Was "openai/gpt-5.5"; the blended rate drops ~150x.
# (Comments must stay outside the JSON body, which does not allow them.)
curl https://api.datallmlab.com/v1/chat/completions \
  -H "Authorization: Bearer $DATALLMLAB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Classify this ticket..."}]
  }'

Model IDs for all 97 priced models are in the model directory; live rates on the pricing page.
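For readers calling the endpoint from code rather than the shell, here is a hedged Python equivalent using only the standard library. The endpoint URL, model IDs, and `DATALLMLAB_API_KEY` variable come from the curl example; `build_request` is an illustrative helper, and the sketch only constructs the request so it can run without network access or a real key.

```python
import json
import os
import urllib.request

ENDPOINT = "https://api.datallmlab.com/v1/chat/completions"  # from the curl example


def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


api_key = os.environ.get("DATALLMLAB_API_KEY", "placeholder-key")
cheap = build_request("openai/gpt-oss-120b", "Classify this ticket...", api_key)
flagship = build_request("openai/gpt-5.5", "Classify this ticket...", api_key)

# The two payloads differ only in the "model" field.
print(json.loads(cheap.data)["model"], "vs", json.loads(flagship.data)["model"])

# To actually send one: urllib.request.urlopen(cheap) returns the HTTP response.
```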

Stop overpaying for easy tokens

Get one API key, send your bulk traffic to a top-10 cheapest model, and keep a flagship on standby — DataLLM Lab compares prices and routes automatically.

FAQ

What is the cheapest LLM API in 2026?

On our June 2026 list: gpt-oss-120b at $0.04/M input and $0.18/M output — about $0.075 blended, roughly 150× cheaper than GPT-5.5. Qwen3 30B A3B ($0.085) and Qwen3 235B Thinking ($0.10) are right behind.

How is the "blended" price calculated?

Blended = 75% × input rate + 25% × output rate, matching the ~3:1 input-to-output token ratio of typical production traffic. Output-heavy workloads should re-weight toward the output column.
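As an illustration of re-weighting, the sketch below re-ranks the top-10 table under a hypothetical 1:1 input-to-output mix. The 50/50 split is an assumption chosen for the example, not a figure from our traffic data.

```python
# (input, output) in USD per million tokens, from this article's top-10 table.
RATES = {
    "gpt-oss-120b": (0.04, 0.18),
    "Qwen3 30B A3B Instruct": (0.05, 0.19),
    "Qwen3 235B Thinking": (0.10, 0.10),
    "Qwen3 32B": (0.08, 0.28),
    "Qwen3 14B": (0.10, 0.24),
    "GPT-5 Nano": (0.05, 0.40),
    "Gemini 2.5 Flash Lite": (0.10, 0.40),
    "GPT-4.1 Nano": (0.10, 0.40),
    "DeepSeek V3.2": (0.23, 0.34),
    "GPT-4o-mini": (0.15, 0.60),
}


def rank(input_share: float) -> list[str]:
    """Model names sorted cheapest-first for a given input share of tokens."""
    def price(name: str) -> float:
        input_rate, output_rate = RATES[name]
        return input_share * input_rate + (1.0 - input_share) * output_rate
    return sorted(RATES, key=price)


default_order = rank(0.75)       # the article's 3:1 weighting
output_heavy_order = rank(0.50)  # hypothetical 1:1 workload

print("3:1 ->", default_order[:3])
print("1:1 ->", output_heavy_order[:3])
```

At 1:1 the flat-priced Qwen3 235B Thinking moves to first place and Qwen3 14B overtakes Qwen3 32B, while the rest of the order holds.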

Are cheap models good enough for production?

For bounded tasks — classification, tagging, extraction, short replies — usually yes, with no difference your users would notice. For long agentic coding or hard reasoning, failure-and-retry costs can erase the savings; evaluate on your own prompts first (you can do it in Chat without writing code).

What's the cheapest model with a 1M context window?

Gemini 2.5 Flash Lite and GPT-4.1 Nano, tied at ~$0.175 blended with 1.0M-token windows — about 64× cheaper than GPT-5.5 for long-context work.

Do these prices include prompt-caching discounts?

No — these are standard list rates. Cache reads can cut input costs much further on repeated prompts (e.g. GPT-5.5 cache reads are $0.50/M vs $5.00 standard). If your traffic reuses long system prompts, factor caching in; rates are on each model's page in the directory.
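A rough way to factor caching in is to average the cached and standard input rates by the share of input tokens served from cache. The sketch uses the GPT-5.5 rates quoted above; the 80% cache-hit share is a hypothetical figure, and real billing may have details (such as cache-write charges) that this does not model.

```python
def effective_input_rate(standard_rate: float, cached_rate: float,
                         cache_hit_share: float) -> float:
    """Average input rate (USD per million tokens) for a given cache-hit share."""
    if not 0.0 <= cache_hit_share <= 1.0:
        raise ValueError("cache_hit_share must be between 0 and 1")
    return cache_hit_share * cached_rate + (1.0 - cache_hit_share) * standard_rate


GPT55_STANDARD = 5.00  # USD per million input tokens, from this article
GPT55_CACHED = 0.50    # USD per million cache-read tokens, from this article

rate = effective_input_rate(GPT55_STANDARD, GPT55_CACHED, cache_hit_share=0.80)
print(f"Effective GPT-5.5 input rate at 80% cache hits: ${rate:.2f}/M")
```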

Written by
Kevin Fan

Founder of DataLLM Lab, the unified LLM gateway. Kevin spends his days watching how hundreds of production workloads route across 300+ models — and writes up what the traffic actually shows about cost, latency, and model choice.
