Definitive Reference — March 2026

AI Model Benchmarks 2026

Definitive reference for AI model selection. 119 models × 55+ categories × 54 benchmarks. Sources: LM Council, Artificial Analysis, Scale AI SEAL, BFCL V4, BenchLM.ai, MTEB, Zyte, Speko.

119 models
55+ categories
54 benchmarks
4 pricing tiers
✓ Verified 2026-04-01 (v3) · Next update: 2026-04-08
MEGA-TABLE

All Models × All Parameters


| Model | $/M in→out | Ctx | Max Out | Caps | SWE-V% | SWE-Pro% | GPQA% | HLE% | ARC-AGI-2% | Tau2% | BenchLM | Best for |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
47 categories

King by Category

Quality #1/#2 — best by benchmark. Budget #1/#2 — best under $1/M input. Free — $0.

Embedding Models Comparison

26 models with MTEB scores, pricing, and OpenRouter IDs. For RAG, rank by Retrieval NDCG@10, not the average MTEB score.

| Model | OR model ID | MTEB | Dims | Context | $/M tokens | Best for |
|---|---|---|---|---|---|---|
Decision Tree
1. Already indexed? → don't change
2. $0 + GPU? → Qwen3-Emb-8B
3. $0 without GPU? → Gemini emb-001 or BGE-M3
4. General RAG? → text-emb-3-small ($0.02)
5. Accuracy critical? → voyage-3-large ($0.06)
6. Long docs? → Qwen3-Emb-8B (32K)
7. Code search? → Codestral Embed
8. Multilingual? → NVIDIA Nemotron
Key Insights
Gemini emb-001 = #1 English MTEB (68.3), 20K ctx
Qwen3-Emb-8B = #1 multilingual (70.6), Apache 2.0, 32K
NVIDIA VL = FREE on OR, 131K ctx, multimodal
text-emb-3-small = sweet spot ($0.02, 62.3 MTEB)
⚠️ Re-embedding = re-index entire corpus
⚠️ MTEB overall ≠ retrieval quality (NDCG@10)
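The MTEB-vs-retrieval caveat can be made concrete: NDCG@10 scores a ranked result list with logarithmic position discounting, so a model that ranks the right documents high wins even if its average MTEB score is lower. A minimal sketch of the standard formula (generic, not tied to any benchmark harness):

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k for one query.

    `relevances` is the graded relevance of the retrieved documents,
    in the order the model ranked them.
    """
    def dcg(rels):
        # Discounted cumulative gain: position i is discounted by log2(i+2).
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A perfect ranking scores 1.0; any misordering scores lower.
ndcg_at_k([3, 2, 1])  # 1.0
ndcg_at_k([1, 2, 3])  # < 1.0
```

Averaging NDCG@10 over a query set gives the retrieval number worth comparing; the headline MTEB score mixes in clustering, classification, and STS tasks that matter less for RAG.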
Embedding Routing

Embedding Use-Case Routing

Which embedding model for which use case? Best → Budget → Free.

| Use Case | Best Model | Budget Alt | Free Alt |
|---|---|---|---|
Quick Pick by Use Case
1. Multimodal (image+text)? → NVIDIA Nemotron-VL (FREE)
2. Privacy / self-host? → Qwen3-Emb-8B (Apache 2.0, 32K)
3. Docs > 32K? → NVIDIA VL (131K) or chunk + Qwen3 (32K)
4. Accuracy-critical RAG? → voyage-3-large ($0.06)
5. Multilingual? → Qwen3-Emb-8B ($0.01)
6. Code? → Codestral Embed ($0.15) or nv-embedcode (free NIM)
7. Everything else → text-emb-3-small ($0.02) — default pick
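The quick-pick list reads like a first-match-wins router. A minimal sketch in Python (the boolean flag names are hypothetical; the model strings and prices come from the list above):

```python
def pick_embedding_model(*, multimodal=False, self_host=False,
                         long_docs=False, accuracy_critical=False,
                         multilingual=False, code=False):
    """First matching rule wins, mirroring the quick-pick order above."""
    if multimodal:        return "NVIDIA Nemotron-VL"  # free on OpenRouter
    if self_host:         return "Qwen3-Emb-8B"        # Apache 2.0, 32K ctx
    if long_docs:         return "NVIDIA Nemotron-VL"  # 131K ctx
    if accuracy_critical: return "voyage-3-large"      # $0.06/M
    if multilingual:      return "Qwen3-Emb-8B"        # $0.01/M
    if code:              return "Codestral Embed"     # $0.15/M
    return "text-emb-3-small"                          # $0.02/M default
```

Because the checks run in order, a query that is both multimodal and code-related resolves to the multimodal pick, matching the priority of the list.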
FREE Routing

FREE Task Routing

27 free models on OpenRouter + 5 CLI tools. Which one to use for which task?

| Task | Best FREE (OR) | Backup FREE (OR) | FREE CLI |
|---|---|---|---|
Decision: FREE FIRST → paid only when free doesn't cut it
Coding: M2.5 / Qwen3 Coder → paid only for Opus-level reasoning
Classification: Qwen CLI (2000 RPD) → OR free for parallel batches
Research: Gemini CLI → Exa structured → paid OR rarely needed
Vision: NVIDIA Nano 12B VL → Gemini CLI → paid for quality
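The free-first policy above is a fallback chain: walk the free options in order and escalate to paid only when everything free fails. A sketch, assuming the caller supplies a `try_model` function (hypothetical) that returns `None` on failure or rate limit:

```python
def route_free_first(task, tiers, try_model):
    """Try each (tier, model) in order; return the first success.

    `try_model(model, task)` is caller-supplied and returns a result,
    or None when the model fails or is rate-limited.
    """
    for tier, model in tiers:
        result = try_model(model, task)
        if result is not None:
            return tier, model, result
    raise RuntimeError("all tiers exhausted")

# Example chain for coding, following the routing above
# (model names as written in the table; IDs are illustrative):
coding_tiers = [
    ("free", "M2.5"),
    ("free", "Qwen3 Coder"),
    ("paid", "Opus"),  # only for Opus-level reasoning
]
```

The same chain shape works for classification, research, and vision; only the tier list changes per task.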
Pipeline routing

Task Routing

Which model to use for each pipeline phase.

Cache pricing

Cache Pricing (OpenRouter, March 2026)

Cache = real savings. Batch tasks with repeated system prompts reduce input cost by 50–90%.

Discount by provider
Key models: input vs cached
Batch classification: 1 system prompt × 1000+ items → up to 90% input savings. Always consider for batch model selection.
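The 90% figure follows from simple arithmetic: with prompt caching, the shared system prompt is billed at full price once, then at the provider's discounted cached-read rate for every subsequent item. A sketch with hypothetical token counts and a 90% cached-read discount, ignoring any cache-write surcharge some providers add:

```python
def batch_input_cost(prompt_tokens, item_tokens, n_items,
                     price_per_m, cache_discount=0.90):
    """Input cost in dollars for n_items requests sharing one cached prompt.

    The full prompt is billed once; later reads of the cached prefix
    are billed at (1 - cache_discount) of the normal input rate.
    """
    full = (prompt_tokens + item_tokens) * price_per_m / 1e6
    cached = (prompt_tokens * (1 - cache_discount)
              + item_tokens) * price_per_m / 1e6
    return full + (n_items - 1) * cached

# 5K-token system prompt, 200-token items, 1000 items at $1/M input:
no_cache = (5000 + 200) * 1000 * 1.0 / 1e6   # $5.20 without caching
with_cache = batch_input_cost(5000, 200, 1000, 1.0)  # ≈ $0.70
```

The savings approach the cache discount itself as the shared prompt grows relative to the per-item payload, which is why long system prompts over 1000+ items land near the 90% ceiling.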
Quick Decision Matrix

I need a model for X → use Y

Fastest lookup. Verified 2026-04-01.

Practical insights (2026-03-29)
Live Benchmarks

What to watch in 2026

Only unsaturated benchmarks with real model differentiation.

✓ Active (not saturated)
⚠ Saturated (all frontier models 90%+ — low differentiation)
✕ Dead (saturated + contaminated — ignore)