Definitive Reference — March 2026

AI Model Benchmarks 2026

Definitive reference for AI model selection. 119 models × 55+ categories × 54 benchmarks. Sources: LM Council, Artificial Analysis, Scale AI SEAL, BFCL V4, BenchLM.ai, MTEB, Zyte, Speko.

119 models
55+ categories
54 benchmarks
4 pricing tiers
✓ Verified 2026-04-01 (v3) · Next update: 2026-04-08
MEGA-TABLE

All Models × All Parameters


| Model | $/M in→out | Ctx | Max Out | Caps | SWE-V% | SWE-Pro% | GPQA% | HLE% | ARC-AGI-2% | Tau2% | BenchLM | Best for |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
47 categories

King by Category

Quality #1/#2 — best by benchmark. Budget #1/#2 — best under $1/M input. Free — $0.

Embedding Models Comparison

26 models with MTEB scores, pricing, and OpenRouter IDs. For RAG, rank by Retrieval NDCG@10, not the average MTEB score.

| Model | OR model ID | MTEB | Dims | Context | $/M tokens | Best for |
|---|---|---|---|---|---|---|
Decision Tree
1. Already indexed? → don't change
2. $0 + GPU? → Qwen3-Emb-8B
3. $0 without GPU? → Gemini emb-001 or BGE-M3
4. General RAG? → text-emb-3-small ($0.02)
5. Accuracy critical? → voyage-3-large ($0.06)
6. Long docs? → Qwen3-Emb-8B (32K)
7. Code search? → Codestral Embed
8. Multilingual? → NVIDIA Nemotron
Key Insights
Gemini emb-001 = #1 English MTEB (68.3), 20K ctx
Qwen3-Emb-8B = #1 multilingual (70.6), Apache 2.0, 32K
NVIDIA VL = FREE on OR, 131K ctx, multimodal
text-emb-3-small = sweet spot ($0.02, 62.3 MTEB)
⚠️ Re-embedding = re-index entire corpus
⚠️ MTEB overall ≠ retrieval quality (NDCG@10)
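The MTEB-vs-retrieval caveat can be made concrete: NDCG@10 scores a ranked result list with logarithmic position discounting, so a model that ranks the right documents high wins even if its average MTEB score is lower. A minimal sketch of the standard formula (generic, not tied to any benchmark harness):

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k for one query.

    `relevances` is the graded relevance of the retrieved documents,
    in the order the model ranked them.
    """
    def dcg(rels):
        # Discounted cumulative gain: position i is discounted by log2(i+2).
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A perfect ranking scores 1.0; any misordering scores lower.
ndcg_at_k([3, 2, 1])  # 1.0
ndcg_at_k([1, 2, 3])  # < 1.0
```

Averaging NDCG@10 over a query set gives the retrieval number worth comparing; the headline MTEB score mixes in clustering, classification, and STS tasks that matter less for RAG.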
Embedding Routing

Embedding Use-Case Routing

Which embedding model for which use case? Best → Budget → Free.

| Use Case | Best Model | Budget Alt | Free Alt |
|---|---|---|---|
Quick Pick by Use Case
1. Multimodal (image+text)? → NVIDIA Nemotron-VL (FREE)
2. Privacy / self-host? → Qwen3-Emb-8B (Apache 2.0, 32K)
3. Docs > 32K? → NVIDIA VL (131K) or chunk + Qwen3 (32K)
4. Accuracy-critical RAG? → voyage-3-large ($0.06)
5. Multilingual? → Qwen3-Emb-8B ($0.01)
6. Code? → Codestral Embed ($0.15) or nv-embedcode (free NIM)
7. Everything else → text-emb-3-small ($0.02) — default pick
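The quick-pick list reads like a first-match-wins router. A minimal sketch in Python (the boolean flag names are hypothetical; the model strings and prices come from the list above):

```python
def pick_embedding_model(*, multimodal=False, self_host=False,
                         long_docs=False, accuracy_critical=False,
                         multilingual=False, code=False):
    """First matching rule wins, mirroring the quick-pick order above."""
    if multimodal:        return "NVIDIA Nemotron-VL"  # free on OpenRouter
    if self_host:         return "Qwen3-Emb-8B"        # Apache 2.0, 32K ctx
    if long_docs:         return "NVIDIA Nemotron-VL"  # 131K ctx
    if accuracy_critical: return "voyage-3-large"      # $0.06/M
    if multilingual:      return "Qwen3-Emb-8B"        # $0.01/M
    if code:              return "Codestral Embed"     # $0.15/M
    return "text-emb-3-small"                          # $0.02/M default
```

Because the checks run in order, a query that is both multimodal and code-related resolves to the multimodal pick, matching the priority of the list.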
FREE Routing

FREE Task Routing

27 free models on OpenRouter + 5 CLI tools. Which one to use for which task?

| Task | Best FREE (OR) | Backup FREE (OR) | FREE CLI |
|---|---|---|---|
Decision: FREE FIRST → paid only when free doesn't cut it
Coding: M2.5 / Qwen3 Coder → paid only for Opus-level reasoning
Classification: Qwen CLI (2000 RPD) → OR free for parallel batches
Research: Gemini CLI → Exa structured → paid OR rarely needed
Vision: NVIDIA Nano 12B VL → Gemini CLI → paid for quality
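The free-first policy above is a fallback chain: walk the free options in order and escalate to paid only when everything free fails. A sketch, assuming the caller supplies a `try_model` function (hypothetical) that returns `None` on failure or rate limit:

```python
def route_free_first(task, tiers, try_model):
    """Try each (tier, model) in order; return the first success.

    `try_model(model, task)` is caller-supplied and returns a result,
    or None when the model fails or is rate-limited.
    """
    for tier, model in tiers:
        result = try_model(model, task)
        if result is not None:
            return tier, model, result
    raise RuntimeError("all tiers exhausted")

# Example chain for coding, following the routing above
# (model names as written in the table; IDs are illustrative):
coding_tiers = [
    ("free", "M2.5"),
    ("free", "Qwen3 Coder"),
    ("paid", "Opus"),  # only for Opus-level reasoning
]
```

The same chain shape works for classification, research, and vision; only the tier list changes per task.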
Pipeline routing

Task Routing

Which model to use for each pipeline phase.

Cache pricing

Cache Pricing (OpenRouter, March 2026)

Cache = real savings. Batch tasks with repeated system prompts reduce input cost by 50–90%.

Discount by provider
Key models: input vs cached
Batch classification: 1 system prompt × 1000+ items → up to 90% input savings. Always consider for batch model selection.
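The 90% figure follows from simple arithmetic: with prompt caching, the shared system prompt is billed at full price once, then at the provider's discounted cached-read rate for every subsequent item. A sketch with hypothetical token counts and a 90% cached-read discount, ignoring any cache-write surcharge some providers add:

```python
def batch_input_cost(prompt_tokens, item_tokens, n_items,
                     price_per_m, cache_discount=0.90):
    """Input cost in dollars for n_items requests sharing one cached prompt.

    The full prompt is billed once; later reads of the cached prefix
    are billed at (1 - cache_discount) of the normal input rate.
    """
    full = (prompt_tokens + item_tokens) * price_per_m / 1e6
    cached = (prompt_tokens * (1 - cache_discount)
              + item_tokens) * price_per_m / 1e6
    return full + (n_items - 1) * cached

# 5K-token system prompt, 200-token items, 1000 items at $1/M input:
no_cache = (5000 + 200) * 1000 * 1.0 / 1e6   # $5.20 without caching
with_cache = batch_input_cost(5000, 200, 1000, 1.0)  # ≈ $0.70
```

The savings approach the cache discount itself as the shared prompt grows relative to the per-item payload, which is why long system prompts over 1000+ items land near the 90% ceiling.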
Quick Decision Matrix

I need a model for X → use Y

Fastest lookup. Verified 2026-04-01.

Practical insights (2026-03-29)
Live Benchmarks

What to watch in 2026

Only unsaturated benchmarks with real model differentiation.

✓ Active (not saturated)
⚠ Saturated (all frontier models 90%+ — low differentiation)
✕ Dead (saturated + contaminated — ignore)