Parametric Workbench v3.0
Cognitive Model Workspace
An interactive, high-fidelity workbench for analyzing, comparing, and cost-modeling frontier AI architectures.
Workspace Index 9 Models
Active catalog
Preference Leader: GPT-5.5 Pro (1482 Elo)
Efficiency Leader: Llama 4 ($0.15/1M)
| Rank | Model | License | Provider | Chatbot Arena Elo | MMLU | GPQA | HumanEval | Input/Output per 1M tokens (USD) |
|---|---|---|---|---|---|---|---|---|
| #1 | GPT-5.5 Pro | Proprietary | OpenAI | 1482 | 98.1% | 89.2% | 98.9% | $30.00/$180.00 |
| #2 | Claude Opus 4.8 | Proprietary | Anthropic | 1475 | 97.8% | 88.5% | 98.4% | $6.00/$30.00 |
| #3 | GPT-5.5 | Proprietary | OpenAI | 1438 | 96.8% | 84.1% | 97.2% | $5.00/$30.00 |
| #4 | Claude Sonnet 4.6 | Proprietary | Anthropic | 1422 | 95.2% | 81.5% | 96.8% | $3.00/$15.00 |
| #5 | DeepSeek V4 Pro | Proprietary | DeepSeek | 1405 | 94.8% | 80.2% | 96% | $0.43/$0.87 |
| #6 | Grok 4.3 | Proprietary | xAI | 1395 | 94.5% | 78.4% | 94.8% | $1.25/$2.50 |
| #7 | Llama 4 Maverick | Open weights | | 1368 | 92.5% | 72.8% | 93.5% | $0.15/$0.60 |
| #8 | GLM 5.1 | Open weights | Zhipu | 1352 | 91.2% | 70.5% | 93.2% | $0.98/$3.08 |
| #9 | Gemini 3.5 Flash | Proprietary | | 1335 | 90.5% | 64.2% | 88.5% | $1.50/$9.00 |
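
The two leader cards can be reproduced from the catalog rows. The sketch below is illustrative only: the `Model` type and `CATALOG` list are hypothetical names, and it assumes "Preference Leader" means highest Chatbot Arena Elo and "Efficiency Leader" means lowest input price per 1M tokens, which is consistent with the $0.15/1M figure shown on the card but is not stated by the workbench.

```python
from typing import NamedTuple


class Model(NamedTuple):
    """One catalog row (hypothetical structure, values copied from the table)."""
    name: str
    elo: int
    input_usd_per_1m: float
    output_usd_per_1m: float


CATALOG = [
    Model("GPT-5.5 Pro", 1482, 30.00, 180.00),
    Model("Claude Opus 4.8", 1475, 6.00, 30.00),
    Model("GPT-5.5", 1438, 5.00, 30.00),
    Model("Claude Sonnet 4.6", 1422, 3.00, 15.00),
    Model("DeepSeek V4 Pro", 1405, 0.43, 0.87),
    Model("Grok 4.3", 1395, 1.25, 2.50),
    Model("Llama 4 Maverick", 1368, 0.15, 0.60),
    Model("GLM 5.1", 1352, 0.98, 3.08),
    Model("Gemini 3.5 Flash", 1335, 1.50, 9.00),
]

# Assumption: preference is ranked by Elo, efficiency by input price.
preference_leader = max(CATALOG, key=lambda m: m.elo)
efficiency_leader = min(CATALOG, key=lambda m: m.input_usd_per_1m)

print(f"Preference Leader: {preference_leader.name} ({preference_leader.elo} Elo)")
print(f"Efficiency Leader: {efficiency_leader.name} "
      f"(${efficiency_leader.input_usd_per_1m:.2f}/1M)")
```

Ranking efficiency by output price or by a blended price would need a different `key` function and could select a different model.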
Baseline Model A vs. Comparison Model B
Parametric Spec Comparison
Workload Parameters
Configure transaction volumes and average token sizes to evaluate monthly run-rates.
Estimated Run-Rate
Monthly Volume Ledger
Request Volume 1,000,000 /mo
Prompt Token Mass 2,000 B tokens
Completion Token Mass 800 B tokens
Model A
Prompt: $0.00
Completion: $0.00
Model B
Prompt: $0.00
Completion: $0.00
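
The run-rate arithmetic behind the ledger can be sketched as follows. This is a minimal illustration, not the workbench's actual implementation: the function name `monthly_run_rate` and its parameters are hypothetical, and it assumes that cost is simply monthly token volume times the listed per-1M-token price, with no caching, batching, or tiered discounts. The example call also assumes 2,000 prompt tokens and 800 completion tokens per request.

```python
def monthly_run_rate(
    requests_per_month: int,
    avg_prompt_tokens: int,
    avg_completion_tokens: int,
    input_usd_per_1m: float,
    output_usd_per_1m: float,
) -> dict:
    """Estimate monthly prompt and completion cost in USD (hypothetical helper)."""
    # Total tokens per month = request count x average tokens per request.
    prompt_tokens = requests_per_month * avg_prompt_tokens
    completion_tokens = requests_per_month * avg_completion_tokens

    # Prices are quoted per 1M tokens, so scale the volumes down first.
    prompt_cost = prompt_tokens / 1_000_000 * input_usd_per_1m
    completion_cost = completion_tokens / 1_000_000 * output_usd_per_1m

    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "prompt_cost": prompt_cost,
        "completion_cost": completion_cost,
        "total_cost": prompt_cost + completion_cost,
    }


# Model A: Claude Sonnet 4.6 ($3.00/$15.00); Model B: Llama 4 Maverick ($0.15/$0.60).
model_a = monthly_run_rate(1_000_000, 2_000, 800, 3.00, 15.00)
model_b = monthly_run_rate(1_000_000, 2_000, 800, 0.15, 0.60)

print(f"Model A total: ${model_a['total_cost']:,.2f}")
print(f"Model B total: ${model_b['total_cost']:,.2f}")
```

Under these assumptions, 1,000,000 requests produce 2 billion prompt tokens and 800 million completion tokens per month, so Model A costs $6,000 for prompts plus $12,000 for completions.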