Same MICE pipeline, but run on per-benchmark rankit (normal-score) normalisation. Rankit is outlier-resistant, so one catastrophic benchmark can't tank a model the way it can under robust-z.
Showing 17 models that ran ≥3 of 7 benchmarks (9 excluded for thin coverage). Price = median effective $/1M tokens; Speed = throughput + latency. The n/7 chip = how many benchmarks back the Score. English = English-language intelligence, a background prior anchoring every Score (not a German benchmark, never shown in the German tables).
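For readers unfamiliar with rankit normalisation, here is a minimal sketch of the idea for a single benchmark. It is not the pipeline's actual code: the function name `rankit`, the `(rank - 0.5) / n` plotting position, the average-rank handling of ties, and the sample scores are all assumptions made for illustration (other plotting positions, such as Blom's, are also common).

```python
from statistics import NormalDist


def rankit(scores):
    """Map raw scores on ONE benchmark to normal scores via their ranks.

    Assumption: uses Phi^-1((rank - 0.5) / n), with tied scores sharing
    their average rank. The real pipeline may use a different variant.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, lowest score first
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of scores tied with position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()  # standard normal, mean 0 and sd 1
    return [normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Hypothetical scores for four models on one benchmark; the last is a collapse.
with_collapse = rankit([71.0, 68.5, 74.2, 3.0])
mild_last_place = rankit([71.0, 68.5, 74.2, 60.0])
print(with_collapse == mild_last_place)  # only rank order matters
```

Because only the rank order enters the transform, a last-place model gets the same normal score whether it lost narrowly or collapsed. Under a z-style scale, by contrast, the size of the gap carries through into the score.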