a PeerBench project

Admin · MICE (robust-z) aggregation internal

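Missing benchmark results are filled in by multiple imputation with chained equations (MICE, m = 40 imputations) on per-benchmark robust z-scores; the composite is then pooled across imputations with Rubin's rules. Score scale: 50 = average model.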
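As a rough illustration of two steps named above, the sketch below computes robust z-scores and pools per-imputation estimates with Rubin's rules. It is not the site's implementation: the median/MAD definition of robust z (with the usual 1.4826 consistency constant) is an assumption, the function names `robust_z` and `pool_rubin` are hypothetical, and the chained-equations imputation itself and the rescaling to 50 = average are left out.

```python
from statistics import fmean, median, variance


def robust_z(values):
    """Robust z-scores: center on the median, scale by 1.4826 * MAD.

    Assumption: the page does not state which robust-z definition it uses;
    this is the common median/MAD variant.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("MAD is zero; robust z is undefined for this column")
    return [(v - med) / scale for v in values]


def pool_rubin(estimates, within_variances):
    """Pool m per-imputation estimates with Rubin's rules.

    Returns (pooled_estimate, total_variance), where
    total = mean within-variance + (1 + 1/m) * between-imputation variance.
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = fmean(estimates)
    within = fmean(within_variances)
    between = variance(estimates)  # sample variance (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Toy numbers only, not taken from the leaderboard.
z = robust_z([1, 2, 3, 4, 100])
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The outlier (100) barely moves the center or scale here, which is the usual reason to prefer median/MAD over mean/standard deviation when a few benchmark results are extreme.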

Model                  n/7  Vendor     License       Score  Range  Price ($/Mtok)  Tier  Speed (tok/s)
Gemini 3.1 Pro         5/7  Google     Closed        69.3   60–79  $6.75           $$$   127
Gemini 3.5 Flash       7/7  Google     Closed        65.4   63–68  $2.81           $$    202.5
Qwen3.7 Max            4/7  Alibaba    Closed        60.7   51–71  $1.28           $$    170.3
Gemini 3.1 Flash-Lite  7/7  Google     Closed        59.4   57–62  $0.44           $     302.3
Gemma 4 31B            7/7  Google     Open weights  57.9   56–60  $0.17           $     38.4
Gemini 2.5 Flash       7/7  Google     Closed        57.6   55–60  $1.00           $$    216.2
Qwen3.6 35B-A3B        6/7  Alibaba    Open weights  57.1   51–64  $0.30           $     141.6
DeepSeek V4 Pro        4/7  DeepSeek   Open weights  56.7   47–66  $1.62           $$    57.8
MiMo V2.5 Pro          5/7  Xiaomi     Open weights  55.6   47–65  $0.85           $$    49.2
Claude Haiku 4.5       3/7  Anthropic  Closed        53.5   38–69  $3.36           $$$   140.6
Gemma 4 12B            7/7  Google     Open weights  51.5   49–54  —               —     46.3
DeepSeek V4 Flash      7/7  DeepSeek   Open weights  51.2   39–64  $0.14           $     102.5
Tencent HY3-Preview    6/7  Tencent    Open weights  49.1   34–64  $0.08           $     94.8
Qwen3.5 9B             7/7  Alibaba    Open weights  45.3   40–51  $0.12           $     57.7
Gemma 4 26B A4B        7/7  Google     Open weights  41.1   16–67  $0.23           $     84.4
Qwen3 14B              7/7  Alibaba    Open weights  36.2   26–47  $0.14           $     64.7
GLM-5.1                5/7  Z.ai       Open weights  17.0   0–52   $1.45           $$    71.4

Showing 17 models that ran at least 3 of 7 benchmarks (9 excluded for thin coverage). Price is the median effective cost in $ per 1M tokens; Speed reflects throughput and latency. The n/7 chip shows how many benchmarks back the score. English is English-language intelligence, a background prior that anchors every Score; it is not a German benchmark and never appears in the German tables.
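The coverage rule in this note can be sketched as a simple filter. This is only an illustration under stated assumptions: the `ModelRow` type, the `split_by_coverage` helper, and the row named `hypothetical-thin-model` are invented for the example (the page does not name the excluded models); the two real rows reuse the coverage and score values shown in the table.

```python
from dataclasses import dataclass

MIN_BENCHMARKS = 3    # threshold stated in the note (3 of 7 benchmarks)
TOTAL_BENCHMARKS = 7


@dataclass
class ModelRow:
    name: str
    benchmarks_run: int  # the "n" in the n/7 chip
    score: float


def split_by_coverage(rows, min_benchmarks=MIN_BENCHMARKS):
    """Return (shown, excluded); shown rows are sorted by score, highest first."""
    shown = [r for r in rows if r.benchmarks_run >= min_benchmarks]
    excluded = [r for r in rows if r.benchmarks_run < min_benchmarks]
    shown.sort(key=lambda r: r.score, reverse=True)
    return shown, excluded


rows = [
    ModelRow("Claude Haiku 4.5", 3, 53.5),
    ModelRow("Gemini 3.1 Pro", 5, 69.3),
    ModelRow("hypothetical-thin-model", 2, 58.0),  # invented row, not on the page
]
shown, excluded = split_by_coverage(rows)
for r in shown:
    print(f"{r.name} {r.benchmarks_run}/{TOTAL_BENCHMARKS} {r.score}")
```

A model at exactly 3/7 is kept, so the comparison is inclusive.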