German LLM Benchmark · Model Profile

Gemini 3.1 Flash-Lite

Google · provider-internal run · 2026-06-10

77.6%

avg. German score

#6 of 30 models

+5.5pp above avg.

Benchmark breakdown

GermEval

83.2%

Native German named-entity recognition: identify persons, locations, organisations and miscellaneous entities in German text, emitted as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run with reasoning off.
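As a rough illustration of the scoring described above, the sketch below computes a micro-averaged F1 over entity spans while dropping the MISC class. It is a plain-Python approximation and not the seqeval implementation; the `Entity` tuple layout and the `micro_f1` helper are assumptions made for this example.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical span representation: (label, start_token, end_token).
Entity = Tuple[str, int, int]


def micro_f1(
    gold: List[Set[Entity]],
    pred: List[Set[Entity]],
    exclude: FrozenSet[str] = frozenset({"MISC"}),
) -> float:
    """Micro-averaged F1 over exact span matches, ignoring excluded labels."""
    tp = fp = fn = 0
    for gold_sent, pred_sent in zip(gold, pred):
        # Drop excluded classes from both sides before counting.
        g = {e for e in gold_sent if e[0] not in exclude}
        p = {e for e in pred_sent if e[0] not in exclude}
        tp += len(g & p)   # spans with matching label and boundaries
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but not predicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# One sentence: the gold MISC span is ignored, PER matches, LOC vs. ORG does not.
gold = [{("PER", 0, 1), ("LOC", 4, 4), ("MISC", 6, 6)}]
pred = [{("PER", 0, 1), ("ORG", 4, 4)}]
example_f1 = micro_f1(gold, pred)
```

Micro-averaging pools true positives, false positives and false negatives across all sentences and classes before computing F1, so frequent entity types weigh more than rare ones.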

Named-entity recognition · Native German

via GermEval (via EuroEval)

INCLUDE

71.9%

Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.

4-option multiple choice · Native German

via CohereLabs/include-base-44

MMLU-Pro

82.2%

Hard academic questions across 14 subjects — STEM, law, health, economics, philosophy and more. Professionally translated to German, with up to ten answer options per question.

10-option multiple choice · Professional translation

via li-lab/MMLU-ProX

MMMLU

86.8%

OpenAI's multilingual MMLU, German split — general knowledge spanning STEM, the humanities, social sciences and other domains. Professionally translated to German.

4-option multiple choice · Professional translation

via openai/MMMLU

MuSR

84.4%

Multi-step soft reasoning over long narrative contexts — murder mysteries, object placement and team allocation. Requires chaining clues across several paragraphs to reach the correct answer. Translated to German from the original English MuSR benchmark.

2–5 option multiple choice · Professional translation

via zayne-sprague/MuSR

SB10K

60.4%

Native German social-media sentiment classification: positive, neutral or negative. Human-annotated German text, not translated. Run with reasoning off; scored as exact-match accuracy on the predicted label.
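Exact-match accuracy on a three-class label can be sketched as follows. The whitespace and case normalisation is an assumption added for this example; the benchmark's actual label parsing is not described here, and `exact_match_accuracy` is a hypothetical helper name.

```python
from typing import Sequence

LABELS = {"positive", "neutral", "negative"}


def exact_match_accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Share of predictions whose label equals the gold label exactly."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    def normalise(label: str) -> str:
        # Assumed normalisation: trim whitespace and lowercase.
        return label.strip().lower()

    hits = sum(normalise(g) == normalise(p) for g, p in zip(gold, pred))
    return hits / len(gold)


gold_labels = ["positive", "neutral", "negative", "neutral"]
pred_labels = ["positive", "negative", "negative", "Neutral "]
example_accuracy = exact_match_accuracy(gold_labels, pred_labels)
```

Here three of the four predictions match after normalisation, so the accuracy is 0.75. Any output outside the three labels simply counts as a miss.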

3-class sentiment · Native German

via SB10K (via EuroEval)

ScaLA

74.2%

Native German linguistic acceptability: does the sentence read as grammatical German (ja / nein)? Built from clean vs. minimally corrupted German sentences. Run with reasoning off.

Binary acceptability · Native German

via ScaLA-de (via EuroEval)
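The seven scores in this breakdown are consistent with the 77.6% headline figure if it is read as an unweighted mean. That reading is an assumption, since the page does not state the aggregation rule. The short check below also derives the field average implied by the "+5.5pp above avg." line.

```python
# Scores copied from the benchmark breakdown on this page.
scores = {
    "GermEval": 83.2,
    "INCLUDE": 71.9,
    "MMLU-Pro": 82.2,
    "MMMLU": 86.8,
    "MuSR": 84.4,
    "SB10K": 60.4,
    "ScaLA": 74.2,
}

# Assumed aggregation: simple unweighted mean over the seven benchmarks.
average = sum(scores.values()) / len(scores)
headline = round(average, 1)

# "+5.5pp above avg." implies the average across all listed models.
implied_field_average = round(headline - 5.5, 1)

print(f"mean of {len(scores)} benchmarks: {headline}%")
print(f"implied field average: {implied_field_average}%")
```

Under this assumption the mean rounds to 77.6%, and the average across the 30 listed models comes out at about 72.1%.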

Cost & speed

$0.582 per 1,000 questions

223 tokens / second

0.61s time to first token
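These three figures can be combined into a back-of-the-envelope estimate of response time and batch cost. The sketch assumes the throughput applies uniformly after the first token and that cost scales linearly with the number of questions; neither is stated on this page, and both function names are hypothetical.

```python
TTFT_SECONDS = 0.61          # time to first token, from this page
TOKENS_PER_SECOND = 223.0    # output throughput, from this page
COST_PER_1000_USD = 0.582    # cost per 1,000 questions, from this page


def estimate_latency(output_tokens: int) -> float:
    """Approximate seconds to stream a response of the given length."""
    # Assumption: constant throughput once the first token has arrived.
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND


def estimate_cost(questions: int) -> float:
    """Approximate cost in USD, assuming linear scaling with question count."""
    return COST_PER_1000_USD * questions / 1000


print(f"500-token answer: {estimate_latency(500):.2f} s")
print(f"10,000 questions: ${estimate_cost(10_000):.2f}")
```

For a 500-token answer this gives roughly 2.85 seconds, and 10,000 benchmark-sized questions would cost about $5.82 under the linear assumption.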
