German LLM Benchmark · Model Profile

DeepSeek V4 Pro

DeepSeek · fp8 · run 2026-06-09

71.5%

avg. German score

#19 of 30 models

0.6 pp below avg.
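
The headline figure appears consistent with an unweighted mean of the four benchmark scores listed in the breakdown on this page. The page does not state the aggregation rule, so the sketch below is an assumption, and the helper name `unweighted_mean` is hypothetical. The implied field average is likewise only inferred from the 0.6 pp gap.

```python
# Per-benchmark scores as shown in the breakdown on this page (percent).
scores = {
    "GermEval": 82.2,
    "INCLUDE": 70.5,
    "SB10K": 60.0,
    "ScaLA": 73.5,
}


def unweighted_mean(values):
    """Plain arithmetic mean; assumes every benchmark carries equal weight."""
    values = list(values)
    return sum(values) / len(values)


avg = unweighted_mean(scores.values())

# The page reports 71.5% and a 0.6 pp gap to the field average,
# which would put that average at roughly 72.1% (an inference, not a stated value).
implied_field_avg = 71.5 + 0.6

print(f"mean of displayed scores: {avg:.2f}%")
print(f"implied field average:    {implied_field_avg:.1f}%")
```

The mean of the displayed scores comes out at about 71.55, slightly above the reported 71.5%. A gap of that size is what one would expect if the page averages unrounded per-benchmark scores and rounds only the final result.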

Benchmark breakdown

GermEval

82.2%

Native German named-entity recognition — identify persons, locations, organisations and misc entities in German text, emitted as JSON. Scored with seqeval micro-F1 excluding the noisy MISC class. Run reasoning-off.

Named-entity recognition · Native German

via GermEval (via EuroEval) ↗
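
The scoring rule described in this card, entity-level micro-F1 with the MISC class left out, can be sketched in plain Python. This is not the seqeval library itself, only a minimal re-implementation of the same idea. It assumes the model's JSON output has already been converted to BIO tags, and that MISC entities are dropped from both gold and predicted sets before counting. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Entity = Tuple[str, int, int]  # (label, start token, end token exclusive)


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect entity spans from one BIO-tagged sentence."""
    entities: Set[Entity] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last span
        continues_span = tag.startswith("I-") and label == tag[2:]
        if continues_span:
            continue
        if label is not None:
            entities.add((label, start, i))
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return entities


def micro_f1(gold: List[List[str]], pred: List[List[str]],
             exclude=frozenset({"MISC"})) -> float:
    """Micro-averaged F1 over exact span matches, ignoring excluded labels."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g = {e for e in extract_entities(gold_tags) if e[0] not in exclude}
        p = {e for e in extract_entities(pred_tags) if e[0] not in exclude}
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "I-PER", "O", "B-LOC", "O", "B-MISC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG", "O", "O"]]
print(micro_f1(gold, pred))  # one hit, one spurious ORG, one missed LOC
```

In this toy sentence the missed MISC entity does not count against the model, while the LOC span predicted as ORG costs both precision and recall.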

INCLUDE

70.5%

Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.

4-option multiple choice · Native German

via CohereLabs/include-base-44 ↗

SB10K

60.0%

Native German social-media sentiment classification — positive, neutral or negative. Human-annotated German text, not translated. Run reasoning-off; scored as exact-match accuracy on the predicted label.

3-class sentiment · Native German

via SB10K (via EuroEval) ↗
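
Exact-match accuracy on the predicted label, as described in this card, amounts to counting predictions that equal the gold label. The sketch below assumes the three English label strings and a light normalisation step (trimming whitespace, lowercasing). The benchmark's actual answer parsing is not documented here, so both are illustrative.

```python
LABELS = ("positive", "neutral", "negative")


def normalize(raw: str) -> str:
    """Assumed normalisation: strip surrounding whitespace and lowercase."""
    return raw.strip().lower()


def exact_match_accuracy(predictions, gold_labels) -> float:
    """Share of predictions whose normalised text equals the gold label."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold labels must have equal length")
    if not gold_labels:
        raise ValueError("need at least one example")
    correct = sum(
        normalize(pred) == gold
        for pred, gold in zip(predictions, gold_labels)
    )
    return correct / len(gold_labels)


predictions = ["Positive", "neutral", " negative", "neutral", "positive"]
gold_labels = ["positive", "negative", "negative", "neutral", "neutral"]
print(exact_match_accuracy(predictions, gold_labels))  # 3 of 5 match
```

Anything outside the label set, such as a full sentence instead of a single word, simply counts as wrong under this rule.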

ScaLA

73.5%

Native German linguistic acceptability — does the sentence read as grammatical German (ja / nein)? Built from clean vs. minimally-corrupted German sentences. Run reasoning-off.

Binary acceptability · Native German

via ScaLA-de (via EuroEval) ↗

Cost & speed

$1.534 per 1,000 questions
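
For budgeting, the quoted rate can be scaled to any number of questions. The sketch below assumes cost grows linearly with question count, which the page does not state. The function name is hypothetical.

```python
COST_PER_1000_USD = 1.534  # rate quoted on this page


def estimated_cost_usd(num_questions: int) -> float:
    """Linear extrapolation from the per-1,000-question rate."""
    if num_questions < 0:
        raise ValueError("num_questions must be non-negative")
    return COST_PER_1000_USD * num_questions / 1000


print(f"${estimated_cost_usd(1):.6f} per question")
print(f"${estimated_cost_usd(10_000):.2f} for 10,000 questions")
```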
