GermEval
79.9% Native German named-entity recognition: identify persons, locations, organisations and miscellaneous entities in German text, with predictions emitted as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run with reasoning off.
Named-entity recognition · native · Native German
via GermEval (via EuroEval)
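Entity-level micro-F1 counts a prediction as correct only when both its type and its exact token span match the gold annotation, and it pools those counts across all scored classes before computing precision and recall. The sketch below is a stdlib reimplementation in the spirit of seqeval's default (conlleval-style) mode, not seqeval itself. It assumes IOB2 tags, and it assumes that "excluding MISC" means dropping MISC spans from both gold and predictions before counting. Converting the model's JSON output into tags is not shown.

```python
from typing import AbstractSet, List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start token, end token exclusive)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one IOB2-tagged sentence.

    Like conlleval, an I- tag that follows O or a different type starts a new span.
    """
    spans: Set[Span] = set()
    start, ent_type = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        prefix, _, label = tag.partition("-")
        # Close the open span on O, on B-, or on an I- of a different type.
        if ent_type is not None and (prefix != "I" or label != ent_type):
            spans.add((ent_type, start, i))
            start, ent_type = None, None
        # Open a new span on B-, or on an I- with no matching open span.
        if prefix in ("B", "I") and ent_type is None:
            start, ent_type = i, label
    return spans


def micro_f1(
    gold: List[List[str]],
    pred: List[List[str]],
    exclude: AbstractSet[str] = frozenset({"MISC"}),
) -> float:
    """Entity-level micro-F1 over all sentences, ignoring the excluded types."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g = {s for s in extract_spans(g_tags) if s[0] not in exclude}
        p = {s for s in extract_spans(p_tags) if s[0] not in exclude}
        tp += len(g & p)  # exact type + boundary matches
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "I-PER", "O", "B-LOC"], ["B-MISC", "O", "B-ORG"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"], ["O", "O", "B-ORG"]]
print(round(micro_f1(gold, pred), 3))  # → 0.667
```

In this toy example the missed MISC entity does not count against recall; scoring all classes (`exclude=frozenset()`) would lower the result to 4/7.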
INCLUDE
68.3% Native German exam and licensing questions covering region-specific knowledge such as history, law, civics and culture. Written in German by humans, not translated.
4-option multiple choice · native · Native German
via CohereLabs/include-base-44
MMLU-Pro
80.0% Hard academic questions across 14 subjects, including STEM, law, health, economics and philosophy. Professionally translated into German, with up to ten answer options per question.
10-option multiple choice · translated · Professional translation
via li-lab/MMLU-ProX
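Because the number of options varies from question to question (up to ten), the prompt and the answer parser both have to follow each question's actual option count. A minimal sketch, assuming a hypothetical prompt layout (options lettered A to J, followed by a German "Antwort" cue) and a simple regex for reading the model's letter; neither is taken from the MMLU-ProX harness.

```python
import re
import string
from typing import List, Optional

LETTERS = string.ascii_uppercase[:10]  # "A" to "J": at most ten options


def format_question(question: str, options: List[str]) -> str:
    """Hypothetical prompt layout: lettered options plus an answer cue."""
    if not 2 <= len(options) <= len(LETTERS):
        raise ValueError("expected between 2 and 10 options")
    lines = [question]
    lines += [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append(f"Antwort ({LETTERS[0]}-{LETTERS[len(options) - 1]}):")
    return "\n".join(lines)


def extract_letter(response: str, n_options: int) -> Optional[str]:
    """Return the first standalone capital letter that is a valid option, else None."""
    valid = LETTERS[:n_options]
    match = re.search(rf"\b([{valid}])\b", response)
    return match.group(1) if match else None


print(format_question("Was ist 2 + 2?", ["3", "4", "5"]))
print(extract_letter("Die Antwort ist B.", 3))  # → B
```

With ten options, uniform random guessing scores 10%, compared with 25% on the 4-option benchmarks (INCLUDE, MMMLU), which is worth keeping in mind when comparing scores across them.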
MMMLU
85.8% OpenAI's multilingual MMLU (German split), covering general knowledge across STEM, the humanities, the social sciences and other domains. Professionally translated into German.
4-option multiple choice · translated · Professional translation
via openai/MMMLU
SB10K
58.4% Native German social-media sentiment classification: positive, neutral or negative. Human-annotated German text, not translated. Run with reasoning off; scored as exact-match accuracy on the predicted label.
3-class sentiment · native · Native German
via SB10K (via EuroEval)
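Exact-match accuracy is the fraction of examples whose predicted label string equals the gold label, so any output that is not a valid label counts as wrong. A minimal sketch, assuming the labels are the English strings "positive", "neutral" and "negative" and that light normalisation (surrounding whitespace, a trailing period, letter case) is applied first; both the label strings and the normalisation are assumptions, not the EuroEval implementation.

```python
from typing import List

LABELS = {"positive", "neutral", "negative"}  # assumed label strings


def normalise(output: str) -> str:
    """Assumed light normalisation: trim whitespace and a trailing period, lowercase."""
    return output.strip().strip(".").lower()


def exact_match_accuracy(predictions: List[str], gold: List[str]) -> float:
    """Share of predictions that normalise to a valid label equal to the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels must align")
    if not gold:
        return 0.0
    correct = sum(
        normalise(p) == g and normalise(p) in LABELS  # invalid labels never count
        for p, g in zip(predictions, gold)
    )
    return correct / len(gold)


preds = ["Positive", " negative.", "neutral", "mixed"]
gold = ["positive", "negative", "negative", "neutral"]
print(exact_match_accuracy(preds, gold))  # → 0.5
```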
ScaLA
64.2% Native German linguistic acceptability: is the sentence grammatical German (ja / nein)? Built from clean German sentences and minimally corrupted versions of them. Run with reasoning off.
Binary acceptability · native · Native German
via ScaLA-de (via EuroEval)
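A "minimal corruption" can be as small as swapping two neighbouring words or deleting a single word. The sketch below shows that idea under stated assumptions: whitespace tokenisation (punctuation stays attached to words), a 50/50 choice between the two corruption types, and one clean ("ja") plus one corrupted ("nein") example per source sentence. The real ScaLA-de construction may filter corruptions further; this sketch does not.

```python
import random
from typing import List, Tuple


def corrupt(sentence: str, rng: random.Random) -> str:
    """Swap two adjacent, non-identical words or delete one word."""
    words = sentence.split()
    if len(words) < 2:
        raise ValueError("need at least two words to corrupt")
    # Only swap positions whose words differ, so a swap always changes the sentence.
    swappable = [i for i in range(len(words) - 1) if words[i] != words[i + 1]]
    if swappable and rng.random() < 0.5:
        i = rng.choice(swappable)
        words[i], words[i + 1] = words[i + 1], words[i]
    else:
        # Deletion always changes the sentence because the word count drops.
        del words[rng.randrange(len(words))]
    return " ".join(words)


def build_examples(sentences: List[str], seed: int = 0) -> List[Tuple[str, str]]:
    """Pair each clean sentence ("ja") with a corrupted copy ("nein"), then shuffle."""
    rng = random.Random(seed)
    examples: List[Tuple[str, str]] = []
    for s in sentences:
        examples.append((s, "ja"))
        examples.append((corrupt(s, rng), "nein"))
    rng.shuffle(examples)
    return examples


for text, label in build_examples(["Der Hund schläft im Garten.", "Ich habe das Buch gelesen."]):
    print(label, text)
```

Seeding a dedicated `random.Random` instance keeps the generated split reproducible without touching the global random state.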