German LLM Benchmark · Model Profile

Tencent HY3-Preview

Tencent provider-internal run 2026-06-09

67.4%

avg. German score

#24 of 30 models

4.7 pp below avg.

Benchmark breakdown

GermEval

77.3%

Native German named-entity recognition — identify persons, locations, organisations and misc entities in German text, emitted as JSON. Scored with seqeval micro-F1 excluding the noisy MISC class. Run reasoning-off.
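As a rough illustration of the scoring described above, the sketch below computes entity-level micro-F1 over gold and predicted entity sets while skipping the MISC class. It is a simplified pure-Python stand-in, not the seqeval implementation itself. The function name `micro_f1_without_misc`, the `(text, label)` tuple representation and the PER/LOC/ORG/MISC label strings are assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: one (surface text, label) pair per entity.
Entity = Tuple[str, str]


def micro_f1_without_misc(
    gold: List[Set[Entity]],
    pred: List[Set[Entity]],
    ignored: Iterable[str] = ("MISC",),
) -> float:
    """Entity-level micro-F1 pooled over all sentences, skipping ignored labels."""
    skip = set(ignored)
    tp = fp = fn = 0
    for gold_sent, pred_sent in zip(gold, pred):
        g = {e for e in gold_sent if e[1] not in skip}
        p = {e for e in pred_sent if e[1] not in skip}
        tp += len(g & p)   # exact match on text and label
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = [
    {("Angela Merkel", "PER"), ("Berlin", "LOC")},
    {("Siemens", "ORG"), ("Oktoberfest", "MISC")},
]
pred = [
    {("Angela Merkel", "PER"), ("Berlin", "ORG")},  # wrong label for Berlin
    {("Siemens", "ORG")},
]
score = micro_f1_without_misc(gold, pred)
```

In this toy example the mislabelled "Berlin" counts as both a false positive and a false negative, while the missed MISC entity does not affect the score at all.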

Named-entity recognition · native · Native German

via GermEval (via EuroEval) ↗

INCLUDE

69.1%

Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.

4-option multiple choice · native · Native German

via CohereLabs/include-base-44 ↗

MMLU-Pro

73.2%

Hard academic questions across 14 subjects — STEM, law, health, economics, philosophy and more. Professionally translated to German, with up to ten answer options per question.

10-option multiple choice · translated · Professional translation

via li-lab/MMLU-ProX ↗

MMMLU

83.7%

OpenAI's multilingual MMLU, German split — general knowledge spanning STEM, the humanities, social sciences and other domains. Professionally translated to German.

4-option multiple choice · translated · Professional translation

via openai/MMMLU ↗

SB10K

38.9%

Native German social-media sentiment classification — positive, neutral or negative. Human-annotated German text, not translated. Run reasoning-off; scored as exact-match accuracy on the predicted label.
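Exact-match accuracy on the predicted label can be sketched as follows. The function name `exact_match_accuracy` and the whitespace and case normalisation of the model output are assumptions; the benchmark's actual answer parsing is not described here.

```python
from typing import List

LABELS = ("positive", "neutral", "negative")


def exact_match_accuracy(gold: List[str], pred: List[str]) -> float:
    """Share of predictions whose normalised label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    # Assumption: trim whitespace and lowercase the raw output before comparing.
    correct = sum(g == p.strip().lower() for g, p in zip(gold, pred))
    return correct / len(gold)


gold = ["positive", "neutral", "negative", "neutral"]
pred = ["positive", "negative", "negative", " Neutral "]
accuracy = exact_match_accuracy(gold, pred)
```

Any output that does not normalise to the gold label, including text outside the three classes, simply counts as wrong.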

3-class sentiment · native · Native German

via SB10K (via EuroEval) ↗

ScaLA

62.5%

Native German linguistic acceptability — does the sentence read as grammatical German (ja / nein)? Built from clean vs. minimally-corrupted German sentences. Run reasoning-off.

Binary acceptability · native · Native German

via ScaLA-de (via EuroEval) ↗
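The headline average appears consistent with an unweighted mean of the six benchmark scores above. That aggregation rule is an assumption, since the page does not state how the average is formed. A quick check on the rounded values shown:

```python
from statistics import mean

# Per-benchmark scores as displayed on this page (already rounded).
scores = {
    "GermEval": 77.3,
    "INCLUDE": 69.1,
    "MMLU-Pro": 73.2,
    "MMMLU": 83.7,
    "SB10K": 38.9,
    "ScaLA": 62.5,
}

# Assumption: the headline figure is a plain, unweighted mean.
average = mean(scores.values())

# Leaderboard average implied by the headline score and the 4.7 pp gap.
implied_leaderboard_average = round(67.4 + 4.7, 1)
```

The rounded inputs give a mean of about 67.45, close to the 67.4% headline; the small difference would be explained by rounding of the per-benchmark values. Under the same reading, the average across all models would sit near 72.1%.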

Cost & speed

$0.139 per 1,000 questions

107 tokens / second

2.69s time to first token
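For a back-of-the-envelope estimate, the three figures above can be combined as in the sketch below. It assumes cost scales linearly with the number of questions and that throughput stays constant once the first token arrives; the function names and the example workload sizes are hypothetical.

```python
COST_PER_1000_QUESTIONS_USD = 0.139
TOKENS_PER_SECOND = 107
TIME_TO_FIRST_TOKEN_S = 2.69


def estimated_cost_usd(num_questions: int) -> float:
    """Linear extrapolation from the per-1,000-questions price."""
    return num_questions * COST_PER_1000_QUESTIONS_USD / 1000


def estimated_latency_s(output_tokens: int) -> float:
    """Time to first token plus generation time at constant throughput."""
    return TIME_TO_FIRST_TOKEN_S + output_tokens / TOKENS_PER_SECOND


cost = estimated_cost_usd(50_000)    # hypothetical batch of 50,000 questions
latency = estimated_latency_s(214)   # hypothetical 214-token answer
```

Under these assumptions, 50,000 questions would cost about $6.95, and a 214-token answer would take roughly 4.7 seconds end to end.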
