<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>German LLM Leaderboard</title><description>Benchmark results for the newest frontier models on German-language tasks — INCLUDE-DE, MMLU-Pro, MMMLU, SB10K, ScaLA, GermEval.</description><link>https://dach.peerbench.ai/</link><language>de</language><item><title>GPT-5.5 (provider-internal) — 77.9% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/openai/gpt-5.5/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/openai/gpt-5.5/</guid><description>&lt;p&gt;&lt;strong&gt;GPT-5.5&lt;/strong&gt; (OpenAI) ranked #4 of 26 models on German-language benchmarks with an average score of 77.9%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;85.9%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;74.8%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MuSR (DE)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;86.3%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;62.5%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;79.7%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition. The model identifies persons, locations, organisations and miscellaneous entities in German text and emits them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run with reasoning off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification into positive, neutral or negative. Human-annotated German text, not translated. Run with reasoning off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability, i.e. whether a sentence reads as grammatical German (ja / nein). Built from clean and minimally corrupted German sentences. Run with reasoning off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $7.811 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; provider-internal&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-13&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/openai/gpt-5.5/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Sat, 13 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Opus 4.8 (provider-internal) — 78.3% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/anthropic/claude-opus-4.8/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/anthropic/claude-opus-4.8/</guid><description>&lt;p&gt;&lt;strong&gt;Opus 4.8&lt;/strong&gt; (Anthropic) ranked #3 of 26 models on German-language benchmarks with an average score of 78.3%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;85.8%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;71.9%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MuSR (DE)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;86.3%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;64.5%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;82.6%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition. The model identifies persons, locations, organisations and miscellaneous entities in German text and emits them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run with reasoning off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification into positive, neutral or negative. Human-annotated German text, not translated. Run with reasoning off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability, i.e. whether a sentence reads as grammatical German (ja / nein). Built from clean and minimally corrupted German sentences. Run with reasoning off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $12.827 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; provider-internal&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-13&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/anthropic/claude-opus-4.8/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Sat, 13 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Gemma 4 31B (bf16) — 76.5% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/gemma-4-31b/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/gemma-4-31b/</guid><description>&lt;p&gt;&lt;strong&gt;Gemma 4 31B&lt;/strong&gt; (Google) ranked #7 of 26 models on German-language benchmarks with an average score of 76.5%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;82.6%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;68.3%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 10-option multiple choice&lt;/td&gt;&lt;td&gt;82.1%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 4-option multiple choice&lt;/td&gt;&lt;td&gt;86.6%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MuSR (DE)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;83.5%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;61.0%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;71.0%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition. The model identifies persons, locations, organisations and miscellaneous entities in German text and emits them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run with reasoning off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;: Hard academic questions across 14 subjects — STEM, law, health, economics, philosophy and more. Professionally translated to German, with up to ten answer options per question.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;: OpenAI&apos;s multilingual MMLU, German split — general knowledge spanning STEM, the humanities, social sciences and other domains. Professionally translated to German.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification into positive, neutral or negative. Human-annotated German text, not translated. Run with reasoning off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability, i.e. whether a sentence reads as grammatical German (ja / nein). Built from clean and minimally corrupted German sentences. Run with reasoning off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0.220 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Speed:&lt;/strong&gt; 40 tok/s&lt;br/&gt;&lt;strong&gt;TTFT:&lt;/strong&gt; 0.36s&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; bf16&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-11&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/gemma-4-31b/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Thu, 11 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Gemma 4 12B (bf16) — 72.7% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/gemma-4-12b/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/gemma-4-12b/</guid><description>&lt;p&gt;&lt;strong&gt;Gemma 4 12B&lt;/strong&gt; (Google) ranked #12 of 26 models on German-language benchmarks with an average score of 72.7%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;79.6%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;69.1%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 10-option multiple choice&lt;/td&gt;&lt;td&gt;73.1%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 4-option multiple choice&lt;/td&gt;&lt;td&gt;79.0%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MuSR (DE)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;81.6%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;62.2%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;64.1%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition. The model identifies persons, locations, organisations and miscellaneous entities in German text and emits them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run with reasoning off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;: Hard academic questions across 14 subjects — STEM, law, health, economics, philosophy and more. Professionally translated to German, with up to ten answer options per question.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;: OpenAI&apos;s multilingual MMLU, German split — general knowledge spanning STEM, the humanities, social sciences and other domains. Professionally translated to German.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification into positive, neutral or negative. Human-annotated German text, not translated. Run with reasoning off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability, i.e. whether a sentence reads as grammatical German (ja / nein). Built from clean and minimally corrupted German sentences. Run with reasoning off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Speed:&lt;/strong&gt; 46 tok/s&lt;br/&gt;&lt;strong&gt;TTFT:&lt;/strong&gt; 0.83s&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; bf16&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-11&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/gemma-4-12b/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Thu, 11 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Qwen3.5 9B (bf16) — 69.7% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/qwen3.5-9b/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/qwen3.5-9b/</guid><description>&lt;p&gt;&lt;strong&gt;Qwen3.5 9B&lt;/strong&gt; (Alibaba) ranked #18 of 26 models on German-language benchmarks with an average score of 69.7%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;72.6%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;64.7%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 10-option multiple choice&lt;/td&gt;&lt;td&gt;73.4%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 4-option multiple choice&lt;/td&gt;&lt;td&gt;78.8%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MuSR (DE)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;80.9%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;55.7%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;62.1%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition. The model identifies persons, locations, organisations and miscellaneous entities in German text and emits them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run with reasoning off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;: Hard academic questions across 14 subjects — STEM, law, health, economics, philosophy and more. Professionally translated to German, with up to ten answer options per question.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;: OpenAI&apos;s multilingual MMLU, German split — general knowledge spanning STEM, the humanities, social sciences and other domains. Professionally translated to German.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification into positive, neutral or negative. Human-annotated German text, not translated. Run with reasoning off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability, i.e. whether a sentence reads as grammatical German (ja / nein). Built from clean and minimally corrupted German sentences. Run with reasoning off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0.176 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Speed:&lt;/strong&gt; 73 tok/s&lt;br/&gt;&lt;strong&gt;TTFT:&lt;/strong&gt; 0.21s&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; bf16&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-11&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/qwen3.5-9b/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Thu, 11 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Qwen3 14B (bf16) — 66.2% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/qwen3-14b/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/qwen3-14b/</guid><description>&lt;p&gt;&lt;strong&gt;Qwen3 14B&lt;/strong&gt; (Alibaba) ranked #21 of 26 models on German-language benchmarks with an average score of 66.2%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;73.0%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;63.3%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 10-option multiple choice&lt;/td&gt;&lt;td&gt;64.1%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 4-option multiple choice&lt;/td&gt;&lt;td&gt;73.4%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MuSR (DE)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;69.7%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;56.1%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;64.0%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition. The model identifies persons, locations, organisations and miscellaneous entities in German text and emits them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run with reasoning off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;: Hard academic questions across 14 subjects — STEM, law, health, economics, philosophy and more. Professionally translated to German, with up to ten answer options per question.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;: OpenAI&apos;s multilingual MMLU, German split — general knowledge spanning STEM, the humanities, social sciences and other domains. Professionally translated to German.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification into positive, neutral or negative. Human-annotated German text, not translated. Run with reasoning off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability, i.e. whether a sentence reads as grammatical German (ja / nein). Built from clean and minimally corrupted German sentences. Run with reasoning off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0.174 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Speed:&lt;/strong&gt; 44 tok/s&lt;br/&gt;&lt;strong&gt;TTFT:&lt;/strong&gt; 0.42s&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; bf16&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-11&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/qwen3-14b/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Thu, 11 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Gemini 3.5 Flash (provider-internal) — 80.6% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/google/gemini-3.5-flash/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/google/gemini-3.5-flash/</guid><description>&lt;p&gt;&lt;strong&gt;Gemini 3.5 Flash&lt;/strong&gt; (Google) ranked #2 of 26 models on German-language benchmarks with an average score of 80.6%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;86.0%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;74.1%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 10-option multiple choice&lt;/td&gt;&lt;td&gt;86.5%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 4-option multiple choice&lt;/td&gt;&lt;td&gt;89.3%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MuSR (DE)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;84.2%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;64.3%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;79.4%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition. The model identifies persons, locations, organisations and miscellaneous entities in German text and emits them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run with reasoning off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;: Hard academic questions across 14 subjects — STEM, law, health, economics, philosophy and more. Professionally translated to German, with up to ten answer options per question.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;: OpenAI&apos;s multilingual MMLU, German split — general knowledge spanning STEM, the humanities, social sciences and other domains. Professionally translated to German.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification into positive, neutral or negative. Human-annotated German text, not translated. Run with reasoning off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability, i.e. whether a sentence reads as grammatical German (ja / nein). Built from clean and minimally corrupted German sentences. Run with reasoning off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $3.723 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Speed:&lt;/strong&gt; 181 tok/s&lt;br/&gt;&lt;strong&gt;TTFT:&lt;/strong&gt; 0.75s&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; provider-internal&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-10&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/google/gemini-3.5-flash/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Gemini 3.1 Flash-Lite (provider-internal) — 77.6% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/google/gemini-3.1-flash-lite/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/google/gemini-3.1-flash-lite/</guid><description>&lt;p&gt;&lt;strong&gt;Gemini 3.1 Flash-Lite&lt;/strong&gt; (Google) ranked #5 of 26 models on German-language benchmarks with an average score of 77.6%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;83.2%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;71.9%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 10-option multiple choice&lt;/td&gt;&lt;td&gt;82.2%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 4-option multiple choice&lt;/td&gt;&lt;td&gt;86.8%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MuSR (DE)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;84.4%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;60.4%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;74.2%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition. The model identifies persons, locations, organisations and miscellaneous entities in German text and emits them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run with reasoning off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;: Hard academic questions across 14 subjects — STEM, law, health, economics, philosophy and more. Professionally translated to German, with up to ten answer options per question.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;: OpenAI&apos;s multilingual MMLU, German split — general knowledge spanning STEM, the humanities, social sciences and other domains. Professionally translated to German.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification into positive, neutral or negative. Human-annotated German text, not translated. Run with reasoning off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability, i.e. whether a sentence reads as grammatical German (ja / nein). Built from clean and minimally corrupted German sentences. Run with reasoning off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0.584 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Speed:&lt;/strong&gt; 223 tok/s&lt;br/&gt;&lt;strong&gt;TTFT:&lt;/strong&gt; 0.61s&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; provider-internal&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-10&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/google/gemini-3.1-flash-lite/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>DeepSeek V4 Flash (bf16) — 70.5% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/deepseek/deepseek-v4-flash/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/deepseek/deepseek-v4-flash/</guid><description>&lt;p&gt;&lt;strong&gt;DeepSeek V4 Flash&lt;/strong&gt; (DeepSeek) ranked #16 of 26 models on German-language benchmarks with an average score of 70.5%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;80.2%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;70.5%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 10-option multiple choice&lt;/td&gt;&lt;td&gt;36.8%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 4-option multiple choice&lt;/td&gt;&lt;td&gt;84.9%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MuSR (DE)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;83.5%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;62.4%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;74.9%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition. The model identifies persons, locations, organisations and miscellaneous entities in German text and emits them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run with reasoning off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;: Hard academic questions across 14 subjects — STEM, law, health, economics, philosophy and more. Professionally translated to German, with up to ten answer options per question.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;: OpenAI&apos;s multilingual MMLU, German split — general knowledge spanning STEM, the humanities, social sciences and other domains. Professionally translated to German.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification — positive, neutral or negative. Human-annotated German text, not translated. Run reasoning-off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability — does the sentence read as grammatical German (ja / nein)? Built from clean vs. minimally corrupted German sentences. Run reasoning-off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0.123 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Speed:&lt;/strong&gt; 115 tok/s&lt;br/&gt;&lt;strong&gt;TTFT:&lt;/strong&gt; 0.92s&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; bf16&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-10&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/deepseek/deepseek-v4-flash/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Gemma 4 26B A4B (provider-internal) — 67.7% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/google/gemma-4-26b-a4b-it/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/google/gemma-4-26b-a4b-it/</guid><description>&lt;p&gt;&lt;strong&gt;Gemma 4 26B A4B&lt;/strong&gt; (Google) ranked #19 of 26 models on German-language benchmarks with an average score of 67.7%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;81.8%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;64.7%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 10-option multiple choice&lt;/td&gt;&lt;td&gt;78.2%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 4-option multiple choice&lt;/td&gt;&lt;td&gt;83.7%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MuSR (DE)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;83.7%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;15.2%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;66.3%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition — identify persons, locations, organisations and miscellaneous entities in German text and emit them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run reasoning-off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;: Hard academic questions across 14 subjects — STEM, law, health, economics, philosophy and more. Professionally translated to German, with up to ten answer options per question.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;: OpenAI&apos;s multilingual MMLU, German split — general knowledge spanning STEM, the humanities, social sciences and other domains. Professionally translated to German.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification — positive, neutral or negative. Human-annotated German text, not translated. Run reasoning-off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability — does the sentence read as grammatical German (ja / nein)? Built from clean vs. minimally corrupted German sentences. Run reasoning-off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0.217 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Speed:&lt;/strong&gt; 46 tok/s&lt;br/&gt;&lt;strong&gt;TTFT:&lt;/strong&gt; 1.16s&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; provider-internal&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-10&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/google/gemma-4-26b-a4b-it/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Gemini 3.1 Pro (provider-internal) — 80.7% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/google/gemini-3.1-pro-preview/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/google/gemini-3.1-pro-preview/</guid><description>&lt;p&gt;&lt;strong&gt;Gemini 3.1 Pro&lt;/strong&gt; (Google) ranked #1 of 26 models on German-language benchmarks with an average score of 80.7%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;87.3%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;77.7%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MuSR (DE)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;88.1%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;70.1%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;80.4%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition — identify persons, locations, organisations and miscellaneous entities in German text and emit them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run reasoning-off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification — positive, neutral or negative. Human-annotated German text, not translated. Run reasoning-off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability — does the sentence read as grammatical German (ja / nein)? Built from clean vs. minimally corrupted German sentences. Run reasoning-off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $7.383 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; provider-internal&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-10&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/google/gemini-3.1-pro-preview/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>MiMo V2.5 Pro (fp8) — 73.4% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/xiaomi/mimo-v2.5-pro/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/xiaomi/mimo-v2.5-pro/</guid><description>&lt;p&gt;&lt;strong&gt;MiMo V2.5 Pro&lt;/strong&gt; (Xiaomi) ranked #10 of 26 models on German-language benchmarks with an average score of 73.4%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;81.9%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;67.6%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MuSR (DE)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;80.9%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;61.7%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;74.8%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition — identify persons, locations, organisations and miscellaneous entities in German text and emit them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run reasoning-off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification — positive, neutral or negative. Human-annotated German text, not translated. Run reasoning-off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability — does the sentence read as grammatical German (ja / nein)? Built from clean vs. minimally corrupted German sentences. Run reasoning-off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $1.190 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; fp8&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-10&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/xiaomi/mimo-v2.5-pro/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>GLM-5.1 (fp8) — 58.8% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/z-ai/glm-5.1/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/z-ai/glm-5.1/</guid><description>&lt;p&gt;&lt;strong&gt;GLM-5.1&lt;/strong&gt; (Z.ai) ranked #24 of 26 models on German-language benchmarks with an average score of 58.8%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;52.2%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;67.6%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MuSR (DE)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;81.4%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;23.1%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;69.9%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition — identify persons, locations, organisations and miscellaneous entities in German text and emit them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run reasoning-off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification — positive, neutral or negative. Human-annotated German text, not translated. Run reasoning-off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability — does the sentence read as grammatical German (ja / nein)? Built from clean vs. minimally corrupted German sentences. Run reasoning-off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $1.908 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Speed:&lt;/strong&gt; 32 tok/s&lt;br/&gt;&lt;strong&gt;TTFT:&lt;/strong&gt; 0.65s&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; fp8&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-10&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/z-ai/glm-5.1/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Qwen3.6 35B-A3B (fp8) — 72.8% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/qwen/qwen3.6-35b-a3b/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/qwen/qwen3.6-35b-a3b/</guid><description>&lt;p&gt;&lt;strong&gt;Qwen3.6 35B-A3B&lt;/strong&gt; (Alibaba) ranked #11 of 26 models on German-language benchmarks with an average score of 72.8%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;79.9%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;68.3%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 10-option multiple choice&lt;/td&gt;&lt;td&gt;80.0%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 4-option multiple choice&lt;/td&gt;&lt;td&gt;85.8%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;58.4%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;64.2%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition — identify persons, locations, organisations and miscellaneous entities in German text and emit them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run reasoning-off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;: Hard academic questions across 14 subjects — STEM, law, health, economics, philosophy and more. Professionally translated to German, with up to ten answer options per question.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;: OpenAI&apos;s multilingual MMLU, German split — general knowledge spanning STEM, the humanities, social sciences and other domains. Professionally translated to German.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification — positive, neutral or negative. Human-annotated German text, not translated. Run reasoning-off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability — does the sentence read as grammatical German (ja / nein)? Built from clean vs. minimally corrupted German sentences. Run reasoning-off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0.743 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Speed:&lt;/strong&gt; 159 tok/s&lt;br/&gt;&lt;strong&gt;TTFT:&lt;/strong&gt; 0.62s&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; fp8&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-09&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/qwen/qwen3.6-35b-a3b/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Gemini 2.5 Flash (provider-internal) — 76.9% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/google/gemini-2.5-flash/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/google/gemini-2.5-flash/</guid><description>&lt;p&gt;&lt;strong&gt;Gemini 2.5 Flash&lt;/strong&gt; (Google) ranked #6 of 26 models on German-language benchmarks with an average score of 76.9%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;81.8%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;70.5%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 10-option multiple choice&lt;/td&gt;&lt;td&gt;79.6%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 4-option multiple choice&lt;/td&gt;&lt;td&gt;84.7%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MuSR (DE)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;83.3%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;61.6%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;77.1%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition — identify persons, locations, organisations and miscellaneous entities in German text and emit them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run reasoning-off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;: Hard academic questions across 14 subjects — STEM, law, health, economics, philosophy and more. Professionally translated to German, with up to ten answer options per question.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;: OpenAI&apos;s multilingual MMLU, German split — general knowledge spanning STEM, the humanities, social sciences and other domains. Professionally translated to German.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification — positive, neutral or negative. Human-annotated German text, not translated. Run reasoning-off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability — does the sentence read as grammatical German (ja / nein)? Built from clean vs. minimally corrupted German sentences. Run reasoning-off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $1.380 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Speed:&lt;/strong&gt; 159 tok/s&lt;br/&gt;&lt;strong&gt;TTFT:&lt;/strong&gt; 0.41s&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; provider-internal&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-09&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/google/gemini-2.5-flash/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Tencent HY3-Preview (provider-internal) — 67.4% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/tencent/hy3-preview/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/tencent/hy3-preview/</guid><description>&lt;p&gt;&lt;strong&gt;Tencent HY3-Preview&lt;/strong&gt; (Tencent) ranked #20 of 26 models on German-language benchmarks with an average score of 67.4%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;77.3%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;69.1%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 10-option multiple choice&lt;/td&gt;&lt;td&gt;73.2%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 4-option multiple choice&lt;/td&gt;&lt;td&gt;83.7%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;38.9%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;62.5%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition — identify persons, locations, organisations and miscellaneous entities in German text and emit them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run reasoning-off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;: Hard academic questions across 14 subjects — STEM, law, health, economics, philosophy and more. Professionally translated to German, with up to ten answer options per question.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;: OpenAI&apos;s multilingual MMLU, German split — general knowledge spanning STEM, the humanities, social sciences and other domains. Professionally translated to German.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification — positive, neutral or negative. Human-annotated German text, not translated. Run reasoning-off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability — does the sentence read as grammatical German (ja / nein)? Built from clean vs. minimally corrupted German sentences. Run reasoning-off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0.139 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Speed:&lt;/strong&gt; 107 tok/s&lt;br/&gt;&lt;strong&gt;TTFT:&lt;/strong&gt; 2.69s&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; provider-internal&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-09&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/tencent/hy3-preview/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Qwen3.7 Max (provider-internal) — 71.6% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/qwen/qwen3.7-max/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/qwen/qwen3.7-max/</guid><description>&lt;p&gt;&lt;strong&gt;Qwen3.7 Max&lt;/strong&gt; (Alibaba) ranked #14 of 26 models on German-language benchmarks with an average score of 71.6%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;82.1%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;71.2%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;63.2%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;69.8%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition — identify persons, locations, organisations and miscellaneous entities in German text and emit them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run reasoning-off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification — positive, neutral or negative. Human-annotated German text, not translated. Run reasoning-off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability — does the sentence read as grammatical German (ja / nein)? Built from clean vs. minimally corrupted German sentences. Run reasoning-off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $1.238 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; provider-internal&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-09&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/qwen/qwen3.7-max/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate></item><item><title>DeepSeek V4 Pro (fp8) — 71.5% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/deepseek/deepseek-v4-pro/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/deepseek/deepseek-v4-pro/</guid><description>&lt;p&gt;&lt;strong&gt;DeepSeek V4 Pro&lt;/strong&gt; (DeepSeek) ranked #15 of 26 models on German-language benchmarks with an average score of 71.5%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Named-entity recognition&lt;/td&gt;&lt;td&gt;82.2%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;70.5%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 3-class sentiment&lt;/td&gt;&lt;td&gt;60.0%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · Binary acceptability&lt;/td&gt;&lt;td&gt;73.5%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GermEval — German NER&lt;/strong&gt;: Native German named-entity recognition — identify persons, locations, organisations and miscellaneous entities in German text and emit them as JSON. Scored with seqeval micro-F1, excluding the noisy MISC class. Run reasoning-off.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SB10K — German sentiment&lt;/strong&gt;: Native German social-media sentiment classification — positive, neutral or negative. Human-annotated German text, not translated. Run reasoning-off; scored as exact-match accuracy on the predicted label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ScaLA — German acceptability&lt;/strong&gt;: Native German linguistic acceptability — does the sentence read as grammatical German (ja / nein)? Built from clean vs. minimally corrupted German sentences. Run reasoning-off.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $1.534 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; fp8&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-09&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/deepseek/deepseek-v4-pro/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Kimi K2.6 (fp8) — 69.8% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/moonshotai/kimi-k2.6/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/moonshotai/kimi-k2.6/</guid><description>&lt;p&gt;&lt;strong&gt;Kimi K2.6&lt;/strong&gt; (Moonshot) ranked #17 of 26 models on German-language benchmarks with an average score of 69.8%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;69.8%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $4.162 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; fp8&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-08&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/moonshotai/kimi-k2.6/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Mon, 08 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Claude Haiku 4.5 (provider-internal) — 75.6% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/anthropic/claude-haiku-4.5/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/anthropic/claude-haiku-4.5/</guid><description>&lt;p&gt;&lt;strong&gt;Claude Haiku 4.5&lt;/strong&gt; (Anthropic) ranked #8 of 26 models on German-language benchmarks with an average score of 75.6%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;68.3%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 10-option multiple choice&lt;/td&gt;&lt;td&gt;75.3%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 4-option multiple choice&lt;/td&gt;&lt;td&gt;83.1%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;: Hard academic questions across 14 subjects — STEM, law, health, economics, philosophy and more. Professionally translated to German, with up to ten answer options per question.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;: OpenAI&apos;s multilingual MMLU, German split — general knowledge spanning STEM, the humanities, social sciences and other domains. Professionally translated to German.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $3.248 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Speed:&lt;/strong&gt; 131 tok/s&lt;br/&gt;&lt;strong&gt;TTFT:&lt;/strong&gt; 0.79s&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; provider-internal&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-03&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/anthropic/claude-haiku-4.5/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Wed, 03 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Gemini 2.5 Flash-Lite (unverified) — 75.4% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/google/gemini-2.5-flash-lite/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/google/gemini-2.5-flash-lite/</guid><description>&lt;p&gt;&lt;strong&gt;Gemini 2.5 Flash-Lite&lt;/strong&gt; (Google) ranked #9 of 26 models on German-language benchmarks with an average score of 75.4%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 10-option multiple choice&lt;/td&gt;&lt;td&gt;71.2%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Professional translation · 4-option multiple choice&lt;/td&gt;&lt;td&gt;79.6%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;MMLU-Pro — German&lt;/strong&gt;: Hard academic questions across 14 subjects — STEM, law, health, economics, philosophy and more. Professionally translated to German, with up to ten answer options per question.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MMMLU — German&lt;/strong&gt;: OpenAI&apos;s multilingual MMLU, German split — general knowledge spanning STEM, the humanities, social sciences and other domains. Professionally translated to German.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0.540 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; unverified&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-03&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/google/gemini-2.5-flash-lite/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Wed, 03 Jun 2026 00:00:00 GMT</pubDate></item><item><title>grok-4.3 (unverified) — 63.3% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/x-ai/grok-4.3/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/x-ai/grok-4.3/</guid><description>&lt;p&gt;&lt;strong&gt;grok-4.3&lt;/strong&gt; (xAI) ranked #23 of 26 models on German-language benchmarks with an average score of 63.3%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;63.3%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0.274 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; unverified&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-03&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/x-ai/grok-4.3/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Wed, 03 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Ministral 14B (unverified) — 58.3% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/mistralai/ministral-14b-2512/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/mistralai/ministral-14b-2512/</guid><description>&lt;p&gt;&lt;strong&gt;Ministral 14B&lt;/strong&gt; (Mistral) ranked #25 of 26 models on German-language benchmarks with an average score of 58.3%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;58.3%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0.079 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; unverified&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-03&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/mistralai/ministral-14b-2512/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Wed, 03 Jun 2026 00:00:00 GMT</pubDate></item><item><title>gemma-3-12b-it (unverified) — 53.2% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/google/gemma-3-12b-it/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/google/gemma-3-12b-it/</guid><description>&lt;p&gt;&lt;strong&gt;gemma-3-12b-it&lt;/strong&gt; (Google) ranked #26 of 26 models on German-language benchmarks with an average score of 53.2%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;53.2%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0.053 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; unverified&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-06-03&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/google/gemma-3-12b-it/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Wed, 03 Jun 2026 00:00:00 GMT</pubDate></item><item><title>gpt-oss-120b (unverified) — 66.2% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/openai/gpt-oss-120b/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/openai/gpt-oss-120b/</guid><description>&lt;p&gt;&lt;strong&gt;gpt-oss-120b&lt;/strong&gt; (OpenAI) ranked #22 of 26 models on German-language benchmarks with an average score of 66.2%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;66.2%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $0.056 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Speed:&lt;/strong&gt; 1641 tok/s 🔒&lt;br/&gt;&lt;strong&gt;TTFT:&lt;/strong&gt; 0.31s&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; unverified&lt;br/&gt;&lt;strong&gt;Run date:&lt;/strong&gt; 2026-05-29&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/openai/gpt-oss-120b/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate></item><item><title>MiniMax M2.7 (fp8) — 72.7% avg on German LLM Benchmarks</title><link>https://dach.peerbench.ai/models/minimax/minimax-m2.7/</link><guid isPermaLink="true">https://dach.peerbench.ai/models/minimax/minimax-m2.7/</guid><description>&lt;p&gt;&lt;strong&gt;MiniMax M2.7&lt;/strong&gt; (MiniMax) ranked #13 of 26 models on German-language benchmarks with an average score of 72.7%.&lt;/p&gt;
&lt;h3&gt;Benchmark scores&lt;/h3&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Benchmark&lt;/th&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Score&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Native German · 4-option multiple choice&lt;/td&gt;&lt;td&gt;72.7%&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;What these benchmarks test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;INCLUDE — German&lt;/strong&gt;: Native German exam and licensing questions covering region-specific knowledge — history, law, civics and culture. Written by humans in German, not translated.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $2.070 / 1,000 questions&lt;br/&gt;&lt;strong&gt;Quantization:&lt;/strong&gt; fp8&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dach.peerbench.ai/models/minimax/minimax-m2.7/&quot;&gt;View full results and compare →&lt;/a&gt;&lt;/p&gt;</description><pubDate>Thu, 01 Jan 1970 00:00:00 GMT</pubDate></item></channel></rss>