Nejumi 4
Japanese
Comprehensive Japanese LLM evaluation covering reasoning, knowledge, coding, and safety.
Metrics
- Score (0-1), shown as a percentage in the leaderboard
How to Run
Visit nejumi.ai for the evaluation, or use the llm-jp-eval framework.
Leaderboard
| Rank | Model | Provider | Parameters | Score |
|---|---|---|---|---|
| 1 | GPT-5.2 | OpenAI | Unknown | 82.8% |
| 2 | Gemini 3 Pro | Google | Unknown | 81.3% |
| 3 | Claude Opus 4.5 | Anthropic | Unknown | 80.6% |
| 4 | Grok 4.1 | xAI | Unknown | 79.5% |
| 5 | Claude Sonnet 4.5 | Anthropic | Unknown | 78.8% |
| 6 | DeepSeek-V3.2 | DeepSeek | 671B MoE | 77.2% |
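
The metric is defined on a 0-1 scale while the table reports percentages, so a small conversion can help when comparing these rows with raw scores from another source. The following is a minimal sketch, not part of Nejumi or llm-jp-eval: the helper names `percent_to_unit` and `gap_to_leader` are hypothetical, and it assumes the percentage is simply the 0-1 score multiplied by 100. The scores are copied from the table above.

```python
# Leaderboard rows copied from the table above: (model, score in percent).
LEADERBOARD_PERCENT = [
    ("GPT-5.2", 82.8),
    ("Gemini 3 Pro", 81.3),
    ("Claude Opus 4.5", 80.6),
    ("Grok 4.1", 79.5),
    ("Claude Sonnet 4.5", 78.8),
    ("DeepSeek-V3.2", 77.2),
]


def percent_to_unit(score_percent: float) -> float:
    """Convert a percentage to the 0-1 scale (hypothetical helper).

    Assumes percentage = 100 * unit score, which the page does not state.
    """
    if not 0.0 <= score_percent <= 100.0:
        raise ValueError(f"score out of range: {score_percent}")
    return round(score_percent / 100.0, 4)


def gap_to_leader(rows):
    """Return (model, 0-1 score, gap to the top score in percentage points)."""
    top = max(score for _, score in rows)
    return [
        (model, percent_to_unit(score), round(top - score, 1))
        for model, score in rows
    ]


for model, unit, gap in gap_to_leader(LEADERBOARD_PERCENT):
    print(f"{model:<18} score={unit:.3f}  gap={gap:.1f} pts")
```

The gap column makes the spread easy to read: under this conversion, all six listed models fall within 5.6 percentage points of the top score.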