SWE-bench Verified
Coding. Resolving real GitHub issues from popular Python repositories; the Verified split is a 500-instance, human-validated subset of SWE-bench and is widely treated as the standard benchmark for agentic coding.
Metrics
- Resolved (%): the share of instances whose generated patch applies and makes the instance's failing tests pass without breaking existing ones (see the sketch after this list).
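For concreteness, Resolved (%) is simply the number of resolved instances divided by the number of instances evaluated, times 100. A minimal sketch of that arithmetic, where the instance IDs and the `outcomes` mapping are illustrative rather than the harness's actual report schema:

```python
def resolved_percent(outcomes: dict[str, bool]) -> float:
    """Percentage of instances resolved; `outcomes` maps instance_id -> resolved."""
    if not outcomes:
        return 0.0
    return 100.0 * sum(outcomes.values()) / len(outcomes)

# Example: 3 of 4 illustrative instances resolved -> 75.0
print(resolved_percent({
    "astropy__astropy-12907": True,
    "django__django-11099": True,
    "sympy__sympy-13480": False,
    "scikit-learn__scikit-learn-10297": True,
}))
```

Since SWE-bench Verified has 500 instances, each resolved instance moves the score by 0.2 percentage points.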
How to Run
git clone https://github.com/princeton-nlp/SWE-bench
cd SWE-bench
pip install -e .
python -m swebench.harness.run_evaluation --predictions_path predictions.json --run_id my_eval

The exact entrypoint and flags vary by harness version, so check the repository README; --predictions_path points at a predictions file like the one sketched below.
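The harness scores a predictions file that you produce separately: one entry per instance, pairing the instance ID with the model's patch as a unified diff. A minimal sketch of writing such a file, assuming the three-key format (`instance_id`, `model_name_or_path`, `model_patch`) described in the SWE-bench documentation; the instance ID, model name, and diff below are placeholders:

```python
import json

# Each prediction names the instance it answers, identifies the model,
# and carries the patch (a unified diff) that the harness applies inside
# that instance's repository before running the tests.
predictions = [
    {
        "instance_id": "astropy__astropy-12907",   # illustrative instance ID
        "model_name_or_path": "my-model",          # placeholder model name
        "model_patch": "diff --git a/a.py b/a.py\n--- a/a.py\n+++ b/a.py\n...",  # placeholder diff
    },
]

with open("predictions.json", "w") as f:
    json.dump(predictions, f, indent=2)
```

Pass the resulting path to the evaluation entrypoint via its predictions flag, as in the command above.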
Leaderboard
| Rank | Model | Provider | Parameters | Score (Resolved %) |
|---|---|---|---|---|
| 1 | Claude Opus 4.5 | Anthropic | Unknown | 80.9% |
| 2 | GPT-5.2 | OpenAI | Unknown | 80.0% |
| 3 | Gemini 3 Pro | Google | Unknown | 76.2% |
| 4 | DeepSeek V3 | DeepSeek | Unknown | 67.1% |