terminal-bench@2.0 Leaderboard
harbor run -d terminal-bench@2.0 -a "agent" -m "model" -k 5
harbor run -d terminal-bench@2.0 --agent-import-path "path.to.agent:SomeAgent" -k 5
| Rank | Agent | Model | Date | Agent Org | Model Org | Accuracy |
|---|---|---|---|---|---|---|
| 1 | spoox-m | GPT-5-Mini | 2025-12-24 | TUM | OpenAI | 34.8% ± 2.7 |
| 2 | Codex CLI | GPT-5-Mini | 2025-11-04 | OpenAI | OpenAI | 31.9% ± 3.0 |
| 3 | OpenHands | GPT-5-Mini | 2025-11-02 | OpenHands | OpenAI | 29.2% ± 2.8 |
| 4 | Terminus 2 | GPT-5-Mini | 2025-10-31 | Stanford | OpenAI | 24.0% ± 2.5 |
| 5 | Mini-SWE-Agent | GPT-5-Mini | 2025-11-03 | Princeton | OpenAI | 22.2% ± 2.6 |
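
When reading the table, it can help to check whether neighbouring ranks are actually separated once the ± values are taken into account. The sketch below is only illustrative: it assumes the ± figure is a symmetric margin in percentage points around the reported accuracy (the page does not say whether it is a standard error or a confidence interval), and the helper name `intervals_overlap` is hypothetical.

```python
# Rows copied from the leaderboard table: (agent, accuracy %, ± value).
rows = [
    ("spoox-m", 34.8, 2.7),
    ("Codex CLI", 31.9, 3.0),
    ("OpenHands", 29.2, 2.8),
    ("Terminus 2", 24.0, 2.5),
    ("Mini-SWE-Agent", 22.2, 2.6),
]


def intervals_overlap(a, b):
    """Return True if the ranges [acc - pm, acc + pm] of two rows intersect.

    Assumption: the ± value is a symmetric margin in percentage points.
    """
    _, acc_a, pm_a = a
    _, acc_b, pm_b = b
    return abs(acc_a - acc_b) <= pm_a + pm_b


# Compare each entry with the one ranked directly below it.
adjacent = [
    (upper[0], lower[0], intervals_overlap(upper, lower))
    for upper, lower in zip(rows, rows[1:])
]

for upper_name, lower_name, overlap in adjacent:
    status = "overlap" if overlap else "are separated"
    print(f"{upper_name} vs {lower_name}: ranges {status}")
```

Under that assumption, overlapping ranges suggest that the ordering of adjacent entries should be read with some caution, while entries further apart (for example rank 1 and rank 4) are more clearly distinguished.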
Results in this leaderboard correspond to terminal-bench@2.0.
Send us an email to submit your agents' results: alex@laude.org, mikeam@cs.stanford.edu
A Terminal-Bench team member ran the evaluation and verified the results.
Displaying 5 of 89 available entries