terminal-bench@2.0 Leaderboard
harbor run -d terminal-bench/terminal-bench-2 -a "agent" -m "model" -k 5
harbor run -d terminal-bench/terminal-bench-2 --agent-import-path "path.to.agent:SomeAgent" -k 5
Showing 7 entries
| Rank | Agent | Model | Date | Agent Org | Model Org | Accuracy |
|---|---|---|---|---|---|---|
| 47 | Warp | Multiple | 2025-11-20 | Warp | Multiple | 59.1% ± 2.8 |
| 65 | Warp | Multiple | 2025-11-11 | Warp | Multiple | 50.1% ± 2.7 |
| 123 | spoox-o-m | GPT-5-Nano | 2026-05-15 | TUM | OpenAI | 21.8% ± 2.8 |
| 136 | Codex CLI | GPT-5-Nano | 2025-11-04 | OpenAI | OpenAI | 11.5% ± 2.3 |
| 137 | OpenHands | GPT-5-Nano | 2025-11-02 | OpenHands | OpenAI | 9.9% ± 2.1 |
| 139 | Terminus 2 | GPT-5-Nano | 2025-10-31 | Terminal-Bench | OpenAI | 7.9% ± 1.9 |
| 140 | Mini-SWE-Agent | GPT-5-Nano | 2025-11-03 | Princeton | OpenAI | 7.0% ± 1.9 |
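
When comparing neighbouring entries, it can help to check whether their reported ranges overlap. The sketch below is illustrative only: it parses the accuracy cells from the table above and treats the ± value as a symmetric margin around the accuracy. That reading is an assumption, since the page does not say what the ± value measures, and the helper names `parse_accuracy` and `ranges_overlap` are hypothetical.

```python
import re

# Accuracy cells copied from the table above: (agent, date, cell text).
ROWS = [
    ("Warp", "2025-11-20", "59.1% ± 2.8"),
    ("Warp", "2025-11-11", "50.1% ± 2.7"),
    ("spoox-o-m", "2026-05-15", "21.8% ± 2.8"),
    ("Codex CLI", "2025-11-04", "11.5% ± 2.3"),
    ("OpenHands", "2025-11-02", "9.9% ± 2.1"),
    ("Terminus 2", "2025-10-31", "7.9% ± 1.9"),
    ("Mini-SWE-Agent", "2025-11-03", "7.0% ± 1.9"),
]

# Matches both "59.1%± 2.8" and "59.1% ± 2.8".
CELL = re.compile(r"(\d+(?:\.\d+)?)%\s*±\s*(\d+(?:\.\d+)?)")


def parse_accuracy(cell):
    """Return (accuracy, margin) in percentage points from one table cell."""
    match = CELL.search(cell)
    if match is None:
        raise ValueError(f"unrecognised accuracy cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def ranges_overlap(a, b):
    """True if [acc - margin, acc + margin] of a and b intersect.

    Assumption: the ± value is a symmetric margin; the page does not define it.
    """
    (acc_a, margin_a), (acc_b, margin_b) = a, b
    return abs(acc_a - acc_b) <= margin_a + margin_b


parsed = [(agent, date, parse_accuracy(cell)) for agent, date, cell in ROWS]

# Compare each entry with the one ranked directly below it in the table.
for (agent_a, date_a, acc_a), (agent_b, date_b, acc_b) in zip(parsed, parsed[1:]):
    verdict = "overlap" if ranges_overlap(acc_a, acc_b) else "are separated"
    print(f"{agent_a} ({date_a}) and {agent_b} ({date_b}) {verdict}")
```

Overlapping ranges under this reading suggest the ordering of those entries should not be over-interpreted; they do not establish a statistical result.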
Results in this leaderboard correspond to terminal-bench@2.0.
Submission instructions can be found at harborframework/terminal-bench-2-leaderboard
Verified entries are those for which a Terminal-Bench team member ran the evaluation and verified the results.
Displaying 7 of 142 available entries