terminal-bench@2.0 Leaderboard
harbor run -d terminal-bench/terminal-bench-2 -a "agent" -m "model" -k 5
harbor run -d terminal-bench/terminal-bench-2 --agent-import-path "path.to.agent:SomeAgent" -k 5
Showing 11 entries
Model filter: GPT-5.3-Codex
| Rank | Agent | Model | Date | Agent Org | Model Org | Accuracy | |
|---|---|---|---|---|---|---|---|
| 2 | LemonHarness | Multiple | 2026-05-14 | LR AILab of Lenovo CTO Org | Multiple | 84.5% ± 2.6 | |
| 9 | SageAgent | GPT-5.3-Codex | 2026-03-13 | OpenSage | OpenAI | 78.4% ± 2.2 | |
| 10 | Droid | GPT-5.3-Codex | 2026-02-24 | Factory | OpenAI | 77.3% ± 2.2 | |
| 12 | CodeBrain-1.5 | GPT-5.3-Codex | 2026-02-10 | Feeling AI | OpenAI | 75.8% ± 2.0 | |
| 13 | Codelia | GPT-5.3-Codex | 2026-05-14 | kousw | OpenAI | 75.7% ± 2.2 | |
| 15 | Simple Codex | GPT-5.3-Codex | 2026-02-06 | OpenAI | OpenAI | 75.1% ± 2.4 | |
| 18 | Mux | GPT-5.3-Codex | 2026-03-06 | Coder | OpenAI | 74.6% ± 2.5 | |
| 21 | spoox-o-m | GPT-5.3-Codex | 2026-05-15 | TUM | OpenAI | 71.5% ± 2.5 | |
| 22 | Junie CLI | Multiple | 2026-03-07 | JetBrains | Multiple | 71.0% ± 2.9 | |
| 25 | IndusAGI Coding Agent | GPT-5.3-Codex | 2026-03-18 | Varun Israni (SoloVpx) | OpenAI | 69.1% ± 2.3 | |
| 32 | Terminus 2 | GPT-5.3-Codex | 2026-02-05 | Terminal-Bench | OpenAI | 64.7% ± 2.7 | |
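
When comparing rows, it can help to check whether two Accuracy ranges overlap before reading much into a rank difference. The sketch below parses cells from the Accuracy column and does that check. It assumes the value after ± is a symmetric margin around the score; the page does not say what the margin represents, and the helper names `parse_accuracy` and `ranges_overlap` are hypothetical.

```python
import re

# Matches Accuracy cells such as "78.4%± 2.2" or "78.4% ± 2.2".
ACCURACY_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s*±\s*(\d+(?:\.\d+)?)\s*$")


def parse_accuracy(cell):
    """Split an Accuracy cell into (score, margin), both in percentage points."""
    match = ACCURACY_RE.match(cell)
    if match is None:
        raise ValueError(f"unrecognized accuracy cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def ranges_overlap(cell_a, cell_b):
    """Return True if the two score ± margin ranges share at least one point.

    Assumption: the margin is symmetric around the score. This is not a
    statistical significance test, only an interval-overlap check.
    """
    score_a, margin_a = parse_accuracy(cell_a)
    score_b, margin_b = parse_accuracy(cell_b)
    return abs(score_a - score_b) <= margin_a + margin_b


if __name__ == "__main__":
    # SageAgent (rank 9) vs. Droid (rank 10)
    print(ranges_overlap("78.4%± 2.2", "77.3%± 2.2"))
    # LemonHarness (rank 2) vs. Terminus 2 (rank 32)
    print(ranges_overlap("84.5%± 2.6", "64.7%± 2.7"))
```

Under that assumption, neighbouring rows such as ranks 9 and 10 have overlapping ranges, while the top and bottom rows shown here do not.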
Results in this leaderboard correspond to terminal-bench@2.0.
Submission instructions can be found at harborframework/terminal-bench-2-leaderboard
A Terminal-Bench team member ran the evaluation and verified the results.
Displaying 11 of 142 available entries