terminal-bench@2.0 Leaderboard
harbor run -d terminal-bench@2.0 -a "agent" -m "model" -k 5
harbor run -d terminal-bench@2.0 --agent-import-path "path.to.agent:SomeAgent" -k 5
Showing 10 entries
Filter: Claude Opus 4.6
| Rank | Agent | Model | Date | Agent Org | Model Org | Accuracy |
|---|---|---|---|---|---|---|
| 1 | ForgeCode | Claude Opus 4.6 | 2026-03-12 | ForgeCode | Anthropic | 81.8% ± 1.7 |
| 7 | Capy | Claude Opus 4.6 | 2026-03-12 | Capy | Anthropic | 75.3% ± 2.4 |
| 10 | Terminus-KIRA | Claude Opus 4.6 | 2026-02-22 | KRAFTON AI | Anthropic | 74.7% ± 2.6 |
| 13 | TongAgents | Claude Opus 4.6 | 2026-02-22 | Bigai | Anthropic | 71.9% ± 2.7 |
| 14 | Junie CLI | Multiple | 2026-03-07 | JetBrains | Multiple | 71.0% ± 2.9 |
| 16 | Droid | Claude Opus 4.6 | 2026-02-05 | Factory | Anthropic | 69.9% ± 2.5 |
| 19 | Crux | Claude Opus 4.6 | 2026-02-23 | Roam | Anthropic | 66.9% ± N/A |
| 20 | Mux | Claude Opus 4.6 | 2026-02-13 | Coder | Anthropic | 66.5% ± 2.5 |
| 28 | Terminus 2 | Claude Opus 4.6 | 2026-02-06 | AfterQuery | Anthropic | 62.9% ± 2.7 |
| 39 | Claude Code | Claude Opus 4.6 | 2026-02-07 | Anthropic | Anthropic | 58.0% ± 2.9 |
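
When reading adjacent rows, it can help to check whether their reported ranges overlap before treating a rank difference as meaningful. The following is a minimal sketch in Python, not part of the leaderboard tooling. It assumes the ± value is a symmetric margin around the accuracy (the page does not define it), and the names `entries` and `ranges_overlap` are hypothetical. The numbers are copied from the table above.

```python
# Each entry: (agent, accuracy in percent, reported +/- margin).
# The margin is None where the table lists it as N/A.
entries = [
    ("ForgeCode", 81.8, 1.7),
    ("Capy", 75.3, 2.4),
    ("Terminus-KIRA", 74.7, 2.6),
    ("TongAgents", 71.9, 2.7),
    ("Junie CLI", 71.0, 2.9),
    ("Droid", 69.9, 2.5),
    ("Crux", 66.9, None),
    ("Mux", 66.5, 2.5),
    ("Terminus 2", 62.9, 2.7),
    ("Claude Code", 58.0, 2.9),
]


def ranges_overlap(a, b):
    """Return True if the two accuracy ranges overlap, False if they do not,
    and None if either margin is missing.

    Assumption: the +/- value is a symmetric margin around the accuracy.
    """
    _, acc_a, margin_a = a
    _, acc_b, margin_b = b
    if margin_a is None or margin_b is None:
        return None
    return abs(acc_a - acc_b) <= margin_a + margin_b


# Compare each listed entry with the next one down.
for upper, lower in zip(entries, entries[1:]):
    verdict = ranges_overlap(upper, lower)
    if verdict is None:
        label = "margin not reported"
    elif verdict:
        label = "ranges overlap"
    else:
        label = "ranges do not overlap"
    print(f"{upper[0]} vs {lower[0]}: {label}")
```

Overlapping ranges are only a rough heuristic under the stated assumption, not a significance test, and the listed rows are not consecutive ranks on the full leaderboard.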
Results in this leaderboard correspond to terminal-bench@2.0.
Submission instructions can be found at harborframework/terminal-bench-2-leaderboard
A Terminal-Bench team member ran the evaluation and verified the results.
Displaying 10 of 122 available entries