terminal-bench@2.0 Leaderboard
harbor run -d terminal-bench@2.0 -a "agent" -m "model" -k 5
harbor run -d terminal-bench@2.0 --agent-import-path "path.to.agent:SomeAgent" -k 5
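The two `harbor run` forms above differ only in how the agent is specified: by name with a model, or by import path. As a minimal sketch, the hypothetical helper below (`build_harbor_command` is not part of Harbor) assembles either form as an argument list, using only the flags shown on this page. It does not execute anything. Whether `-m` can be combined with `--agent-import-path` is not stated here, so the sketch does not attempt it.

```python
import shlex


def build_harbor_command(agent=None, model=None, agent_import_path=None,
                         attempts=5, dataset="terminal-bench@2.0"):
    """Return a `harbor run` argument list mirroring the two forms shown above.

    Hypothetical helper: only the flags that appear on this page are used
    (-d, -a, -m, --agent-import-path, -k).
    """
    # Exactly one way of naming the agent must be chosen.
    if (agent is None) == (agent_import_path is None):
        raise ValueError("pass exactly one of agent or agent_import_path")

    cmd = ["harbor", "run", "-d", dataset]
    if agent is not None:
        cmd += ["-a", agent]
        if model is not None:
            cmd += ["-m", model]
    else:
        if model is not None:
            # The page shows no -m alongside --agent-import-path.
            raise ValueError("this sketch does not cover model with agent_import_path")
        cmd += ["--agent-import-path", agent_import_path]
    cmd += ["-k", str(attempts)]
    return cmd


if __name__ == "__main__":
    # Print shell-quoted commands for inspection; nothing is executed.
    print(shlex.join(build_harbor_command(agent="agent", model="model")))
    print(shlex.join(build_harbor_command(agent_import_path="path.to.agent:SomeAgent")))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting concerns.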
| Rank | Agent | Model | Date | Agent Org | Model Org | Accuracy | |
|---|---|---|---|---|---|---|---|
| 1 | vix | Claude Opus 4.7 | 2026-05-15 | vix | Anthropic | 90.2% ± 2.1 | |
| 6 | Polaris | Multiple | 2026-05-14 | PolarisOps | Multiple | 82.2% ± 2.8 | |
| 9 | WOZCODE | Claude Opus 4.7 | 2026-05-14 | WOZCODE | Anthropic | 80.2% ± 2.1 | |
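The Accuracy column pairs a percentage with a ± margin. The page does not say what the margin represents (standard error, confidence interval, or something else), so the sketch below only parses the cell text and checks whether two entries' ranges overlap. The function names are hypothetical, and an overlap check is a rough reading aid, not a significance test.

```python
import re

# Matches cells such as "90.2%± 2.1" or "90.2% ± 2.1".
ACCURACY_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*%\s*±\s*(\d+(?:\.\d+)?)\s*$")


def parse_accuracy(cell):
    """Split an accuracy cell into (value, margin), both in percentage points."""
    match = ACCURACY_RE.match(cell)
    if match is None:
        raise ValueError(f"unrecognized accuracy cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def ranges_overlap(a, b):
    """True if [value - margin, value + margin] of a and b intersect."""
    (value_a, margin_a), (value_b, margin_b) = a, b
    return abs(value_a - value_b) <= margin_a + margin_b


if __name__ == "__main__":
    # Values copied from the table above.
    vix = parse_accuracy("90.2%± 2.1")
    polaris = parse_accuracy("82.2%± 2.8")
    wozcode = parse_accuracy("80.2%± 2.1")
    print(ranges_overlap(vix, polaris))
    print(ranges_overlap(polaris, wozcode))
```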
Results in this leaderboard correspond to terminal-bench@2.0.
Submission instructions can be found at harborframework/terminal-bench-2-leaderboard
Verified entries: a Terminal-Bench team member ran the evaluation and verified the results.
Displaying 3 of 143 available entries