terminal-bench@2.0 Leaderboard
harbor run -d terminal-bench@2.0 -a "agent" -m "model" -k 5
harbor run -d terminal-bench@2.0 --agent-import-path "path.to.agent:SomeAgent" -k 5
Showing 2 entries
| Rank | Agent | Model | Date | Agent Org | Model Org | Accuracy |
|---|---|---|---|---|---|---|
| 126 | Terminus 2 | GPT-OSS-120B | 2025-11-01 | Terminal-Bench | OpenAI | 18.7% ± 2.7 |
| 133 | Mini-SWE-Agent | GPT-OSS-120B | 2025-11-03 | Princeton | OpenAI | 14.2% ± 2.3 |
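
One way to read the two accuracy figures side by side is to expand each "±" value into a range and check whether the ranges overlap. The sketch below does only that arithmetic. It assumes the "±" value is a symmetric margin in percentage points; the page does not say whether it is a standard error, a confidence interval, or something else. The names `Entry` and `intervals_overlap` are hypothetical and not part of Harbor or Terminal-Bench.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One leaderboard row; only the fields needed for the comparison."""
    agent: str
    accuracy: float  # accuracy in percent, as shown in the table
    margin: float    # the "±" value; ASSUMED to be a symmetric margin

    def interval(self) -> tuple[float, float]:
        # Round to one decimal to match the table's precision and to
        # avoid floating-point noise such as 11.899999999999999.
        low = round(self.accuracy - self.margin, 1)
        high = round(self.accuracy + self.margin, 1)
        return (low, high)


def intervals_overlap(a: Entry, b: Entry) -> bool:
    """Return True when the two [low, high] ranges share at least one point."""
    a_low, a_high = a.interval()
    b_low, b_high = b.interval()
    return a_low <= b_high and b_low <= a_high


# Values copied from the two rows in the table above.
terminus = Entry("Terminus 2", 18.7, 2.7)
mini_swe = Entry("Mini-SWE-Agent", 14.2, 2.3)

print(terminus.agent, terminus.interval())
print(mini_swe.agent, mini_swe.interval())
print("ranges overlap:", intervals_overlap(terminus, mini_swe))
```

Overlapping ranges are only a rough visual cue, not a formal significance test; a proper comparison would need the per-trial results behind each row.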
Results in this leaderboard correspond to terminal-bench@2.0.
Submission instructions can be found at harborframework/terminal-bench-2-leaderboard
Verified: a Terminal-Bench team member ran the evaluation and verified the results.
Displaying 2 of 142 available entries