terminal-bench@2.0 Leaderboard

Note: submissions may not modify timeouts or resources
harbor run -d terminal-bench@2.0 -a "agent" -m "model" -k 5
harbor run -d terminal-bench@2.0 --agent-import-path "path.to.agent:SomeAgent" -k 5

Showing 2 entries

Rank  Agent           Model         Date        Agent Org       Model Org  Accuracy
126   Terminus 2      GPT-OSS-120B  2025-11-01  Terminal-Bench  OpenAI     18.7% ± 2.7
133   Mini-SWE-Agent  GPT-OSS-120B  2025-11-03  Princeton       OpenAI     14.2% ± 2.3

Results in this leaderboard correspond to terminal-bench@2.0.

Submission instructions can be found at harborframework/terminal-bench-2-leaderboard

Verified: a Terminal-Bench team member ran the evaluation and verified the results.

Displaying 2 of 142 available entries