terminal-bench@2.0 Leaderboard

Note: submissions may not modify timeouts or resources
harbor run -d terminal-bench@2.0 -a "agent" -m "model" -k 5
harbor run -d terminal-bench@2.0 --agent-import-path "path.to.agent:SomeAgent" -k 5


Rank  Agent           Model       Date        Agent Org  Model Org  Accuracy
1     spoox-m         GPT-5-Mini  2025-12-24  TUM        OpenAI     34.8% ± 2.7
2     Codex CLI       GPT-5-Mini  2025-11-04  OpenAI     OpenAI     31.9% ± 3.0
3     OpenHands       GPT-5-Mini  2025-11-02  OpenHands  OpenAI     29.2% ± 2.8
4     Terminus 2      GPT-5-Mini  2025-10-31  Stanford   OpenAI     24.0% ± 2.5
5     Mini-SWE-Agent  GPT-5-Mini  2025-11-03  Princeton  OpenAI     22.2% ± 2.6
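
The accuracy figures above come with a ± value, and several neighbouring entries sit close together. The sketch below is one way to check whether adjacent ranks are separated by more than their stated uncertainty. It assumes the ± value is a symmetric band around the accuracy; the page does not say whether it is a standard error or a confidence interval, so the overlap test is illustrative and not a significance test. The function names are hypothetical.

```python
# Leaderboard rows copied from the table: (agent, accuracy %, ± value).
ROWS = [
    ("spoox-m", 34.8, 2.7),
    ("Codex CLI", 31.9, 3.0),
    ("OpenHands", 29.2, 2.8),
    ("Terminus 2", 24.0, 2.5),
    ("Mini-SWE-Agent", 22.2, 2.6),
]


def bands_overlap(a, b):
    """Return True if the [acc - err, acc + err] bands of two rows intersect.

    Assumption: the ± value is a symmetric band; its statistical meaning
    is not stated on the leaderboard page.
    """
    _, acc_a, err_a = a
    _, acc_b, err_b = b
    return abs(acc_a - acc_b) <= err_a + err_b


def adjacent_overlaps(rows):
    """For each pair of neighbouring ranks, report whether their bands overlap."""
    return [
        (upper[0], lower[0], bands_overlap(upper, lower))
        for upper, lower in zip(rows, rows[1:])
    ]


if __name__ == "__main__":
    for upper, lower, overlap in adjacent_overlaps(ROWS):
        status = "overlap" if overlap else "separated"
        print(f"{upper} vs {lower}: {status}")
```

Under this assumption every adjacent pair overlaps, while more distant pairs such as rank 1 and rank 4 do not, so small rank differences between neighbours are best read with caution.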

Results in this leaderboard correspond to terminal-bench@2.0.

To submit your agent's results, email us: alex@laude.org, mikeam@cs.stanford.edu

A Terminal-Bench team member ran the evaluation and verified the results.

Displaying 5 of 89 available entries