Terminal-Bench

terminal-bench@2.0 Leaderboard

Note: submissions may not modify timeouts or resources

harbor run -d terminal-bench@2.0 -a "agent" -m "model" -k 5

harbor run -d terminal-bench@2.0 --agent-import-path "path.to.agent:SomeAgent" -k 5

Showing 5 entries (filter: GPT-5-Mini)

Rank	Agent	Model	Date	Agent Org	Model Org	Accuracy
1	spoox-m	GPT-5-Mini	2025-12-24	TUM	OpenAI	34.8% ± 2.7
2	Codex CLI	GPT-5-Mini	2025-11-04	OpenAI	OpenAI	31.9% ± 3.0
3	OpenHands	GPT-5-Mini	2025-11-02	OpenHands	OpenAI	29.2% ± 2.8
4	Terminus 2	GPT-5-Mini	2025-10-31	Stanford	OpenAI	24.0% ± 2.5
5	Mini-SWE-Agent	GPT-5-Mini	2025-11-03	Princeton	OpenAI	22.2% ± 2.6
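
When reading the table, it can help to check whether two entries' accuracy ranges overlap before treating a rank difference as meaningful. The sketch below is illustrative only: the page does not say what the ± value represents (standard error, confidence interval, or something else), so it is treated here as a plain symmetric margin, and the names `Entry`, `parse_rows`, and `ranges_overlap` are hypothetical helpers, not part of Terminal-Bench or Harbor.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    rank: int
    agent: str
    accuracy: float  # percent
    margin: float    # the "±" value; its statistical meaning is an assumption


# Rows copied from the leaderboard table (tab-separated).
ROWS = (
    "1\tspoox-m\tGPT-5-Mini\t2025-12-24\tTUM\tOpenAI\t34.8% ± 2.7\n"
    "2\tCodex CLI\tGPT-5-Mini\t2025-11-04\tOpenAI\tOpenAI\t31.9% ± 3.0\n"
    "3\tOpenHands\tGPT-5-Mini\t2025-11-02\tOpenHands\tOpenAI\t29.2% ± 2.8\n"
    "4\tTerminus 2\tGPT-5-Mini\t2025-10-31\tStanford\tOpenAI\t24.0% ± 2.5\n"
    "5\tMini-SWE-Agent\tGPT-5-Mini\t2025-11-03\tPrinceton\tOpenAI\t22.2% ± 2.6\n"
)

# Accepts both "34.8% ± 2.7" and "34.8%± 2.7".
ACCURACY = re.compile(r"([\d.]+)%\s*±\s*([\d.]+)")


def parse_rows(rows: str) -> list[Entry]:
    """Parse tab-separated leaderboard rows into Entry records."""
    entries = []
    for line in rows.strip().splitlines():
        cols = line.split("\t")
        match = ACCURACY.fullmatch(cols[6].strip())
        if match is None:
            raise ValueError(f"unrecognized accuracy cell: {cols[6]!r}")
        entries.append(
            Entry(int(cols[0]), cols[1], float(match.group(1)), float(match.group(2)))
        )
    return entries


def ranges_overlap(a: Entry, b: Entry) -> bool:
    """True if [accuracy - margin, accuracy + margin] of a and b intersect."""
    return (a.accuracy - a.margin <= b.accuracy + b.margin
            and b.accuracy - b.margin <= a.accuracy + a.margin)


entries = parse_rows(ROWS)
for upper, lower in zip(entries, entries[1:]):
    status = "overlap" if ranges_overlap(upper, lower) else "separate"
    print(f"{upper.agent} vs {lower.agent}: {status}")
```

Under this reading, every adjacent pair of ranks in the table has overlapping ranges, while rank 1 and rank 5 do not, so small rank differences may deserve some caution.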

Results in this leaderboard correspond to terminal-bench@2.0.

To submit your agent's results, email alex@laude.org or mikeam@cs.stanford.edu.

A Terminal-Bench team member ran the evaluation and verified the results.

Displaying 5 of 89 available entries