News
Latest updates and announcements from the Terminal-Bench team.
Sun Mar 08 2026
News
Terminal-Bench-Science: Now in Development
Extending Terminal-Bench to complex scientific workflow tasks in the natural sciences.
Thu Mar 05 2026
News
Terminal-Bench 3.0 Call for Contributions
Collaborate on creating a new frontier of challenging computer-based tasks.
Fri Nov 07 2025
Release
Introducing Terminal-Bench 2.0 and Harbor
A harder, better-verified version of Terminal-Bench and a new package for evaluating and optimizing agents.
Tue Sep 09 2025
News
Leaderboard Integrity and Timeouts
A clarification on our leaderboard time constraints.
Tue Jul 15 2025
Release
Introducing the Terminal-Bench Dataset Registry with SWE-Bench Verified, AppWorld, DevEval, and EvoEval
An easy way to evaluate agents on popular benchmarks and distribute new benchmarks to agent developers.
Wed Jun 25 2025
News
Warp scores a new SOTA on Terminal-Bench
Warp debuts their terminal agent at #1 on Terminal-Bench, resolving 52% of tasks.
Fri Jun 20 2025
News
Task Spotlight June 20th 2025: Scientific Computing and Cryptography
Highlighting two new tasks for Terminal-Bench.
Fri May 23 2025
News
Terminal-Bench on the Claude 4 Model Card
Anthropic features Terminal-Bench in their latest release and sets a new SOTA.
Mon May 19 2025
Release
Introducing Terminal-Bench
An evaluation framework and benchmark to quantify agents' ability to complete complex tasks in the terminal.
Mon May 19 2025
Release
Terminus
A research-preview agent for consistently evaluating the abilities of language models to power autonomous agents in the terminal.