News
Latest updates and announcements from the Terminal-Bench team.
Tue Jul 15 2025
Release
Introducing the Terminal-Bench Dataset Registry with SWE-Bench Verified, AppWorld, DevEval, and EvoEval
An easy way to evaluate agents on popular benchmarks and distribute new benchmarks to agent developers.
Wed Jun 25 2025
News
Warp scores a new SOTA on Terminal-Bench
Warp debuts their terminal agent at #1 on Terminal-Bench, resolving 52% of tasks.
Fri Jun 20 2025
News
Task Spotlight June 20th 2025: Scientific Computing and Cryptography
Highlighting two new tasks for Terminal-Bench.
Fri May 23 2025
News
Terminal-Bench on the Claude 4 Model Card
Anthropic features Terminal-Bench in their latest release and sets a new SOTA.
Mon May 19 2025
Release
Introducing Terminal-Bench
An evaluation framework and benchmark to quantify agents' ability to complete complex tasks in the terminal.