terminal-bench

News

Latest updates and announcements from the Terminal-Bench team.

Tue Jul 15 2025 · Release
Introducing the Terminal-Bench Dataset Registry with SWE-Bench Verified, AppWorld, DevEval, and EvoEval
An easy way to evaluate agents on popular benchmarks and distribute new benchmarks to agent developers.
Wed Jun 25 2025 · News
Warp scores a new SOTA on Terminal-Bench
Warp debuts its terminal agent at #1 on Terminal-Bench, resolving 52% of tasks.
Fri Jun 20 2025 · News
Task Spotlight, June 20, 2025: Scientific Computing and Cryptography
Highlighting two new tasks for Terminal-Bench.
Fri May 23 2025 · News
Terminal-Bench on the Claude 4 Model Card
Anthropic features Terminal-Bench in their latest release and sets a new SOTA.
Mon May 19 2025 · Release
Introducing Terminal-Bench
An evaluation framework and benchmark to quantify agents' ability to complete complex tasks in the terminal.