terminal-bench

Dataset Registry

tb datasets list

terminal-bench-core

head

The latest pre-release version of terminal-bench-core, hand-crafted by undergraduate, graduate, and industry researchers.

105 tasks

terminal-bench-core

0.1.0

The terminal-bench task set used in our launch, hand-crafted by undergraduate, graduate, and industry researchers.

80 tasks

terminal-bench-core

0.1.1 (leaderboard)

The terminal-bench task set used in our launch, hand-crafted by undergraduate, graduate, and industry researchers. Recently patched to support terminal-bench 0.2.4+.

80 tasks

deveval

head

Adapter for DevEval (https://github.com/open-compass/DevEval). Dataset PR: https://github.com/laude-institute/terminal-bench-datasets/pull/1

63 tasks

evoeval-difficult

head

Adapter for EvoEval (https://github.com/evo-eval/evoeval). Dataset PR: https://github.com/laude-institute/terminal-bench-datasets/pull/5

100 tasks

appworld-dev

head (encrypted)

Adapter for AppWorld (https://github.com/StonyBrookNLP/appworld). Dataset PR: https://github.com/laude-institute/terminal-bench-datasets/pull/3

57 tasks

swebench-verified

head

Adapter for SWE-bench (https://github.com/SWE-bench/SWE-bench). Dataset PR: https://github.com/laude-institute/terminal-bench-datasets/pull/2

500 tasks
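
Each entry above is identified by a dataset name plus a version, where "head" is the moving pre-release version. As a rough illustration only, the listing can be modeled as plain records and looked up by that pair. The `DatasetEntry` class and the `find_dataset` and `versions_of` helpers are hypothetical names made up for this sketch; they are not part of the terminal-bench package or its registry schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical record for one registry listing (not the real schema)."""
    name: str
    version: str
    task_count: int


# Entries transcribed from the listing above.
REGISTRY: List[DatasetEntry] = [
    DatasetEntry("terminal-bench-core", "head", 105),
    DatasetEntry("terminal-bench-core", "0.1.0", 80),
    DatasetEntry("terminal-bench-core", "0.1.1", 80),
    DatasetEntry("deveval", "head", 63),
    DatasetEntry("evoeval-difficult", "head", 100),
    DatasetEntry("appworld-dev", "head", 57),
    DatasetEntry("swebench-verified", "head", 500),
]


def find_dataset(name: str, version: str = "head") -> Optional[DatasetEntry]:
    """Return the entry matching name and version, or None if absent."""
    for entry in REGISTRY:
        if entry.name == name and entry.version == version:
            return entry
    return None


def versions_of(name: str) -> List[str]:
    """List every version recorded for a dataset name, in listing order."""
    return [entry.version for entry in REGISTRY if entry.name == name]
```

Pinning a fixed version such as 0.1.1 rather than "head" keeps the task set stable between runs, which is presumably why the leaderboard is tied to a numbered release.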

Learn more about the Terminal-Bench registry in the Terminal-Bench docs.