terminal-bench
tb datasets list
The latest pre-release version of terminal-bench core. Handcrafted by undergraduate, graduate, and industry researchers.
105 tasks
The terminal-bench task set used in our launch. Handcrafted by undergraduate, graduate, and industry researchers.
80 tasks
The terminal-bench task set used in our launch. Handcrafted by undergraduate, graduate, and industry researchers. Recently patched to support terminal-bench 0.2.4+.
Adapter for DevEval (https://github.com/open-compass/DevEval). Dataset PR: https://github.com/laude-institute/terminal-bench-datasets/pull/1
63 tasks
Adapter for EvoEval (https://github.com/evo-eval/evoeval). Dataset PR: https://github.com/laude-institute/terminal-bench-datasets/pull/5
100 tasks
Adapter for AppWorld (https://github.com/StonyBrookNLP/appworld). Dataset PR: https://github.com/laude-institute/terminal-bench-datasets/pull/3
57 tasks
Adapter for SWEBench (https://github.com/SWE-bench/SWE-bench). Dataset PR: https://github.com/laude-institute/terminal-bench-datasets/pull/2
500 tasks