Installation
Getting started with Terminal-Bench.
Terminal-Bench provides a CLI for running the benchmark, creating custom tasks, and running other popular benchmarks we've adapted to our framework.
Install dependencies
Terminal-Bench requires git
and Docker
to be installed.
Install the CLI
uv tool install terminal-bench
You can then run the CLI using terminal-bench
or tb
for convenience.
Run tb --help
to see the available commands and options.
Usage: tb [OPTIONS] COMMAND [ARGS]...
╭─ Options ────────────────────────────────────────────────────────────────╮
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to │
│ copy it or customize the installation. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────╮
│ run Run the terminal-bench harness (alias for `tb runs create`) │
│ tasks Manage tasks. │
│ datasets Manage datasets. │
│ runs Manage runs. │
╰──────────────────────────────────────────────────────────────────────────╯