terminal-bench

Installation

Getting started with Terminal-Bench.

Terminal-Bench provides a CLI for running the benchmark, creating custom tasks, and running other popular benchmarks we've adapted to our framework.

Install dependencies

Terminal-Bench requires git and Docker to be installed.
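Before installing the CLI, it can help to confirm that both prerequisites are reachable from your shell. The sketch below is one possible check, not part of Terminal-Bench itself: `missing_tools` is a hypothetical helper name, and it only tests whether the `git` and `docker` executables are on your PATH. It does not verify that the Docker daemon is actually running.

```shell
#!/bin/sh
# Hypothetical helper: print the names of any given commands
# that cannot be found on PATH (space-separated, empty if none).
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space left by the concatenation above.
    echo "${missing# }"
}

result=$(missing_tools git docker)
if [ -n "$result" ]; then
    echo "Missing prerequisites: $result"
else
    echo "git and docker are both on PATH"
fi
```

If the check reports a missing tool, install it through your platform's usual package manager before continuing.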

Install the CLI

uv tool install terminal-bench

You can then invoke the CLI as terminal-bench, or as tb for short.

Run tb --help to see the available commands and options.

 Usage: tb [OPTIONS] COMMAND [ARGS]...                                      
                                                                            
╭─ Options ────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.  │
│ --show-completion             Show completion for the current shell, to  │
│                               copy it or customize the installation.     │
│ --help                        Show this message and exit.                │
╰──────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────╮
│ run        Run the terminal-bench harness (alias for `tb runs create`)   │
│ tasks      Manage tasks.                                                 │
│ datasets   Manage datasets.                                              │
│ runs       Manage runs.                                                  │
╰──────────────────────────────────────────────────────────────────────────╯
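If you want to script around the CLI, a small guard can confirm that one of the two entry points is available before calling it. This is a minimal sketch under stated assumptions: `first_available` is a hypothetical helper, and the only Terminal-Bench invocation used is the `--help` option shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the first of the given command names
# that exists on PATH; return non-zero if none of them do.
first_available() {
    for name in "$@"; do
        if command -v "$name" >/dev/null 2>&1; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

# Prefer the short alias, fall back to the full name.
if cli=$(first_available tb terminal-bench); then
    "$cli" --help
else
    echo "Terminal-Bench CLI not found on PATH"
fi
```

If neither name is found right after installation, the directory where uv places tool executables may not be on your PATH yet; `uv tool update-shell` is the uv command intended to fix that.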
