terminal-bench

Registry

The Terminal-Bench registry of tasks and datasets.

Terminal-Bench plans to periodically release new sets of tasks and versions of the existing tasks. Additionally, we are in the process of adapting many popular benchmarks to the Terminal-Bench framework.

The Terminal-Bench registry stores information about all of the tasks and datasets in the Terminal-Bench ecosystem and enables quick evaluation and caching of the datasets.

Registry

The registry is stored in the registry.json file in the root of the Terminal-Bench repository.

A row in the registry contains the following information:

{
    "name": "<dataset-name>",
    "version": "<x.y.z>",
    "description": "<description>",
    "terminal_bench_version": "<version>",
    "github_url": "<url>",
    "dataset_path": "<path>",
    "branch": "<branch>",
    "commit_hash": "<hash>",
    "task_id_subset": [
        "<task-id-1>",
        "<task-id-2>",
        ...
    ]
}

Evaluating on a dataset

Simply use the following options when running the CLI:

tb run \
    --dataset-name <dataset-name> \
    --dataset-version <x.y.z> \
    ...

Viewing available datasets

tb datasets list

Downloading a dataset

If you just want to download a dataset to a local directory:

tb datasets download \
    --dataset-name <dataset-name> \
    --dataset-version <x.y.z> \
    --output-dir <path> # written to ~/.cache/terminal-bench if not specified

Adding a new dataset

To add a new task to the terminal-bench-core dataset, please see our contributing guide.

To create a new adapter, please see our adapter guide.

To add a new dataset to the registry, please make a PR to the registry.json file in the root of the Terminal-Bench repository. Additionally, for now we prefer storing all datasets in the laude-institute/terminal-bench-datasets repo so please include your dataset in a PR there.

Using a custom registry

If you want to use the Terminal-Bench harness but have a private set of tasks, you can create a custom registry by creating a local registry.json file and using the --registry-path flag when running the CLI.

tb run \
    --registry-path <path/to/registry.json> \
    ...

Alternatively, you can use the --registry-url flag to point to a remote registry.

tb run \
    --registry-url <url> \
    ...

The registry URL should point to the raw text of a registry.json file.

The default URL is https://raw.githubusercontent.com/laude-institute/terminal-bench/main/registry.json.

On this page