Registry
The Terminal-Bench registry of tasks and datasets.
Terminal-Bench plans to periodically release new sets of tasks and versions of the existing tasks. Additionally, we are in the process of adapting many popular benchmarks to the Terminal-Bench framework.
The Terminal-Bench registry stores information about all of the tasks and datasets in the Terminal-Bench ecosystem and enables quick evaluation and caching of the datasets.
Registry
The registry is stored in the registry.json
file in the root of the Terminal-Bench repository.
A row in the registry contains the following information:
{
"name": "<dataset-name>",
"version": "<x.y.z>",
"description": "<description>",
"terminal_bench_version": "<version>",
"github_url": "<url>",
"dataset_path": "<path>",
"branch": "<branch>",
"commit_hash": "<hash>",
"task_id_subset": [
"<task-id-1>",
"<task-id-2>",
...
]
}
Evaluating on a dataset
Simply use the following options when running the CLI:
tb run \
--dataset name==x.y.z \
...
Viewing available datasets
tb datasets list
Downloading a dataset
If you just want to download a dataset to a local directory:
tb datasets download \
--dataset name==x.y.z \
--output-dir <path> # written to ~/.cache/terminal-bench if not specified
Adding a new dataset
To add a new task to the terminal-bench-core dataset, please see our contributing guide.
To create a new adapter, please see our adapter guide.
To add a new dataset to the registry, please make a PR to the registry.json
file in the root of the Terminal-Bench repository. Additionally, for now we prefer storing all datasets in the laude-institute/terminal-bench-datasets
repo so please include your dataset in a PR there.
Using a custom registry
If you want to use the Terminal-Bench harness but have a private set of tasks, you can create a custom registry by creating a local registry.json
file and using the --local-registry-path
flag when running the CLI.
tb run \
--local-registry-path <path/to/registry.json> \
...
Alternatively, you can use the --registry-url
flag to point to a remote registry.
tb run \
--registry-url <url> \
...
The registry URL should point to the raw text of a registry.json
file.
The default URL is https://raw.githubusercontent.com/laude-institute/terminal-bench/main/registry.json
.