Registry
The Terminal-Bench registry of tasks and datasets.
Terminal-Bench plans to periodically release new sets of tasks and versions of the existing tasks. Additionally, we are in the process of adapting many popular benchmarks to the Terminal-Bench framework.
The Terminal-Bench registry stores information about all of the tasks and datasets in the Terminal-Bench ecosystem and enables quick evaluation and caching of the datasets.
Registry
The registry is stored in the registry.json file in the root of the Terminal-Bench repository.
A row in the registry contains the following information:
Evaluating on a dataset
Simply use the following options when running the CLI:
Viewing available datasets
Downloading a dataset
If you just want to download a dataset to a local directory:
Adding a new dataset
To add a new task to the terminal-bench-core dataset, please see our contributing guide.
To create a new adapter, please see our adapter guide.
To add a new dataset to the registry, please open a PR against the registry.json file in the root of the Terminal-Bench repository. For now, we also prefer storing all datasets in the laude-institute/terminal-bench-datasets repo, so please include your dataset in a PR there as well.
Using a custom registry
If you want to use the Terminal-Bench harness but have a private set of tasks, you can create a custom registry by creating a local registry.json file and passing the --registry-path flag when running the CLI.
Alternatively, you can use the --registry-url flag to point to a remote registry. The registry URL should point to the raw text of a registry.json file. The default URL is https://raw.githubusercontent.com/laude-institute/terminal-bench/main/registry.json.
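Before pointing the CLI at a custom registry, it can help to confirm that the file or URL actually serves parseable JSON. The following is a minimal sketch, not part of Terminal-Bench itself: the helper name load_registry is hypothetical, and it only assumes that the source is either a local path or an HTTP(S) URL returning the raw text of a registry.json file.

```python
import json
from pathlib import Path
from urllib.request import urlopen

# Default registry URL as given in the documentation above.
DEFAULT_REGISTRY_URL = (
    "https://raw.githubusercontent.com/laude-institute/"
    "terminal-bench/main/registry.json"
)


def load_registry(source: str = DEFAULT_REGISTRY_URL):
    """Hypothetical helper: parse a registry from a local path or a raw-text URL.

    Raises json.JSONDecodeError if the content is not valid JSON, which is
    what happens when a URL points at an HTML page instead of raw text.
    """
    if source.startswith(("http://", "https://")):
        # Remote registry: fetch the raw text over HTTP(S).
        with urlopen(source, timeout=30) as response:
            text = response.read().decode("utf-8")
    else:
        # Local registry: read the file from disk.
        text = Path(source).read_text(encoding="utf-8")
    return json.loads(text)
```

Calling load_registry("registry.json") checks a local file, and calling it with no argument fetches the default registry. The sketch does not validate the individual fields of each entry, only that the content parses as JSON.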