Introduction
Overview of the Terminal-Bench agent interface.
Creating a custom agent
Terminal-Bench tasks are complex environments with pre-configured lifecycles, including custom startup and teardown instructions. The Terminal-Bench harness handles this complexity for you, so you can focus on writing your agent.
If you haven't already, make sure to install Terminal-Bench by following our installation guide.
Integration Approaches
There are three approaches to integrating your own agent into Terminal-Bench:
- Install and run in the task container
- Implement the BaseAgent interface
- Implement the MCPAgent interface
Install and run in the task container
The most straightforward approach is to install your agent as a package in the task container.
We've provided an AbstractInstalledAgent class that you can implement to install and run your agent in the task container. Implementing the interface involves defining an installation script, the execution command, and environment variables.
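As a rough illustration of those three pieces, the sketch below defines a local stand-in base class rather than importing the real AbstractInstalledAgent. Every name in it (InstalledAgentSketch, install_script_path, env, run_command, the my-agent command, and MY_AGENT_API_KEY) is hypothetical and may not match the actual Terminal-Bench API, so check the class definition in the repository before adapting it.

```python
from __future__ import annotations

import os
import shlex
from abc import ABC, abstractmethod
from pathlib import Path


class InstalledAgentSketch(ABC):
    """Hypothetical stand-in; NOT the real AbstractInstalledAgent."""

    @property
    @abstractmethod
    def install_script_path(self) -> Path:
        """Script that installs the agent inside the task container."""

    @property
    @abstractmethod
    def env(self) -> dict[str, str]:
        """Environment variables the agent needs at run time."""

    @abstractmethod
    def run_command(self, instruction: str) -> str:
        """Shell command that starts the agent on the given task."""


class MyInstalledAgent(InstalledAgentSketch):
    @property
    def install_script_path(self) -> Path:
        # Assumed file name; the script would install the agent package.
        return Path("install-my-agent.sh")

    @property
    def env(self) -> dict[str, str]:
        # Forward a (hypothetical) API key from the host environment.
        return {"MY_AGENT_API_KEY": os.environ.get("MY_AGENT_API_KEY", "")}

    def run_command(self, instruction: str) -> str:
        # Quote the instruction so it reaches the agent as one argument.
        return f"my-agent --task {shlex.quote(instruction)}"
```

Quoting the instruction with shlex.quote matters here because task instructions are free text and may contain spaces or shell metacharacters.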
Implement the BaseAgent interface
Implementing the BaseAgent interface is more robust than installing your agent in the task container but, depending on how your agent is built, can require a bit of surgery.
The BaseAgent interface requires you to implement only one method:
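To show the general shape of a single-method agent, here is a minimal sketch built on locally defined stand-ins instead of the real Terminal-Bench classes. The method name perform_task, its parameters, and the TerminalSessionSketch and AgentResultSketch types are assumptions made for illustration; the actual signature should be taken from the BaseAgent source.

```python
from __future__ import annotations

import shlex
from abc import ABC, abstractmethod
from dataclasses import dataclass


class TerminalSessionSketch:
    """Hypothetical terminal handle that just records the keys it is sent."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    def send_keys(self, keys: str) -> None:
        self.sent.append(keys)


@dataclass
class AgentResultSketch:
    """Hypothetical result type returned to the harness."""

    commands_run: int


class BaseAgentSketch(ABC):
    """Hypothetical stand-in; NOT the real BaseAgent."""

    @abstractmethod
    def perform_task(
        self, instruction: str, session: TerminalSessionSketch
    ) -> AgentResultSketch:
        """Drive the terminal session until the task is attempted."""


class EchoAgent(BaseAgentSketch):
    def perform_task(
        self, instruction: str, session: TerminalSessionSketch
    ) -> AgentResultSketch:
        # A real agent would call a model here; this one runs fixed commands.
        commands = [f"echo {shlex.quote(instruction)}", "ls"]
        for command in commands:
            session.send_keys(command + "\n")
        return AgentResultSketch(commands_run=len(commands))
```

In this arrangement the agent runs outside the task container and only touches the environment through the session object, which is one plausible reason this route is more robust than an in-container install.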
To integrate your agent by implementing the BaseAgent interface, you'll need to
Want help?
We are also willing to pair-program with you to integrate your agent. Email mikeam@cs.stanford.edu or alex@laude.org (or both!) if you'd like help.
Implement the MCPAgent interface
Coming soon!
Running a custom agent
Once you have implemented your custom agent, you can run it by passing the --agent-import-path flag when running the harness.
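Import-path flags like this are commonly resolved by splitting a module path from a class name and importing the module dynamically. The sketch below shows that pattern with importlib; the "package.module:ClassName" format and the load_agent_class helper are assumptions for illustration, not a description of how the harness actually parses the flag.

```python
import importlib


def load_agent_class(import_path: str) -> type:
    """Resolve a 'package.module:ClassName' string to a class object.

    The colon-separated format is an assumption made for this sketch.
    """
    module_name, sep, class_name = import_path.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(
            f"expected 'package.module:ClassName', got {import_path!r}"
        )
    # Import the module, then look the class up by name on it.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

Under that assumed format, a value such as my_package.my_agent:MyAgent (a hypothetical module and class) would only resolve if my_package is importable from the environment in which the harness runs.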