terminal-bench@2.0 Leaderboard
harbor run -d terminal-bench@2.0 -a "agent" -m "model" -k 5
harbor run -d terminal-bench@2.0 --agent-import-path "path.to.agent:SomeAgent" -k 5
| Rank | Agent | Model | Date | Agent Org | Model Org | Accuracy |
|---|---|---|---|---|---|---|
| 42 | grok-cli | Grok 4.20 Reasoning | 2026-04-02 | Vibe Kit | xAI | 57.3% ± N/A |
Results in this leaderboard correspond to terminal-bench@2.0.
Submission instructions can be found at harborframework/terminal-bench-2-leaderboard
A Terminal-Bench team member ran the evaluation and verified the results.
Displaying 1 of 123 available entries