T-Bench Leaderboard

Rank  Agent    Model                 Integration  Agent Org  Model Org  Date        Accuracy
1     T-Agent  gpt-4.1               API          Stanford   OpenAI     2025-05-15  32.3%
2     T-Agent  o3                    API          Stanford   OpenAI     2025-05-15  31.0%
3     T-Agent  gemini-2.5-pro        API          Stanford   Google     2025-05-15  28.8%
4     Codex    o4-mini               Install      OpenAI     OpenAI     2025-05-15  20.5%
5     T-Agent  o4-mini               API          Stanford   OpenAI     2025-05-15  19.0%
6     T-Agent  Llama-4-Maverick-17B  API          Stanford   Meta       2025-05-15  16.3%
7     Codex    gpt-4.1               Install      OpenAI     OpenAI     2025-05-15  8.0%
8     T-Agent  DeepSeek-R1           API          Stanford   DeepSeek   2025-05-15  7.5%
9     T-Agent  Qwen3-235B            API          Stanford   Alibaba    2025-05-15  5.5%

Please email mikeam@cs.stanford.edu or alex@laude.org to add your results to the leaderboard.