1
Install Terminal-Bench
Run `uv tool install terminal-bench` or `pip install terminal-bench` to install the package.
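As a minimal sketch of choosing between the two install commands above, the helper below prefers `uv` when it is on the PATH and otherwise falls back to pip. The function name `install_command` and its `which` parameter are hypothetical and not part of Terminal-Bench. The pip fallback uses `python -m pip`, which runs pip for the current interpreter.

```python
import shutil
import sys
from typing import Callable, List, Optional

PACKAGE = "terminal-bench"


def install_command(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the install command as an argument list.

    Prefers `uv tool install` when a `uv` executable is found on PATH;
    otherwise falls back to pip for the current Python interpreter.
    `which` is injectable so the choice can be checked without uv installed.
    """
    if which("uv"):
        return ["uv", "tool", "install", PACKAGE]
    return [sys.executable, "-m", "pip", "install", PACKAGE]


# To perform the installation (not executed here):
#   import subprocess
#   subprocess.run(install_command(), check=True)
```

Returning an argument list rather than a shell string lets the command be passed to `subprocess.run` without shell quoting.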
2
Run Evaluations
Use the `tb` CLI, typically through its `tb run` command, to execute benchmark tasks and evaluate AI agents.
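If evaluations are launched from a script, a thin wrapper around the CLI may be enough. The sketch below only builds and runs a `tb run` invocation. The helper names `tb_run_command` and `run_benchmark` are hypothetical, and no specific options are assumed: this step does not list them, so take the available flags from the CLI's own help output (assuming it offers the usual `--help` option).

```python
import shutil
import subprocess
from typing import List, Sequence


def tb_run_command(extra_args: Sequence[str] = ()) -> List[str]:
    """Build a `tb run` invocation; extra_args are passed through unchanged."""
    return ["tb", "run", *extra_args]


def run_benchmark(extra_args: Sequence[str] = ()) -> int:
    """Run `tb run` and return its exit code.

    Raises FileNotFoundError early if the `tb` executable is not on PATH,
    which usually means the package has not been installed yet.
    """
    if shutil.which("tb") is None:
        raise FileNotFoundError("`tb` not found on PATH; install terminal-bench first")
    return subprocess.run(tb_run_command(extra_args)).returncode
```

For example, `run_benchmark(["--help"])` would print the options that the installed version accepts, under the assumption stated above.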
3
Configure Custom Docker Images
Set `use_prebuilt_image=false` in CLI commands or Python evaluation scripts to use custom Docker images.
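One small pitfall when producing this setting from a Python script is that `str(False)` yields `False`, not the lowercase `false` shown above. The sketch below renders the lowercase form. The helper `render_setting` is hypothetical, and how the resulting string is passed to the CLI or to an evaluation script is not specified in this step, so check the Terminal-Bench documentation for that part.

```python
def render_setting(name: str, value: bool) -> str:
    """Render a boolean setting as `name=true` or `name=false` (lowercase)."""
    return f"{name}={'true' if value else 'false'}"


# The setting named in this step, with the value that selects custom images.
custom_image_setting = render_setting("use_prebuilt_image", False)
```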
4
View Leaderboard
Access the public leaderboard at https://www.tbench.ai/leaderboard/terminal-bench/2.0 to compare agent performance.
5
Contribute Tasks or Adapters
Follow the documentation to add new tasks or adapters: place the files in the tasks folder, then submit a pull request.