Use Cases - terminal-bench-20

Developers and researchers can benchmark AI agents on terminal-based tasks such as code compilation, server setup, and vulnerability fixing.

Users can extend the benchmark by adding new tasks or modifying existing ones to suit specific evaluation needs.
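
As a rough illustration of what adding a task could involve, the sketch below models a task as an instruction for the agent plus a shell command that verifies the result. This is not the benchmark's actual task format or API: the `Task` class, the `register` and `run_check` helpers, and the `TASKS` registry are all hypothetical names chosen for this example, and the real project may define tasks quite differently.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record; not the benchmark's real schema."""
    name: str
    instruction: str    # what the agent is asked to do in the terminal
    check_command: str  # shell command that exits 0 when the task is solved


# Hypothetical in-memory registry: extending the suite means adding an entry.
TASKS = {}


def register(task: Task) -> None:
    """Add a task to the registry, rejecting duplicate names."""
    if task.name in TASKS:
        raise ValueError(f"task already registered: {task.name}")
    TASKS[task.name] = task


def run_check(task: Task, timeout_s: float = 30.0) -> bool:
    """Return True if the task's check command exits with status 0."""
    try:
        result = subprocess.run(
            task.check_command,
            shell=True,
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


# Example: a new task whose check trivially passes (placeholder command).
register(Task(
    name="always-passes",
    instruction="Do nothing; this task exists only to show the structure.",
    check_command="exit 0",
))
```

Modifying an existing task in this sketch amounts to replacing its `instruction` or `check_command`; a real harness would typically also describe the environment the agent runs in, which is omitted here.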