Alternatives

Other options to consider

Inspect Evals: Provides a terminal_bench_2 implementation with ReAct agents and Docker support. It differs in its prompting and evaluation framework.
Harbor: Companion package to Terminal-Bench 2.0 that offers an extended harness for cloud container deployment and agent optimization.