Inspect Evals
Provides a terminal_bench_2 implementation with ReAct agents and Docker support; its prompting and evaluation framework differ from those of the original Terminal-Bench harness.
Harbor
Companion package to Terminal-Bench 2.0 that offers an extended harness for cloud container deployment and agent optimization.