Key Features - tao-squared-bench

✨

Dual-Control Simulation Framework

Simulates conversational tasks where AI agents and users collaboratively influence outcomes in customer service domains such as retail, airline, and telecom.

✨

Domain-Specific Policies and APIs

Includes detailed domain configurations, policies, and API documentation accessible locally for inspection and reproducibility.

✨

Leaderboard Integration

Tracks and displays performance results of various AI models like GPT-4o and Claude 3.5 Sonnet across supported tasks and domains.

✨

Task Execution and Evaluation

Supports running specific tasks by ID and evaluating historical interaction trajectories within the benchmark environment.

✨

Python 3.10+ Compatibility

Implemented in Python with optional virtual environment support to ensure reproducibility and ease of setup.