Dual-Control Simulation Framework
Simulates conversational tasks in which AI agents and users jointly influence outcomes, covering customer service domains such as retail, airline, and telecom.
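To make the dual-control idea concrete, here is a minimal sketch in which both a user and an agent change one shared state. It is not the framework's actual API: `SharedEnvironment`, `run_episode`, and the state keys are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SharedEnvironment:
    """Hypothetical shared state that both the user and the agent can modify."""
    state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def apply(self, actor: str, key: str, value: Any) -> None:
        # Record who made the change, then update the shared state.
        self.log.append((actor, key, value))
        self.state[key] = value


def run_episode(env: SharedEnvironment, turns: list) -> dict:
    """Apply each (actor, key, value) turn in order and return the final state."""
    for actor, key, value in turns:
        env.apply(actor, key, value)
    return env.state


env = SharedEnvironment()
final_state = run_episode(env, [
    ("user", "airplane_mode", False),     # action taken on the user's side
    ("agent", "data_plan", "refreshed"),  # action taken on the agent's side
])
```

The final state depends on the actions of both parties, which is the property the term "dual-control" refers to here.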
Domain-Specific Policies and APIs
Includes detailed domain configurations, policies, and API documentation accessible locally for inspection and reproducibility.
Leaderboard Integration
Tracks and displays the performance of AI models such as GPT-4o and Claude 3.5 Sonnet across supported tasks and domains.
Task Execution and Evaluation
Supports running specific tasks by ID and evaluating historical interaction trajectories within the benchmark environment.
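As a rough illustration of selecting tasks by ID and scoring a recorded trajectory, the sketch below replays stored steps and compares the resulting state with the expected one. The `Task` type, the `select_tasks` and `evaluate_trajectory` helpers, and the trajectory layout are all assumptions made for this example; the benchmark's real data structures and scoring may differ.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record: an ID plus the state a successful run should reach."""
    task_id: str
    expected_state: dict


def select_tasks(tasks: list, task_ids: list) -> list:
    """Keep only the tasks whose ID was requested, preserving the original order."""
    wanted = set(task_ids)
    return [task for task in tasks if task.task_id in wanted]


def evaluate_trajectory(task: Task, trajectory: list) -> float:
    """Replay recorded state updates; return 1.0 if the final state matches, else 0.0."""
    state = {}
    for step in trajectory:
        state[step["key"]] = step["value"]
    return 1.0 if state == task.expected_state else 0.0


tasks = [
    Task("0", {"order_status": "cancelled"}),
    Task("1", {"seat": "12A"}),
]
selected = select_tasks(tasks, ["1"])

# A previously recorded trajectory (assumed format: a list of key/value updates).
recorded = [
    {"key": "seat", "value": "14C"},
    {"key": "seat", "value": "12A"},
]
reward = evaluate_trajectory(selected[0], recorded)
```

Because evaluation only needs the recorded steps, a stored trajectory can be scored again without re-running the conversation.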
Python 3.10+ Compatibility
Implemented in Python (3.10 or later), with optional virtual environment support for a reproducible, straightforward setup.
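A quick way to confirm that the interpreter meets the version requirement is a check like the one below. `MIN_VERSION` and `check_python` are hypothetical names for this sketch and are not part of the project.

```python
import sys

# Minimum interpreter version stated above (major, minor).
MIN_VERSION = (3, 10)


def check_python(version_info=sys.version_info) -> bool:
    """Return True if the given version tuple is at least MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


if not check_python():
    print("Python 3.10 or later is required; found", sys.version.split()[0])
```

If a virtual environment is wanted, the standard library can create one with `python -m venv .venv` before the project is installed.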