Key Features

What you can do

Dual-Control Simulation Framework

Simulates conversational customer-service tasks in which both the AI agent and the user can act on a shared environment, so the outcome depends on how well the two coordinate. Supported domains include retail, airline, and telecom.
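To make "dual control" concrete, here is a minimal, hypothetical sketch; it is not the benchmark's actual code. It assumes a telecom-style task in which one setting can only be changed by the agent (a backend flag) and another only by the user (a device setting), so neither side can solve the task alone. All names (`WorldState`, `agent_activate_line`, `user_toggle_airplane_mode`, `task_solved`, `run_episode`) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical shared world state: some fields are changed only through
# agent-side tools, others only through user-side (device) tools.
@dataclass
class WorldState:
    line_active: bool = False   # backend setting, agent-controlled
    airplane_mode: bool = True  # device setting, user-controlled
    log: list[str] = field(default_factory=list)


def agent_activate_line(state: WorldState) -> None:
    """Agent-side tool: enable the customer's line in the backend."""
    state.line_active = True
    state.log.append("agent: activate_line")


def user_toggle_airplane_mode(state: WorldState) -> None:
    """User-side tool: flip the airplane-mode switch on the device."""
    state.airplane_mode = not state.airplane_mode
    state.log.append(f"user: airplane_mode={state.airplane_mode}")


def task_solved(state: WorldState) -> bool:
    # Success requires changes made by BOTH participants.
    return state.line_active and not state.airplane_mode


def run_episode(turns: list[Callable[[WorldState], None]]) -> WorldState:
    """Apply agent and user actions, in order, to one shared state."""
    state = WorldState()
    for act in turns:
        act(state)
    return state


if __name__ == "__main__":
    both = run_episode([agent_activate_line, user_toggle_airplane_mode])
    agent_only = run_episode([agent_activate_line])
    print(task_solved(both), task_solved(agent_only))  # True False
```

The point of the design is that the success check reads state owned by both sides, so an agent that only uses its own tools, without guiding the user to act, cannot complete the task.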

Domain-Specific Policies and APIs

Includes detailed domain configurations, policies, and API documentation that can be inspected locally, which makes runs easier to audit and reproduce.

Leaderboard Integration

Tracks and displays the performance of AI models such as GPT-4o and Claude 3.5 Sonnet across the supported tasks and domains.

Task Execution and Evaluation

Supports running specific tasks by ID and evaluating previously recorded interaction trajectories within the benchmark environment.
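The sketch below illustrates the two operations in general terms: selecting tasks by ID and scoring a recorded trajectory. It is a hypothetical illustration, not the benchmark's API. It assumes a trajectory is a list of `(tool_name, kwargs)` calls and that a task is scored by replaying those calls and comparing the final state with an expected state; the names `Task`, `select_tasks`, `replay`, and `evaluate`, and the single `"set"` tool, are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical trajectory format: a list of (tool_name, kwargs) calls.
Trajectory = list[tuple[str, dict[str, object]]]


@dataclass
class Task:
    task_id: str
    expected_state: dict[str, object]  # assumed ground-truth final state


def select_tasks(tasks: list[Task], task_ids: list[str]) -> list[Task]:
    """Return the tasks matching task_ids, in the requested order."""
    by_id = {t.task_id: t for t in tasks}
    missing = [i for i in task_ids if i not in by_id]
    if missing:
        raise KeyError(f"unknown task ids: {missing}")
    return [by_id[i] for i in task_ids]


def replay(trajectory: Trajectory) -> dict[str, object]:
    """Rebuild the final state from recorded calls.

    Hypothetical tool semantics: "set" writes kwargs["value"] under
    kwargs["key"]; any other tool name is treated as read-only and ignored.
    """
    state: dict[str, object] = {}
    for tool, kwargs in trajectory:
        if tool == "set":
            state[str(kwargs["key"])] = kwargs["value"]
    return state


def evaluate(task: Task, trajectory: Trajectory) -> float:
    """Reward 1.0 if the replayed state matches the expected state, else 0.0."""
    return 1.0 if replay(trajectory) == task.expected_state else 0.0


if __name__ == "__main__":
    tasks = [Task("0", {"seat": "12A"}), Task("1", {"seat": "3C"})]
    (task,) = select_tasks(tasks, ["1"])
    recorded = [("lookup", {"key": "seat"}), ("set", {"key": "seat", "value": "3C"})]
    print(evaluate(task, recorded))  # 1.0
```

Scoring on final state rather than on the exact sequence of messages is one common choice for this kind of benchmark, since different conversations can legitimately reach the same correct outcome.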

Python 3.10+ Compatibility

Implemented in Python and requires Python 3.10 or later. Installing into a virtual environment is optional but helps keep setups reproducible and isolated.
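A quick way to confirm the interpreter meets the 3.10+ requirement before installing is a small guard like the one below. The function name `check_python` is a hypothetical helper, not part of the project; it only uses the standard `sys.version_info`. A virtual environment can be created beforehand with the standard library's `python3 -m venv .venv`.

```python
import sys

MIN_VERSION = (3, 10)  # minimum supported Python version


def check_python(version_info=sys.version_info) -> None:
    """Raise RuntimeError if the interpreter is older than MIN_VERSION."""
    current = tuple(version_info[:2])
    if current < MIN_VERSION:
        found = ".".join(map(str, current))
        required = ".".join(map(str, MIN_VERSION))
        raise RuntimeError(f"Python {required}+ required, found {found}")


if __name__ == "__main__":
    check_python()
    print("Python version OK")
```

Comparing `(major, minor)` tuples avoids string comparison pitfalls such as `"3.9" > "3.10"`.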