Getting Started - SWE-Bench Pro

1

Install Docker on your system and complete any post-installation steps your operating system requires, such as the additional setup steps on Linux.
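
Below is a minimal Python sketch for confirming that this step worked. It assumes the Docker CLI is on PATH as docker. The docker_status() function is a hypothetical helper that runs docker info, which only succeeds when the current user can reach the Docker daemon. The three commands listed are the standard Linux post-installation steps for running Docker without sudo.

```python
import shutil
import subprocess
import sys

# Linux post-installation commands from Docker's documentation for running
# Docker as a non-root user (log out and back in, or run newgrp, afterwards).
LINUX_POST_INSTALL = [
    "sudo groupadd docker",
    "sudo usermod -aG docker $USER",
    "newgrp docker",
]


def docker_status(docker_bin: str = "docker", timeout: float = 30.0) -> str:
    """Classify the local Docker setup as 'missing', 'unreachable', or 'ok'."""
    path = shutil.which(docker_bin)
    if path is None:
        return "missing"  # CLI not on PATH: Docker is not installed
    try:
        # `docker info` talks to the daemon, so it fails when the daemon is
        # down or the current user lacks permission on the Docker socket.
        result = subprocess.run(
            [path, "info"], capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return "unreachable"
    return "ok" if result.returncode == 0 else "unreachable"


if __name__ == "__main__":
    status = docker_status()
    print(f"docker: {status}")
    if status == "unreachable" and sys.platform.startswith("linux"):
        print("If permission was denied, try the Linux post-installation steps:")
        for cmd in LINUX_POST_INSTALL:
            print("  " + cmd)
```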

2

Use the provided commands to store your Modal credentials securely; they are required to run evaluations at scale.
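
One way to check that Modal credentials are in place before launching an evaluation is sketched below in Python. The sketch assumes the Modal client picks up a token from the MODAL_TOKEN_ID and MODAL_TOKEN_SECRET environment variables or from ~/.modal.toml, which is the file written by modal setup. The modal_credentials_present() function is a hypothetical helper and is not part of Modal's API. It only tests that credentials are present. It does not check that they are valid.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def modal_credentials_present(
    env: Optional[Mapping[str, str]] = None, home: Optional[Path] = None
) -> bool:
    """Return True if Modal credentials appear to be configured.

    Assumption: Modal reads a token from MODAL_TOKEN_ID / MODAL_TOKEN_SECRET
    or from ~/.modal.toml (written by `modal setup`).
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    # Both environment variables must be set for the env-based token to work.
    if env.get("MODAL_TOKEN_ID") and env.get("MODAL_TOKEN_SECRET"):
        return True
    # Otherwise fall back to the config file in the home directory.
    return (home / ".modal.toml").is_file()


if __name__ == "__main__":
    if modal_credentials_present():
        print("Modal credentials found.")
    else:
        print("No Modal credentials found; try `pip install modal` then `modal setup`.")
```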

3

Clone the public SWE-Bench Pro GitHub repository to access benchmark code and resources.
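
The clone step can be sketched in Python by shelling out to git. The repository URL is deliberately not hard-coded here. Pass the URL listed on the official SWE-Bench Pro page instead. The names clone_command() and clone_repo() are illustrative assumptions, and so is the default directory name swe-bench-pro.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def clone_command(repo_url: str, dest: Path, shallow: bool = False) -> List[str]:
    """Build the `git clone` argument list; shallow=True skips history."""
    cmd = ["git", "clone"]
    if shallow:
        cmd += ["--depth", "1"]
    return cmd + [repo_url, str(dest)]


def clone_repo(repo_url: str, dest: Path, shallow: bool = False) -> Path:
    """Clone repo_url into dest unless dest already holds a git checkout."""
    if (dest / ".git").is_dir():
        return dest  # already cloned; leave the existing checkout untouched
    subprocess.run(clone_command(repo_url, dest, shallow), check=True)
    return dest


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python clone_bench.py REPO_URL [DEST]")
    else:
        # The URL must come from the official SWE-Bench Pro page.
        target = Path(sys.argv[2]) if len(sys.argv) > 2 else Path("swe-bench-pro")
        print(f"checkout at {clone_repo(sys.argv[1], target).resolve()}")
```

You can safely re-run the script. If the target directory already contains a .git folder, the clone is skipped.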

4

Pull prebuilt Docker images for each task instance from hub.docker.com/r/jefzda/sweap-images.
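
The Docker Hub page hub.docker.com/r/jefzda/sweap-images corresponds to the image repository jefzda/sweap-images, so each pull has the form docker pull jefzda/sweap-images:TAG. This sketch does not know the per-instance tag scheme. The tags must come from the benchmark's own instance metadata, and the helper names are hypothetical. Before any command runs, each tag is checked against Docker's tag grammar.

```python
import re
import subprocess
import sys
from typing import Iterable, List

# Image repository derived from hub.docker.com/r/jefzda/sweap-images.
IMAGE_REPO = "jefzda/sweap-images"

# Docker tag grammar: up to 128 chars of [A-Za-z0-9_.-], not starting with '.' or '-'.
_TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def image_ref(tag: str) -> str:
    """Return the full image reference for one task-instance tag."""
    if not _TAG_RE.fullmatch(tag):
        raise ValueError(f"invalid Docker tag: {tag!r}")
    return f"{IMAGE_REPO}:{tag}"


def pull_commands(tags: Iterable[str]) -> List[List[str]]:
    """One `docker pull` command per distinct tag, in first-seen order."""
    unique = dict.fromkeys(tags)  # dedupe while preserving order
    return [["docker", "pull", image_ref(t)] for t in unique]


def pull_all(tags: Iterable[str]) -> None:
    for cmd in pull_commands(tags):
        subprocess.run(cmd, check=True)  # stop on the first failed pull


if __name__ == "__main__":
    # Usage: python pull_images.py TAG [TAG ...]
    pull_all(sys.argv[1:])
```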

5

Run the evaluation scripts on the public dataset, available via Hugging Face or the official leaderboard, to benchmark your AI agent.
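
The following is a hedged Python sketch of the benchmarking side. The load_public_split() function wraps Hugging Face's datasets.load_dataset. Copy the dataset id from the Hugging Face page. The split name test is an assumption. The resolve_rate() function reports the percentage of instances resolved. This assumes the evaluation output has been reduced to a hypothetical JSON file that maps instance IDs to true or false. The official scripts' actual output format may differ.

```python
import json
import sys
from pathlib import Path
from typing import Dict


def load_public_split(dataset_id: str, split: str = "test"):
    """Load the public tasks via Hugging Face `datasets` (split name assumed)."""
    from datasets import load_dataset  # requires `pip install datasets`
    return load_dataset(dataset_id, split=split)


def load_results(path: Path) -> Dict[str, bool]:
    """Read a hypothetical {instance_id: resolved} JSON file."""
    data = json.loads(path.read_text())
    return {str(k): bool(v) for k, v in data.items()}


def resolve_rate(results: Dict[str, bool]) -> float:
    """Percentage of task instances marked resolved (True)."""
    if not results:
        raise ValueError("no results to score")
    return 100.0 * sum(results.values()) / len(results)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python score.py RESULTS_JSON")
    else:
        res = load_results(Path(sys.argv[1]))
        print(f"resolved {sum(res.values())}/{len(res)} = {resolve_rate(res):.1f}%")
```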

6

Monitor live leaderboards to compare model performance on public and commercial tasks.