Getting Started

How to get started with SWE-Bench Pro

1

Install Docker

Install Docker on your system and complete any post-installation steps your OS requires (on Linux, for example, allowing your user to run Docker without sudo).
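One way to confirm this step worked is to check that the Docker CLI is on your PATH and that it can reach the daemon. The sketch below is only an illustration: the function name `docker_status` and its three return values are made up for this example and are not part of SWE-Bench Pro.

```python
import shutil
import subprocess


def docker_status(executable: str = "docker") -> str:
    """Return 'missing', 'daemon-unreachable', or 'ready' for the Docker CLI.

    Hypothetical helper for illustration; not part of the benchmark tooling.
    """
    path = shutil.which(executable)
    if path is None:
        # The CLI is not installed or not on PATH.
        return "missing"
    try:
        # `docker info` contacts the daemon, so it fails when the daemon is
        # down or the current user is not allowed to talk to it.
        result = subprocess.run(
            [path, "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "daemon-unreachable"
    return "ready" if result.returncode == 0 else "daemon-unreachable"


if __name__ == "__main__":
    print(docker_status())
```

A result of `daemon-unreachable` on Linux often means the post-installation steps (starting the service, or granting your user access) are still pending.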

2

Store Modal Credentials

Use the provided commands to securely store the Modal credentials required for scaled evaluation.
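After storing the credentials, a quick check can tell you whether they are visible to your environment. This sketch assumes the Modal client's usual conventions, namely the `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` environment variables and a `~/.modal.toml` config file; if the provided commands store credentials elsewhere, adjust accordingly. The helper name `modal_credentials_source` is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def modal_credentials_source(
    env: Optional[Mapping[str, str]] = None,
    config_path: Optional[Path] = None,
) -> Optional[str]:
    """Report where Modal credentials appear to be stored, or None if not found.

    Hypothetical helper. Assumes Modal's usual conventions: token environment
    variables, or a config file at ~/.modal.toml. Only presence is checked;
    the token values are never read or printed.
    """
    env = os.environ if env is None else env
    if config_path is None:
        config_path = Path.home() / ".modal.toml"

    # Both variables must be set for environment-based authentication.
    if env.get("MODAL_TOKEN_ID") and env.get("MODAL_TOKEN_SECRET"):
        return "environment"
    if config_path.is_file():
        return "config-file"
    return None


if __name__ == "__main__":
    source = modal_credentials_source()
    print(source or "no Modal credentials found")
```

The check only tests for presence, so it does not confirm that the token is valid.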

3

Clone GitHub Repository

Clone the public SWE-Bench Pro GitHub repository to access benchmark code and resources.
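If you script your setup, the clone can be expressed as a small helper. This is a minimal sketch: the repository URL is deliberately left as a parameter because it is not given here, and `clone_command` is a hypothetical name. The `--depth 1` shallow clone is an optional choice to save bandwidth, not something the benchmark requires.

```python
from typing import List, Optional


def clone_command(
    repo_url: str, dest: Optional[str] = None, shallow: bool = True
) -> List[str]:
    """Build the argument list for `git clone` (hypothetical helper)."""
    cmd = ["git", "clone"]
    if shallow:
        # Fetch only the latest commit; drop this if you need full history.
        cmd += ["--depth", "1"]
    cmd.append(repo_url)
    if dest:
        cmd.append(dest)
    return cmd
```

To run it, pass the result to `subprocess.run(clone_command(url), check=True)`, where `url` is the repository address from the project page.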

4

Access Docker Images

Pull prebuilt Docker images for each task instance from hub.docker.com/r/jefzda/sweap-images.
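Pulling an image for one task instance could look like the sketch below. The repository name comes from the Docker Hub URL in this step; how tags map to task instances is not specified here, so the `tag` argument is a placeholder you would fill in from the benchmark's own documentation. The function names are hypothetical.

```python
import subprocess
from typing import List

# Repository taken from the Docker Hub URL in this step.
IMAGE_REPOSITORY = "jefzda/sweap-images"


def image_reference(tag: str, repository: str = IMAGE_REPOSITORY) -> str:
    """Join repository and tag into a `repository:tag` image reference."""
    tag = tag.strip()
    if not tag or ":" in tag or any(ch.isspace() for ch in tag):
        raise ValueError(f"not a usable image tag: {tag!r}")
    return f"{repository}:{tag}"


def pull_command(tag: str) -> List[str]:
    """Build the `docker pull` argument list for one image tag."""
    return ["docker", "pull", image_reference(tag)]


def pull_image(tag: str) -> None:
    """Run `docker pull`; raises CalledProcessError if the pull fails."""
    subprocess.run(pull_command(tag), check=True)
```

Separating `pull_command` from `pull_image` lets you print or log the exact command before running it, which helps when pulling many images in a loop.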

5

Run Evaluation Scripts

Run the evaluation scripts against the public dataset, available via Hugging Face or the official leaderboard, to benchmark your AI agent.

6

View Leaderboards

Monitor live leaderboards to compare model performance on public and commercial tasks.