1
Install Docker
Install Docker on your system, then complete any OS-specific post-installation steps (for example, the additional setup Docker documents for Linux).
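Before moving on, you can check that the `docker` CLI is on your `PATH` and that the daemon answers `docker info`. This is a minimal Python sketch that assumes the standard Docker CLI. The helper name `docker_is_ready` is hypothetical.

```python
import shutil
import subprocess


def docker_is_ready(timeout: float = 10.0) -> bool:
    """Return True if the docker CLI exists and the daemon responds (hypothetical helper)."""
    # The CLI must be installed and on PATH.
    if shutil.which("docker") is None:
        return False
    try:
        # `docker info` exits non-zero when the daemon is unreachable,
        # e.g. when the service isn't running or the user lacks socket permissions.
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Docker ready:", docker_is_ready())
```

On Linux, a `False` result despite a working install often means the post-installation steps (such as letting your user run Docker without `sudo`) have not been applied yet.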
2
Store Modal Credentials
Use the provided commands to securely store the Modal credentials required for scaled evaluation.
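Use the exact commands the project provides to store the credentials. After that, the sketch below runs a pre-flight check for credentials in the places the Modal client commonly reads them: the `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` environment variables, or a `~/.modal.toml` config file. Treat those locations as assumptions and verify them against Modal's documentation. The helper `modal_credentials_present` is hypothetical, and it never reads or prints the secret values.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def modal_credentials_present(
    env: Optional[Mapping[str, str]] = None,
    config_path: Optional[Path] = None,
) -> bool:
    """Heuristic check for Modal credentials (hypothetical helper).

    Assumes Modal reads MODAL_TOKEN_ID / MODAL_TOKEN_SECRET from the
    environment or a ~/.modal.toml file; verify against Modal's docs.
    """
    env = os.environ if env is None else env
    # Both halves of the token pair must be set and non-empty.
    if env.get("MODAL_TOKEN_ID") and env.get("MODAL_TOKEN_SECRET"):
        return True
    # Fall back to the config file; only its existence is checked, not its contents.
    path = Path.home() / ".modal.toml" if config_path is None else config_path
    return path.is_file()


if __name__ == "__main__":
    print("Modal credentials found:", modal_credentials_present())
```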
3
Clone GitHub Repository
Clone the public SWE-Bench Pro GitHub repository to access benchmark code and resources.
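Scripting the clone keeps setup reproducible across machines. This is a minimal sketch that shells out to `git`. The repository URL is deliberately a placeholder (`REPO_URL` is hypothetical), so substitute the public SWE-Bench Pro URL. The shallow `--depth 1` clone saves bandwidth, but that is this sketch's choice and not a project requirement.

```python
import subprocess
from pathlib import Path
from typing import List

# Placeholder: replace with the public SWE-Bench Pro repository URL.
REPO_URL = "https://github.com/<org>/<repo>.git"


def clone_command(url: str, dest: Path, shallow: bool = True) -> List[str]:
    """Build the git clone command (hypothetical helper)."""
    cmd = ["git", "clone"]
    if shallow:
        # Fetch only the latest commit; pass shallow=False if you need full history.
        cmd += ["--depth", "1"]
    return cmd + [url, str(dest)]


def clone_repo(url: str, dest: Path, shallow: bool = True) -> None:
    """Clone url into dest unless dest already exists."""
    if dest.exists():
        return
    # check=True raises CalledProcessError if git fails.
    subprocess.run(clone_command(url, dest, shallow), check=True)
```

Once `REPO_URL` is filled in, call `clone_repo(REPO_URL, Path("swe-bench-pro"))`.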
4
Access Docker Images
Pull prebuilt Docker images for each task instance from hub.docker.com/r/jefzda/sweap-images.
5
Run Evaluation Scripts
Run the evaluation scripts on the public dataset, available via Hugging Face or the official leaderboard, to benchmark your AI agent.
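SWE-Bench-style benchmarks usually report the percentage of task instances an agent resolves. This sketch assumes your evaluation run produces a pass/fail outcome per instance ID. The dict format is hypothetical and is not the harness's actual output, but with it the score is resolved instances divided by total instances. The optional loader uses the Hugging Face `datasets` library. Its dataset ID is a parameter you must fill in with the ID published for SWE-Bench Pro, and the `"test"` split name is an assumption.

```python
from typing import Mapping


def resolve_rate(results: Mapping[str, bool]) -> float:
    """Percentage of instances resolved (hypothetical results format)."""
    if not results:
        raise ValueError("no results to score")
    resolved = sum(1 for ok in results.values() if ok)
    return 100.0 * resolved / len(results)


def load_public_split(dataset_id: str, split: str = "test"):
    """Load the public task instances from the Hugging Face Hub.

    Requires `pip install datasets`; dataset_id must be the ID the
    project publishes, and the split name is an assumption.
    """
    # Imported lazily so resolve_rate works without the datasets package.
    from datasets import load_dataset

    return load_dataset(dataset_id, split=split)


if __name__ == "__main__":
    example = {"instance-a": True, "instance-b": False, "instance-c": True, "instance-d": True}
    print(f"{resolve_rate(example):.1f}% resolved")  # 75.0% resolved
```

Count every attempted instance in the denominator, including runs that produced no patch or crashed. Dropping them would inflate the score.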
6
View Leaderboards
Monitor live leaderboards to compare model performance on public and commercial tasks.