1. Access the Website
Visit https://www.swebench.com to explore available datasets and leaderboards.
2. Download a Dataset Subset
Start with SWE-bench Lite (300 instances) for initial evaluation to reduce compute requirements.
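Once the Lite split is downloaded, a quick inventory of which repositories it covers helps you plan a first run. The sketch below assumes the split has been exported to a local JSON Lines file with one task instance per line, and that each instance carries at least `instance_id` and `repo` fields. The helper names (`parse_instances`, `count_by_repo`, `filter_by_repo`, `load_instances`) and the file path are hypothetical. They are not part of any SWE-bench API.

```python
import json
from collections import Counter
from typing import Iterable


def parse_instances(lines: Iterable[str]) -> list[dict]:
    """Parse task instances from JSON Lines text, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def load_instances(path: str) -> list[dict]:
    """Read a local JSON Lines export (hypothetical path), one instance per line."""
    with open(path, encoding="utf-8") as f:
        return parse_instances(f)


def count_by_repo(instances: list[dict]) -> Counter:
    """Count how many instances each source repository contributes."""
    return Counter(inst["repo"] for inst in instances)


def filter_by_repo(instances: list[dict], repos: Iterable[str]) -> list[dict]:
    """Keep only instances from the given repositories, e.g. for a smoke test."""
    wanted = set(repos)
    return [inst for inst in instances if inst["repo"] in wanted]
```

Filtering to one or two repositories further shrinks an initial run. The split is also published on Hugging Face, where it can be loaded with the `datasets` library instead of a local export. Check the site for the current dataset identifier.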
3. Set Up Evaluation Environment
Use the evaluation harness to configure Docker environments, apply model-generated patches, and run each repository's tests. The harness evaluates patches. Your model generates them.
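A harness run is typically started from the command line. The sketch below only assembles the argument list. It assumes the `swebench` package is installed and Docker is running. The module path `swebench.harness.run_evaluation` and the flags `--dataset_name`, `--predictions_path`, `--max_workers`, and `--run_id` follow the project's published usage as best understood here. Confirm them with `--help` for your installed version. The default dataset identifier is also an assumption.

```python
import shlex
import subprocess
import sys


def build_harness_command(
    predictions_path: str,
    run_id: str,
    dataset_name: str = "princeton-nlp/SWE-bench_Lite",  # assumed identifier
    max_workers: int = 4,
) -> list[str]:
    """Assemble the argv for the evaluation harness (flags assumed; verify with --help)."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    return [
        sys.executable, "-m", "swebench.harness.run_evaluation",
        "--dataset_name", dataset_name,
        "--predictions_path", predictions_path,
        "--max_workers", str(max_workers),
        "--run_id", run_id,
    ]


def run_harness(cmd: list[str]) -> None:
    """Execute the harness; raises CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


cmd = build_harness_command("predictions.json", run_id="lite-smoke-test")
print(shlex.join(cmd))  # inspect the command before calling run_harness(cmd)
```

Keeping `max_workers` low at first limits parallel containers on a single machine.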
4. Submit Results
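Submit your predictions.json file, containing the model-generated patches, to the leaderboard to obtain a % Resolved score: the percentage of task instances whose issue is resolved once the patch is applied and the tests are run.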
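The sketch below writes a predictions file and computes a % Resolved figure. It assumes each record uses the keys `instance_id`, `model_name_or_path`, and `model_patch`, and that a JSON array is an accepted file format. It also assumes the score's denominator is the full split size (300 for Lite), not only the instances you submitted. The helper names are hypothetical. Leaderboard-specific submission requirements, such as logs or metadata, are not covered here.

```python
import json


def make_prediction(instance_id: str, model_name: str, patch: str) -> dict:
    """Build one prediction record (field names assumed from the harness format)."""
    return {
        "instance_id": instance_id,
        "model_name_or_path": model_name,
        "model_patch": patch,
    }


def write_predictions(predictions: list[dict], path: str) -> None:
    """Write predictions as a JSON array, rejecting duplicate instance IDs."""
    ids = [p["instance_id"] for p in predictions]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate instance_id in predictions")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(predictions, f, indent=2)


def percent_resolved(resolved: int, total: int) -> float:
    """% Resolved = resolved instances / instances in the split * 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved <= total:
        raise ValueError("resolved must be between 0 and total")
    return 100.0 * resolved / total


print(percent_resolved(75, 300))  # 25.0
```

Counting unsubmitted instances as unresolved keeps scores comparable across submissions that skip hard cases.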
5. Request Custom Support
Contact support@swebench.com for custom datasets or to contribute to the benchmark.