Continuous Problem Collection
Automatically gathers new coding problems from live contests on LeetCode, AtCoder, and CodeForces to maintain an up-to-date benchmark.
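A collected problem can be thought of as one record carrying its platform, contest, difficulty, release date, and test cases. The sketch below is a minimal illustration of such a record; the field names and example values are assumptions for exposition, not the benchmark's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Problem:
    """One collected problem; fields are illustrative, not the benchmark's real schema."""
    platform: str        # e.g. "leetcode", "atcoder", "codeforces"
    contest_id: str      # contest the problem appeared in
    problem_id: str      # platform-specific identifier
    title: str
    statement: str       # natural-language problem description
    difficulty: str      # e.g. "easy" / "medium" / "hard"
    release_date: date   # contest date, used for time annotation
    public_tests: list[tuple[str, str]] = field(default_factory=list)  # (stdin, expected stdout)
    hidden_tests: list[tuple[str, str]] = field(default_factory=list)  # held out for grading

# Placeholder record; real entries come from continuous contest scraping.
example = Problem(
    platform="atcoder",
    contest_id="example-contest",
    problem_id="example-problem-a",
    title="Example Problem",
    statement="Read an integer N and print N * 2.",
    difficulty="easy",
    release_date=date(2024, 6, 1),
)
```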
Time-Annotated Problem Sets
Annotates each problem with its release date, so models can be evaluated only on problems published after their training cutoff, supporting contamination-free benchmarking.
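Given release-date annotations, selecting a contamination-free evaluation window reduces to a date filter. The sketch below assumes a `release_date` field like the one in the record above; the cutoff dates in the commented usage are hypothetical.

```python
from datetime import date

def select_window(problems, start: date, end: date):
    """Keep problems released in [start, end), e.g. after a model's training-data cutoff.
    Assumes each problem exposes a `release_date` attribute."""
    return [p for p in problems if start <= p.release_date < end]

# Hypothetical usage: a model trained on data up to Aug 2023 is evaluated
# only on problems released after that cutoff.
# fresh = select_window(all_problems, date(2023, 9, 1), date(2024, 6, 1))
```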
Multiple Evaluation Scenarios
Supports code generation, self-repair, code execution, and test output prediction to comprehensively assess LLM coding capabilities.
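The four scenarios differ mainly in what the model is asked to produce. A minimal dispatch over scenario types might look like the sketch below; the scenario names, prompt wording, and `context` keys are assumptions, not the toolkit's actual prompts.

```python
from enum import Enum

class Scenario(Enum):
    CODE_GENERATION = "code_generation"        # write a solution from the statement
    SELF_REPAIR = "self_repair"                # fix a previously failing solution
    CODE_EXECUTION = "code_execution"          # predict a program's output on an input
    TEST_OUTPUT_PREDICTION = "test_output"     # predict the expected output of a test case

def build_prompt(scenario: Scenario, problem, context: dict) -> str:
    """Illustrative prompt construction; real prompts are more detailed."""
    if scenario is Scenario.CODE_GENERATION:
        return f"Solve the following problem in Python:\n{problem.statement}"
    if scenario is Scenario.SELF_REPAIR:
        return (f"The following solution fails:\n{context['failed_code']}\n"
                f"Error:\n{context['error']}\nFix it.")
    if scenario is Scenario.CODE_EXECUTION:
        return (f"Given this program:\n{context['program']}\n"
                f"What does it print for input {context['input']!r}?")
    return (f"Problem:\n{problem.statement}\n"
            f"What is the expected output for input {context['input']!r}?")
```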
Execution-Based Accuracy Metrics
Measures the functional correctness of generated code by actually executing it against hidden test cases.
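Functional correctness can be checked by running each generated program against the hidden test cases in a subprocess and comparing its output to the expected output. The sketch below is a simplified, unsandboxed illustration under that assumption, not the benchmark's actual harness.

```python
import subprocess
import sys

def passes_hidden_tests(code: str, hidden_tests, timeout: float = 5.0) -> bool:
    """Run `code` once per (stdin, expected_stdout) pair and require exact output matches.
    A real harness would add sandboxing, resource limits, and output normalization."""
    for stdin_data, expected in hidden_tests:
        try:
            result = subprocess.run(
                [sys.executable, "-c", code],
                input=stdin_data,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return False
    return True

def accuracy(generations: dict, tests: dict) -> float:
    """generations: {problem_id: code}, tests: {problem_id: [(stdin, expected), ...]}.
    Returns the fraction of problems whose generation passes every hidden test."""
    if not generations:
        return 0.0
    passed = sum(passes_hidden_tests(code, tests[pid]) for pid, code in generations.items())
    return passed / len(generations)
```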
Open-Source Toolkit with Leaderboard
Provides a reproducible evaluation framework and a leaderboard to compare LLM performance across difficulty levels.
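Leaderboard entries can be produced by grouping per-problem pass/fail results by difficulty and averaging per model. The grouping and example values below are illustrative only.

```python
from collections import defaultdict

def leaderboard(results):
    """results: iterable of (model, difficulty, passed: bool).
    Returns {model: {difficulty: pass_rate}} suitable for a leaderboard table."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # model -> difficulty -> [passed, total]
    for model, difficulty, passed in results:
        cell = counts[model][difficulty]
        cell[0] += int(passed)
        cell[1] += 1
    return {
        model: {diff: passed / total for diff, (passed, total) in cells.items()}
        for model, cells in counts.items()
    }

# Hypothetical example: two models scored on a handful of problems.
rows = [
    ("model-a", "easy", True), ("model-a", "hard", False),
    ("model-b", "easy", True), ("model-b", "hard", True),
]
print(leaderboard(rows))
# {'model-a': {'easy': 1.0, 'hard': 0.0}, 'model-b': {'easy': 1.0, 'hard': 1.0}}
```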