LiveCodeBench
LiveCodeBench is an open-source benchmark designed to evaluate large language models (LLMs) on coding tasks derived from competitive programming contests. It continuously collects problems from platforms such as LeetCode, AtCoder, and CodeForces, ensuring that the problems used for evaluation are released after the model's training cutoff date to prevent data contamination. The benchmark includes over 1,000 problems spanning easy to hard difficulty levels as of its latest release (v6). LiveCodeBench assesses multiple aspects of coding capabilities including code generation, self-repair, code execution, and test output prediction, using execution-based accuracy metrics with hidden test cases for functional correctness.
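The execution-based evaluation described above can be illustrated with a minimal sketch: run a candidate solution against hidden input/output pairs in a subprocess and count it as correct only if every case passes. The function name and test-case format here are illustrative assumptions, not LiveCodeBench's actual API.

```python
import subprocess
import sys

def passes_hidden_tests(source_code: str, test_cases: list[tuple[str, str]]) -> bool:
    """Run source_code as a stdin/stdout program against hidden test cases.

    Each test case is an (input, expected_output) pair; the candidate is
    judged correct only if it produces the expected output on every case.
    (Illustrative helper, not LiveCodeBench's API.)
    """
    for stdin_data, expected in test_cases:
        result = subprocess.run(
            [sys.executable, "-c", source_code],
            input=stdin_data,
            capture_output=True,
            text=True,
            timeout=10,  # guard against infinite loops
        )
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return False
    return True

# A toy "contest problem": read two integers, print their sum.
candidate = "a, b = map(int, input().split())\nprint(a + b)"
hidden = [("1 2", "3"), ("10 -4", "6")]
print(passes_hidden_tests(candidate, hidden))
```

A real harness additionally sandboxes execution and enforces memory limits, but the pass/fail criterion is the same: functional correctness on hidden tests, not string similarity to a reference solution.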
LiveCodeBench provides contamination-free, time-annotated evaluation of LLMs on competitive programming problems across multiple coding scenarios.
Benchmarking LLM Coding Performance
Researchers and developers can evaluate the coding abilities of large language models on recent competitive programming problems that the models have not seen during training.
Testing Code Generation and Repair
Use LiveCodeBench to assess not only code generation but also the model's ability to self-repair code and predict test outputs.
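For the test-output-prediction scenario, scoring reduces to exact matching of the model's predicted program outputs against the outputs obtained by actually executing the code. A minimal scoring sketch, with a function name assumed for illustration rather than taken from LiveCodeBench:

```python
def output_prediction_accuracy(predictions: list[str], ground_truth: list[str]) -> float:
    """Fraction of predicted outputs that exactly match executed outputs.

    Whitespace is normalized before comparison, since trailing newlines
    are usually not semantically meaningful for contest-style output.
    (Illustrative metric helper, not LiveCodeBench's API.)
    """
    correct = sum(p.strip() == g.strip() for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)

# Two predictions, one of which matches the actual execution output.
score = output_prediction_accuracy(["3\n", "8"], ["3", "7"])
print(score)
```

Self-repair is evaluated analogously: the model's revised program is re-run against the same hidden tests, so the metric stays execution-based rather than relying on textual comparison.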
To get started, clone the repository with git clone https://github.com/LiveCodeBench/LiveCodeBench.git, navigate into the directory with cd LiveCodeBench, and use the uv command to set up dependencies and verify the installation. Evaluation uses the latest problem release, release_v6, which contains 1,055 problems.