Competitor Comparison Matrix

Feature                             | GPQA Diamond | MMLU | BIG-bench | ARC
------------------------------------|--------------|------|-----------|-----
Multi-step Reasoning Tasks          | ✓            | ✗    | ✓         | ✓
Open & Extensible Framework         | ✓            | ✗    | ✓         | ✗
Failure Mode Analysis               | ✓            | ✗    | ✗         | ✗
Cross-Domain Evaluation             | ✓            | ✓    | ✓         | ✓
Supports Multiple LLM Architectures | ✓            | ✓    | ✓         | ✗
Free Tier Available                 | ✓            | ✓    | ✓         | ✓