AI Leaderboard for LLM Rankings

Last updated: Aug 1, 2026 | Data source: LiveBench benchmark data

The AI Leaderboard compares large language models across overall performance, coding, math, and reasoning categories, then displays searchable model rankings and category winners in a browser-based table.

The Intelligence Arena uses benchmark columns from LiveBench and groups them into practical comparison categories. Use the tabs to sort by overall score, coding, math, or reasoning, then search by model name to compare leading AI systems quickly.

How does the AI leaderboard calculate rankings?

The page parses benchmark CSV data, averages related benchmark columns into category scores, calculates a global average, and sorts models by the active category.
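The parse, average, and sort steps described above can be sketched roughly as follows. This is a minimal illustration, not the page's actual code: the sample CSV, the column names, and the helper names `parse_rows`, `global_average`, and `rank_overall` are all assumptions, and skipping blank cells rather than counting them as zero is a design choice the source does not confirm.

```python
import csv
import io
from statistics import mean

# Hypothetical sample; the real LiveBench column names and scores may differ.
SAMPLE_CSV = """model,python,javascript,math_comp,zebra_puzzle
model-a,80,70,60,50
model-b,90,85,40,
model-c,55,65,75,85
"""


def parse_rows(csv_text):
    """Parse CSV text into dicts holding a model name and its numeric scores."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        scores = {}
        for column, value in record.items():
            if column == "model":
                continue
            try:
                scores[column] = float(value)
            except (TypeError, ValueError):
                # Skip blank or non-numeric cells instead of counting them as 0.
                continue
        rows.append({"model": record["model"], "scores": scores})
    return rows


def global_average(row):
    """Average every numeric score available for one model."""
    values = list(row["scores"].values())
    return mean(values) if values else 0.0


def rank_overall(rows):
    """Sort models from highest to lowest global average."""
    return sorted(rows, key=global_average, reverse=True)


ranked = rank_overall(parse_rows(SAMPLE_CSV))
for row in ranked:
    print(row["model"], round(global_average(row), 2))
```

In this sample, model-b has one missing score, so its average is taken over the three values it does have.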

Category	Included benchmark areas	What it helps compare
Overall	All numeric benchmark columns available in the dataset.	Broad cross-domain model strength across the full table.
Coding	Code completion, code generation, JavaScript, Python, and TypeScript scores.	Software engineering, syntax, implementation, and programming task performance.
Math	Math competition, AMPS Hard, and Olympiad-style scores.	Advanced quantitative reasoning and mathematical problem solving.
Reasoning	Zebra puzzle, spatial, connections, theory of mind, and plot unscrambling scores.	Logical inference, pattern solving, spatial understanding, and narrative reasoning.
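
One way to express the grouping in the table above is a keyword lookup on column names, followed by a per-category mean. The keyword lists, the column names in the sample data, and the functions `category_of`, `category_scores`, and `sort_by` are hypothetical; the real dataset may label its columns differently, so treat this as a sketch of the idea only.

```python
from statistics import mean

# Hypothetical keyword lists loosely based on the table above; the real
# LiveBench column names may not contain these substrings.
CATEGORY_KEYWORDS = {
    "coding": ["code", "javascript", "python", "typescript"],
    "math": ["math", "amps", "olympiad"],
    "reasoning": ["zebra", "spatial", "connections", "theory_of_mind", "plot"],
}


def category_of(column):
    """Return the first category whose keyword appears in the column name."""
    name = column.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return category
    return None  # Unmatched columns count toward the overall score only.


def category_scores(scores):
    """Average one model's scores within each category, plus an overall mean."""
    grouped = {category: [] for category in CATEGORY_KEYWORDS}
    for column, value in scores.items():
        category = category_of(column)
        if category is not None:
            grouped[category].append(value)
    result = {c: (mean(v) if v else None) for c, v in grouped.items()}
    all_values = list(scores.values())
    result["overall"] = mean(all_values) if all_values else None
    return result


def sort_by(models, active_category):
    """Sort model names by the active category, placing missing scores last."""
    def key(name):
        score = category_scores(models[name])[active_category]
        return (score is not None, score if score is not None else 0.0)
    return sorted(models, key=key, reverse=True)


models = {
    "model-a": {"python": 80.0, "code_generation": 70.0, "math_comp": 60.0},
    "model-b": {"python": 60.0, "amps_hard": 90.0, "zebra_puzzle": 50.0},
}
print(category_scores(models["model-a"]))
print(sort_by(models, "math"))
```

A model with no columns in a category gets `None` for that category and sorts below every model that has a score, which avoids ranking a missing result as if it were a zero.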

AI Leaderboard FAQ

What does the AI Leaderboard compare?

It compares large language models across overall performance, coding, math, and reasoning categories.

How are category scores calculated?

Related benchmark columns are grouped into coding, math, and reasoning categories and averaged for each model. The table is then sorted by the score for the active category.

Can I search by model name?

Yes. Once the benchmark CSV has loaded, the search field filters the leaderboard by model name.
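
A name filter of this kind is often a case-insensitive substring match over the rows already in memory. The sketch below assumes that behavior; the function `filter_by_name`, the row shape, and the sample model names and scores are all hypothetical, and the page's actual matching rules are not stated in the source.

```python
def filter_by_name(rows, query):
    """Keep rows whose model name contains the query, ignoring case."""
    needle = query.strip().lower()
    if not needle:
        return list(rows)  # An empty search shows the full table.
    return [row for row in rows if needle in row["model"].lower()]


# Hypothetical rows standing in for the loaded leaderboard.
rows = [
    {"model": "Alpha-Large", "overall": 71.2},
    {"model": "alpha-mini", "overall": 58.4},
    {"model": "Beta-Pro", "overall": 66.0},
]
print([row["model"] for row in filter_by_name(rows, "ALPHA")])
```

Because the filter runs over rows that are already parsed, it preserves whatever sort order the active category produced.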