Authentication Failures (A07) show the largest gap in the dataset: a 48-percentage-point difference in fix rate between leaders and the field. Leaders' fix rate is nearly 60%, while the field's sits at roughly 12%.
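The arithmetic behind that gap can be sketched in a few lines. This is only an illustration: the rates are the rounded figures quoted above ("nearly 60%" and "roughly 12%"), not exact values from the dataset, and the function and variable names are hypothetical.

```python
# Rounded fix rates as quoted in the text; the exact dataset values are not given.
leader_fix_rate = 0.60  # "nearly 60%"
field_fix_rate = 0.12   # "roughly 12%"


def percentage_point_gap(a: float, b: float) -> float:
    """Absolute difference between two rates, expressed in percentage points."""
    return round((a - b) * 100, 1)


def relative_ratio(a: float, b: float) -> float:
    """How many times larger rate a is than rate b."""
    return round(a / b, 1)


gap = percentage_point_gap(leader_fix_rate, field_fix_rate)
ratio = relative_ratio(leader_fix_rate, field_fix_rate)

print(f"Gap: {gap} percentage points")
print(f"Leaders fix at about {ratio}x the field's rate")
```

With these rounded inputs, the percentage-point gap matches the 48 points cited, and the same numbers imply that leaders fix at roughly five times the field's rate. A percentage-point gap and a relative ratio describe the same data differently, so it helps to state which one is meant.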
OpenAI's GPT-5.4 Pro has solved an open math problem unsolved since 2019, with Epoch AI independently verifying the first AI ...