AI Engineer April 24, 2026

What Do Models Still Suck At? - Peter Gostev, Arena.ai, BullshitBench

Summary

The talk examines the limitations of AI models, challenging the widespread perception that they are nearly omniscient by asking what they still struggle with. The speaker introduces a benchmark of nonsensical questions that reveals how different models respond to unclear or irrational inputs, exposing potential gaps in their reasoning. The analysis, which draws on tracking 700 models at Arena, suggests that although progress looks impressive on performance charts, significant and subtle challenges remain unaddressed in current AI development. The key takeaway is to approach AI advancement with cautious skepticism: impressive benchmark improvements do not necessarily amount to true understanding or general intelligence.
