2025 in LLMs so far, illustrated by Pelicans on Bicycles — Simon Willison
Summary
The talk reviews the rapid pace of LLM development over the past six months, a period that saw more than 30 significant model releases. Because traditional benchmarks are losing credibility, the presenter relies instead on an unconventional, practical test: asking each model to generate an SVG of a pelican riding a bicycle. The takeaway is that while benchmarks produce numbers, complex, real-world tasks like this one reveal a model's capabilities and limitations more effectively.