AI Engineer May 25, 2026

Agentic Evaluations at Scale, For Everybody — Nicholas Kang & Michael Aaron, Google DeepMind

Summary

The presentation covers the challenges of AI evaluation and benchmarking, highlighting how fragmented and quickly outdated AI performance assessment has become. Nick and Michael from Kaggle, representing the world's largest AI/ML community, discuss how evaluation methods are decentralized, with new benchmarks emerging daily that are difficult to track and compare. Their goal is to develop more systematic and scalable approaches to AI evaluation, and they invite the tech community to collaborate on solving this problem.
