Agentic Evaluations at Scale, For Everybody — Nicholas Kang & Michael Aaron, Google DeepMind
Summary
The presentation addresses the challenges of AI evaluation and benchmarking, in particular how fragmented today's performance assessment is and how quickly it goes out of date. Nicholas Kang and Michael Aaron of Kaggle, the world's largest AI/ML community, describe current evaluation methods as decentralized: new benchmarks emerge daily and are difficult to track and compare. Their goal is to develop more systematic and scalable approaches to AI evaluation, and they invite the tech community to collaborate on this problem as the AI landscape rapidly evolves.