Building Generative Image & Video models at Scale - Sander Dieleman, Google DeepMind
Summary
Sander Dieleman, a research scientist at Google DeepMind, explains how diffusion models for generative media are built, focusing on the technical details of training audio-visual models at scale. The talk covers data curation, model representation, neural network architecture, training methodology, and advanced sampling techniques. Using models such as Veo and Nano Banana as examples, he gives a behind-the-scenes view of the process behind modern image and video generation.