From Transcription to Live Music: Gemini's Audio Stack — Thor Schaeff, Google DeepMind
Summary
The presentation covers Google DeepMind's recent advances in AI audio, with a focus on the Gemini API and Google AI Studio. Key developments include the release of Gemma 4, which has multimodal capabilities, and Gemini 3.1 Flash, a real-time, full-duplex conversational model. The practical takeaway is that these models go beyond simple transcription: they also grasp the context and emotion in the audio.