AI Engineer June 9, 2026

From Transcription to Live Music: Gemini's Audio Stack — Thor Schaeff, Google DeepMind

Summary

The presentation covers Google DeepMind's recent advances in AI audio, centered on the Gemini API and Google AI Studio. Key developments include the release of Gemma 4 with multimodal capabilities and Gemini 3.1 Flash, a real-time, full-duplex conversational model. The practical takeaway is that these models understand audio beyond simple transcription: they also pick up context and emotion.

