Building Effective Voice Agents — Toki Sherbakov + Anoop Kotha, OpenAI
Summary
The talk covers how to build practical audio agents as AI moves from text-only systems into an emerging multimodal era that includes images and audio. It highlights advances in speech-to-speech capabilities that make audio agents faster, more expressive, and more accurate, and so suitable for production-level applications. The practical takeaway is that audio models have reached a tipping point: high-quality, scalable applications can now be built.