Optimizing inference for voice models in production - Philip Kiely, Baseten
Summary
This talk covers optimizing inference for voice models in production by drawing parallels between Text-to-Speech (TTS) models and Large Language Models (LLMs). Some TTS models share an LLM architecture. For example, the Orpheus TTS model is built on a Llama backbone. Because of this, practitioners can apply tools and techniques from the LLM ecosystem to improve TTS performance. The practical takeaway is to adopt LLM inference tooling to make TTS models run more efficiently.