AI Engineer July 1, 2025

Optimizing inference for voice models in production - Philip Kiely, Baseten

Summary

This talk covers optimizing inference for voice models in production, drawing parallels between text-to-speech (TTS) models and large language models (LLMs). Using the Orpheus TTS model, which is built on a Llama backbone, as its example, it shows how practitioners can improve TTS performance by drawing on the LLM ecosystem. The practical takeaway is to adopt LLM tooling to make TTS models more efficient.
