Engineering Better Evals: Scalable LLM Evaluation Pipelines That Work — Dat Ngo, Aman Khan, Arize
Summary
This talk covers LLM evaluation pipelines, with an emphasis on the practical challenges AI engineers face and how to address them. Key concepts include observability and evals, and examples from companies such as Duolingo show the scale of evaluation that production AI requires. The practical takeaway: robust eval strategies are necessary to understand and optimize AI model performance.