AI Engineer June 27, 2025

Engineering Better Evals: Scalable LLM Evaluation Pipelines That Work — Dat Ngo, Aman Khan, Arize

Summary

This talk covers LLM evaluation pipelines, focusing on the practical challenges AI engineers face and ways to address them. Key concepts include observability and evals, with examples from companies such as Duolingo illustrating the scale of evaluation that production AI requires. The practical takeaway: robust eval strategies are necessary to understand and optimize AI model performance.
