AI Engineer June 6, 2026

Evals Are Broken, Use Them Anyway — Ara Khan, Cline

Summary

This talk critiques common and often misleading practices for evaluating AI models. Although both objective benchmarks and subjective, taste-based assessments are flawed, the presenter encourages developers to run evals on their own agentic flows and to interpret the results strategically. The key takeaway: understand the limitations of current evaluation methods and use them with critical awareness.
