AI Engineer July 22, 2025

How to run Evals at Scale: Thinking beyond Accuracy or Similarity — Muktesh Mishra, Adobe

Summary

The presentation focuses on evaluations (evals) in AI application development, stressing the importance of measuring and testing AI systems that produce nondeterministic outputs. The speaker, a lead engineer at Adobe, emphasizes the need for robust testing methods, metrics, and tools to assess an AI application's performance, alignment with goals, and trustworthiness. Key challenges discussed include testing prompt variations, measuring accuracy, and ensuring continuous improvement. The practical takeaway is that comprehensive evaluation strategies are essential for building reliable, accountable, and increasingly capable AI applications.
