AI Engineer May 14, 2026

Ship Real Agents: Hands-On Evals for Agentic Applications — Laurie Voss, Arize

Summary

Laurie Voss, head of developer experience at Arize AI and a co-founder of npm, Inc., delivers a comprehensive workshop on testing and evaluating AI systems and agents. The workshop covers fundamental techniques for evaluating AI performance, including tracing, data analysis, and several types of evaluations: code-based, deterministic, and LLM-based. The key practical takeaway is that evaluations should not merely identify problems. They should drive a continuous improvement process for AI agents, built on iterative development and on understanding performance through systematic testing and meta-evaluation (checking how well the evaluations themselves work).
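To make "code-based, deterministic" evaluation concrete, here is a minimal sketch. It is not Arize's API or the workshop's code. It assumes a hypothetical agent that is supposed to return a JSON object with `answer` and `sources` keys, and it checks each output with plain code (no LLM judge), then aggregates a pass rate you could track across iterations. The function names, keys, and sample outputs are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class EvalResult:
    example_id: str
    passed: bool
    reason: str


def json_keys_eval(example_id: str, output: str, required_keys: set[str]) -> EvalResult:
    """Deterministic code-based check (hypothetical): the agent's output must be
    a JSON object that contains every key in required_keys."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError as exc:
        return EvalResult(example_id, False, f"invalid JSON: {exc.msg}")
    if not isinstance(parsed, dict):
        return EvalResult(example_id, False, "top-level value is not an object")
    missing = required_keys - set(parsed)
    if missing:
        return EvalResult(example_id, False, f"missing keys: {sorted(missing)}")
    return EvalResult(example_id, True, "ok")


def pass_rate(results: list[EvalResult]) -> float:
    """Fraction of examples that passed; 0.0 for an empty run."""
    return sum(r.passed for r in results) / len(results) if results else 0.0


if __name__ == "__main__":
    # Hypothetical agent outputs, keyed by example id.
    outputs = {
        "q1": '{"answer": "Paris", "sources": []}',
        "q2": '{"answer": "42"}',
        "q3": "Sorry, I can't help with that.",
    }
    results = [json_keys_eval(i, o, {"answer", "sources"}) for i, o in outputs.items()]
    for r in results:
        print(r.example_id, r.passed, r.reason)
    print(f"pass rate: {pass_rate(results):.2f}")
```

Because the check is pure code, it returns the same verdict every time for the same output. That makes it cheap to rerun after each change to the agent. Fuzzier qualities, such as whether the answer is actually correct or helpful, are where the LLM-based evaluations mentioned above would come in.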
