The Future of Evals - Ankur Goyal, Braintrust
Summary
The speaker discusses how evals (evaluations) for AI applications have evolved, drawing on Braintrust's work with leading AI companies, and introduces Loop, a new agent in Braintrust that automatically optimizes prompts, datasets, and scorers. According to the speaker, the key breakthrough is Claude 4, which performs six times better than previous models and makes more sophisticated automated evaluation and improvement practical. The practical takeaway is that Loop is now available to Braintrust users for automatically improving AI performance across different models, offering a more dynamic approach to AI development and testing than tuning prompts and scorers by hand.