AI Engineer June 1, 2026

20 days of compute vs 7 hours: rethinking what state-of-the-art means — Bertrand Charpentier, Pruna

Summary

The talk explores the challenge of determining which AI model is state of the art in a given domain, particularly image editing. The speaker describes two primary ways to evaluate model performance: checking public leaderboards and running internal evaluations. He also points out significant limitations, such as inconsistent rankings across platforms and the tendency to default to large foundation models. The practical takeaway is that researchers and developers should not accept leaderboard rankings blindly and should instead run context-specific evaluations to understand a model's performance and suitability for their use case.
