AI Engineer April 20, 2026

Running LLMs on your iPhone: 40 tok/s Gemma 4 with MLX — Adrien Grondin, Locally AI

Summary

The transcript discusses running Gemma 4, a Google language model, on iPhones using MLX, an Apple-developed framework optimized for Apple Silicon. The speaker, Adrien Grondin, introduces Locally AI, an app that runs AI models on-device, and highlights how MLX makes it easy to download and run language models such as Gemma 4, Qwen, and other small models from Hugging Face. The key takeaway is that developers can now build an iOS app with an AI model running directly on the device in under 10 minutes, thanks to MLX's straightforward API and its optimization for Apple hardware.
