Running LLMs on your iPhone: 40 tok/s Gemma 4 with MLX — Adrien Grondin, Locally AI
Summary
The talk covers running Gemma 4, a language model from Google, on iPhones using MLX, an Apple-developed machine learning framework optimized for Apple Silicon. The speaker, Adrien Grondin, introduces Locally AI, an app that runs AI models entirely on-device. He shows how MLX makes it easy to download and run language models such as Gemma 4, Qwen, and other small models hosted on Hugging Face. The key takeaway is that MLX's simple API and its optimization for Apple hardware let developers build an iOS app that runs a language model directly on the device in under 10 minutes.
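The sketch below illustrates the "straightforward API" point. It is not the speaker's code. It assumes the MLX Swift LLM package (the `MLXLLM` and `MLXLMCommon` modules published alongside Apple's mlx-swift examples) and that package's simplified `loadModel(id:)` / `ChatSession` API. The Hugging Face model ID is a hypothetical example choice. Any MLX-converted checkpoint, such as a Gemma or Qwen build from the `mlx-community` organization, could be swapped in.

```swift
import MLXLLM
import MLXLMCommon

// Hypothetical model choice: replace with whichever MLX-converted
// Gemma or Qwen checkpoint you want to run on the device.
let modelID = "mlx-community/Qwen3-4B-4bit"

/// Loads the model (downloading and caching the weights from the
/// Hugging Face Hub on first use) and answers a single prompt on-device.
func askOnDevice(_ prompt: String) async throws -> String {
    // Fetch the weights and build the model and tokenizer.
    let model = try await loadModel(id: modelID)
    // ChatSession keeps conversation history across respond(to:) calls.
    let session = ChatSession(model)
    // Generate the full reply locally; no server round-trip is involved.
    return try await session.respond(to: prompt)
}
```

To stay self-contained, this function reloads the model on every call. A real app would load the model once, keep the `ChatSession` alive, and call it from a `Task` (for example, inside a SwiftUI view) so that later prompts skip the load step.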