Aaron Zisk April 8, 2026

After This, 16GB Feels Different

Summary

The transcript covers model compression and quantization in machine learning, focusing on how shrinking file sizes and memory requirements makes it possible to run larger language models on smaller hardware. Key references include Qwen 3.5, several quantization levels (8-bit, 4-bit), and the trade-off between model size and output quality at each compression level. The practical takeaway is that compression can cut memory requirements substantially, but model size has to be balanced against acceptable performance, with 4-bit quantization representing a practical lower limit for most applications.
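The memory arithmetic behind that takeaway can be sketched in a few lines. This is a rough illustration, not a figure from the episode: the parameter count below is a hypothetical value chosen for the example, and the estimate covers the weights alone, ignoring the KV cache, activations, and the per-block scale metadata that real quantized formats carry.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / (1024 ** 3)


# Hypothetical parameter count, used only for illustration.
PARAMS = 27e9

# Compare an uncompressed 16-bit baseline with 8-bit and 4-bit quantization.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```

Under these assumptions, each halving of the bit width halves the weight footprint, which is why a model that is far too large for a 16GB machine at 16-bit precision can come within reach at 4-bit.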
