Aaron Zisk June 29, 2025

Not even close‼️LLMs on RTX5090 vs others

Summary

The transcript compares desktop and mobile Nvidia RTX 5090 GPUs running large language models, focusing on token generation speed and VRAM constraints. Using the Qwen 2.5 32-billion-parameter model quantized to 4 bits, the presenter compares a desktop GPU with 32 GB of VRAM against a laptop GPU with 24 GB of VRAM. The key takeaway is that mobile GPUs are significantly limited in power, thermal management, and computational performance compared to their desktop counterparts, despite sharing the same model name.
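
To make the VRAM constraint concrete, here is a rough back-of-the-envelope sketch of how much memory the quantized weights alone would occupy on each card. It assumes a nominal 4 bits per weight and decimal gigabytes; real 4-bit formats store extra scaling data, and the KV cache and activations need additional memory, so actual usage will be higher. The function names are hypothetical and not taken from the video.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at exactly `bits_per_weight` bits,
    with no per-block scales or other quantization overhead.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


def headroom_gb(vram_gb: float, params_billion: float, bits_per_weight: float) -> float:
    """VRAM left over for KV cache, activations, and runtime overhead."""
    return vram_gb - weight_footprint_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # Figures from the summary: 32B parameters, 4-bit quantization,
    # 32 GB (desktop) versus 24 GB (laptop) of VRAM.
    for name, vram in [("desktop RTX 5090", 32), ("laptop RTX 5090", 24)]:
        weights = weight_footprint_gb(32, 4)
        spare = headroom_gb(vram, 32, 4)
        print(f"{name}: weights ~{weights:.0f} GB, headroom ~{spare:.0f} GB")
```

Under these assumptions the weights take roughly 16 GB on either card, so the laptop GPU has about half the spare memory of the desktop GPU for context and runtime overhead.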
