Running LLMs locally: Practical LLM Performance on DGX Spark — Mozhgan Kabiri chimeh, NVIDIA
Summary
NVIDIA's developer relations team explores running large language models (LLMs) locally on the DGX Spark, a compact system designed for AI development that can handle models of up to 200 billion parameters. The presentation focuses on practical infrastructure challenges in AI: memory limitations, software compatibility, and the need for reproducible, efficient development workflows. By demonstrating an automated benchmarking approach that uses quantized models and a standardized testing protocol, the talk shows how powerful AI development can move closer to individual developers without fully replacing cloud infrastructure. The key takeaway is that local, optimized systems can significantly improve developer productivity and provide a smooth transition between desktop, data center, and cloud environments.
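
To make the link between memory limits and quantization concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the talk: the function names and the memory budget in the usage note are illustrative assumptions. It counts only the bytes needed to store the weights and ignores the KV cache, activations, and runtime overhead, so real requirements are higher.

```python
def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory needed to hold the model weights, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width. KV cache,
    activations, and framework overhead are deliberately left out.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_budget(num_params: int, bits_per_weight: int, budget_gb: float) -> bool:
    """Return True if the weights alone fit in the given memory budget.

    budget_gb is a hypothetical value supplied by the caller,
    not a hardware specification.
    """
    return weight_memory_gb(num_params, bits_per_weight) <= budget_gb


if __name__ == "__main__":
    params = 200_000_000_000  # the 200-billion-parameter figure from the summary
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gb(params, bits):.0f} GB")
```

Under these assumptions, a 200-billion-parameter model needs about 400 GB for its weights at 16 bits per weight and about 100 GB at 4 bits. Calling `fits_in_budget(params, 4, budget_gb=...)` with the memory budget of the machine at hand gives a quick first check before downloading a model.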