The Small Model Infrastructure Nobody Built (So We Did) — Filip Makraduli, Superlinked
Summary
The talk covers the challenges and nuances of small model inference, an aspect of running AI models in production that is often overlooked. The speaker describes moving from writing about machine learning to joining Superlinked and developing an open-source inference engine designed specifically for AI search and document processing. Partners including Chroma, Qdrant, Weaviate, and LanceDB have been involved in testing the engine. The practical takeaway is that model performance needs to be understood beyond training, with particular attention to how models actually run in real-world production settings.