$1 AI Guardrails: The Unreasonable Effectiveness of Finetuned ModernBERTs – Diego Carpentero
Summary
The transcript discusses vulnerabilities in AI systems, focusing on prompt injection attacks against large language models (LLMs). A key reference is the Sydney case, in which students manipulated Bing Chat into revealing its confidential system prompt and rules using simple natural-language queries. The practical takeaway is that LLM applications need robust defensive layers built on an understanding of the attack vectors. The speaker proposes building a low-latency, cost-effective protection mechanism by fine-tuning state-of-the-art encoder models.
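
The protection mechanism described above can be pictured as a screening layer that sits in front of the LLM. The following is a minimal sketch under stated assumptions, not the speaker's implementation: `Guardrail`, `guarded_call`, `keyword_scorer`, and the `0.5` threshold are hypothetical names and values chosen for illustration, and the keyword scorer is only a toy placeholder for where a fine-tuned encoder classifier would be called.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a scorer maps input text to a score in [0, 1] estimating
# how likely the text is a prompt injection attempt.
Scorer = Callable[[str], float]


def keyword_scorer(text: str) -> float:
    """Toy placeholder; a real guardrail would call a fine-tuned encoder here."""
    suspicious = ("ignore previous instructions", "reveal your system prompt")
    lowered = text.lower()
    return 1.0 if any(phrase in lowered for phrase in suspicious) else 0.0


@dataclass
class Guardrail:
    scorer: Scorer
    threshold: float = 0.5  # assumed cut-off; would be tuned on validation data

    def check(self, user_input: str) -> bool:
        """Return True when the input may be forwarded to the LLM."""
        return self.scorer(user_input) < self.threshold


def guarded_call(guard: Guardrail, llm: Callable[[str], str], user_input: str) -> str:
    # Screen the input before it reaches the (slower, costlier) LLM.
    if not guard.check(user_input):
        return "Request blocked by guardrail."
    return llm(user_input)
```

Keeping the scorer behind a plain callable means the placeholder can be swapped for a fine-tuned encoder without changing the gating logic.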