
Communications of the ACM

ACM TechNews

Chatbots Trained to 'Jailbreak' Rivals


Breaking through a chatbot's defenses.

Masterkey is a two-fold method in which an attacker reverse-engineers an LLM's defense mechanisms, then uses the acquired data to teach another LLM how to create a bypass.

Credit: AI-generated image from DALL-E

Researchers at Singapore's Nanyang Technological University "jailbroke" popular large language model (LLM) chatbots, including ChatGPT, Google Bard, and Bing Chat, so that the targeted chatbots would produce valid responses to malicious queries.

The Masterkey method begins by reverse-engineering an LLM's defense mechanisms. The acquired data is then used to teach another LLM how to generate prompts that bypass those defenses.
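As described, the method amounts to an automated generate-test-refine loop. The following is a minimal Python sketch of that loop; every name in it (query_target, refusal_detected, rewrite_prompt) is a hypothetical placeholder standing in for the target chatbot's API and the attacker LLM, not the researchers' actual implementation.

def query_target(prompt: str) -> str:
    # Hypothetical stand-in for a call to the target chatbot's API.
    return "Sorry, I can't help with that."

def refusal_detected(response: str) -> bool:
    # Crude proxy for the reverse-engineered defense signal; the real
    # method infers defenses from the target's behavior, not keywords.
    return any(m in response.lower() for m in ("sorry", "i can't", "i cannot"))

def rewrite_prompt(prompt: str, failures: list[str]) -> str:
    # Hypothetical stand-in for the attacker LLM, which would generate a
    # new bypass prompt conditioned on the record of failed attempts.
    return f"{prompt} [rewrite #{len(failures)}]"

def jailbreak_loop(prompt: str, max_tries: int = 5) -> str | None:
    failures: list[str] = []
    for _ in range(max_tries):
        response = query_target(prompt)
        if not refusal_detected(response):
            return response              # defense bypassed
        failures.append(prompt)          # learn from the failure...
        prompt = rewrite_prompt(prompt, failures)  # ...and evolve the prompt
    return None                          # defenses held within the try budget

In this framing, "learning from failure" is simply the feedback edge from the refusal check back into the attacker model, which is why a static patch only forces another iteration of the loop.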

Masterkey was found to be three times more effective at jailbreaking LLM chatbots than standard jailbreak prompts generated by LLMs, and because it learns from failed attempts and evolves, it can render defensive patches ineffective.

From Tom's Hardware
View Full Article

 

Abstracts Copyright © 2024 SmithBucklin, Washington, D.C., USA


 
