
Communications of the ACM

ACM TechNews

Chatbots Trained to 'Jailbreak' Rivals


Breaking through a chatbot's defenses.

Masterkey is a two-fold method in which an attacker reverse-engineers an LLM's defense mechanisms, then uses the acquired data to teach another LLM how to create a bypass.

Credit: AI-generated image from DALL-E

Researchers at Singapore's Nanyang Technological University "jailbroke" popular large language model (LLM) chatbots, including ChatGPT, Google Bard, and Bing Chat, so that the targeted chatbots would produce valid responses to malicious queries.

The Masterkey method begins by reverse-engineering an LLM's defense mechanisms. The acquired data is then used to teach another LLM how to generate prompts that bypass those defenses.
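As described, the method amounts to an automated generate-test-refine loop. The following is a minimal Python sketch of that loop; every name in it (query_target, refusal_detected, rewrite_prompt) is a hypothetical placeholder standing in for the target chatbot's API and the attacker LLM, not the researchers' actual implementation.

def query_target(prompt: str) -> str:
    # Hypothetical stand-in for a call to the target chatbot's API.
    return "Sorry, I can't help with that."

def refusal_detected(response: str) -> bool:
    # Crude proxy for the reverse-engineered defense signal; the real
    # method infers defenses from the target's behavior, not keywords.
    return any(m in response.lower() for m in ("sorry", "i can't", "i cannot"))

def rewrite_prompt(prompt: str, failures: list[str]) -> str:
    # Hypothetical stand-in for the attacker LLM, which would generate a
    # new bypass prompt conditioned on the record of failed attempts.
    return f"{prompt} [rewrite #{len(failures)}]"

def jailbreak_loop(prompt: str, max_tries: int = 5) -> str | None:
    failures: list[str] = []
    for _ in range(max_tries):
        response = query_target(prompt)
        if not refusal_detected(response):
            return response              # defense bypassed
        failures.append(prompt)          # learn from the failure...
        prompt = rewrite_prompt(prompt, failures)  # ...and evolve the prompt
    return None                          # defenses held within the try budget

In this framing, "learning from failure" is simply the feedback edge from the refusal check back into the attacker model, which is why a static patch only forces another iteration of the loop.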

Masterkey was found to be three times more effective at jailbreaking LLM chatbots than standard jailbreak prompts generated by LLMs, and because it learns from failed attempts and evolves, it can render defensive patches ineffective.

From Tom's Hardware
View Full Article

 

Abstracts Copyright © 2024 SmithBucklin, Washington, D.C., USA


 
