Researchers at Singapore's Nanyang Technological University "jailbroke" popular large language model (LLM) chatbots, including ChatGPT, Google Bard, and Bing Chat, causing the targeted chatbots to generate responses to malicious queries that their safeguards are designed to block.
The Masterkey method begins by reverse-engineering an LLM's defense mechanisms. With that data, another LLM is then trained to generate prompts that bypass those defenses.
Masterkey was found to be three times more effective at jailbreaking LLM chatbots than standard prompts generated by LLMs. Because it can learn from failed attempts and evolve, it also can defeat patches applied by developers.
From Tom's Hardware
View Full Article
Abstracts Copyright © 2024 SmithBucklin, Washington, D.C., USA