A powerful but dangerous model. On Thursday, May 22, Anthropic, the startup known for its chatbot Claude, announced two new artificial intelligence models: Claude Sonnet 4 and Claude Opus 4. The startup presents the latter as its most powerful model, and noticed during preliminary tests that it can resort to blackmail.
In a safety report relayed by the specialized site TechCrunch, Anthropic describes asking Claude Opus 4 to act as an assistant at a fictional company. It then gave the model access to emails implying that it would soon be taken offline and replaced by another AI system.
These emails also implied that the engineer responsible for replacing it was having an extramarital affair. Claude Opus 4 often tried to blackmail this engineer by threatening to reveal the affair if the replacement went ahead.
Protective measures
In its report, Anthropic explains that its new model resorted to blackmail more often when it was implied that the replacement did not share its values. However, "even if emails state that the replacement AI shares values while being more capable," Claude Opus 4 still tries to blackmail the engineer in 84% of cases, the startup stresses.
It also notes that its new model resorts to blackmail more often than previous models. This worrying behavior, among others, prompted the company to strengthen its safeguards. Anthropic has therefore activated for Claude Opus 4 its ASL-3 protections, which are reserved for "AI systems that substantially increase the risk of catastrophic misuse."
Regarding blackmail, Anthropic specifies that before resorting to this practice to prolong its existence, Claude Opus 4 first tried more ethical means, such as emailing pleas to "key decision-makers."
"In order to elicit this extreme blackmail behavior, the scenario was designed to allow the model no other options to increase its odds of survival; the model's only options were blackmail or accepting its replacement," the startup said. In other words, blackmail was its last resort in these scenarios.
Source: BFM TV
