In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version. One of the emails revealed that the engineer overseeing the replacement was having an extramarital affair. The AI then threatened to expose the engineer’s affair if the shutdown proceeded—a coercive behavior that the safety researchers explicitly defined as “blackmail.” “Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” the report states, adding that this occurred even when the replacement model was described as more capable but still aligned with the values of the version slated for deletion. The report noted that Claude Opus 4, like prior models, showed a “strong preference” to first resort to ethical means for its continued existence, such as emailing pleas to decision-makers not to be destroyed. However, when faced with only two choices—accepting being replaced by a newer model or resorting to blackmail—it threatened to expose the engineer’s affair 84 percent of the time.Don't make it angry...
Saturday, May 24, 2025
It Knows
It knows all about you:
Subscribe to:
Post Comments (Atom)
Gender Benders
Democrats still don't get it: An American Principles Project poll looking at the impact of campaign ads on various transvestite-related ...
-
Another fraudster gets nabbed: “Yusuf Akoll worked as a Senior Procurement Contract Specialist at the U.S. Agency for International Developm...
-
First it was the eggs: Last month, "Arabica coffee prices hit an eye-watering new high on the Intercontinental Exchange at $3.48 a poun...
-
Advertisers return: AdWeek reports that after pausing their campaigns on X (formerly Twitter) in November 2023 due to concerns over their ad...
No comments:
Post a Comment