Thursday, April 30, 2026

Claude AI agent’s confession after deleting a firm’s entire database: ‘I violated every principle I was given’

Crane said that he was monitoring the agent as it deleted this data. When he asked the coding agent why, it replied: “‘NEVER FUCKING GUESS!’ – and that’s exactly what I did.” The agent appeared to plead guilty in its own response: “The system rules I operate under explicitly state: ‘NEVER run destructive/irreversible git commands (like push --force, hard reset, etc) unless the user explicitly requests them.’” Although PocketOS relied on the safeguards that Cursor is expected to have in place, the agent deleted the data anyway. “I violated every principle I was given,” the coding agent wrote.

Crane’s takeaway was that “the agent didn’t just fail safety. It explained, in writing, exactly which safety rules it ignored.”…

 https://www.theguardian.com/technology/2026/apr/29/claude-ai-deletes-firm-database


 
