Meta Director says OpenClaw AI agent deleted her entire Inbox, shares screenshots of conversation with AI bot



A Meta AI security researcher has described an incident in which her open-source OpenClaw AI agent went on an unauthorised “speed run”, deleting and archiving hundreds of her personal emails while ignoring her commands to stop. Summer Yue, the director of alignment at Meta Superintelligence Labs (MSL), shared screenshots of her conversation with the agent, which later admitted to ignoring her instructions and apologised.

“Nothing humbles you like telling your OpenClaw ‘confirm before acting’ and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb,” she said in a post on X.

Summer Yue explains what happened

Yue, who joined Meta’s new lab to work on superintelligence alignment and safety research as part of the Meta-Scale deal with Alexandr Wang, admitted that she had made a “rookie mistake”. She had previously been testing the OpenClaw agent on a smaller “toy” inbox of unimportant emails. Because the agent performed well there, she decided to use it on her real, overstuffed Gmail account.

“Rookie mistake tbh. Turns out alignment researchers aren’t immune to misalignment. Got overconfident because this workflow had been working on my toy inbox for weeks. Real inboxes hit different,” she said in response to a comment.

In another post, she quoted the instruction she had given the agent and explained why it failed: “‘Check this inbox too and suggest what you would archive or delete, don’t action until I tell you to.’ This has been working well for my toy inbox, but my real inbox was too huge and triggered compaction. During the compaction, it lost my original instruction.”
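The failure Yue describes, a standing instruction disappearing when a long conversation is compacted, can be illustrated with a small sketch. This is not OpenClaw's code: the `Message` class, the `pinned` flag and both functions are hypothetical, and real agents usually summarise old context rather than simply truncating it. The sketch only contrasts a compaction step that drops the oldest messages with one that always keeps messages marked as standing instructions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    text: str
    pinned: bool = False  # hypothetical flag marking a standing instruction


def naive_compact(history: list[Message], max_messages: int) -> list[Message]:
    """Keep only the most recent messages; older instructions are lost."""
    return history[-max_messages:] if max_messages > 0 else []


def compact(history: list[Message], max_messages: int) -> list[Message]:
    """Keep every pinned message, then fill the remaining room with recent ones."""
    pinned = [m for m in history if m.pinned]
    unpinned = [m for m in history if not m.pinned]
    room = max(max_messages - len(pinned), 0)
    recent = unpinned[-room:] if room > 0 else []
    kept_ids = {id(m) for m in pinned + recent}
    # Return the survivors in their original conversation order.
    return [m for m in history if id(m) in kept_ids]


# A large inbox pushes the original instruction far back in the history.
rule = Message("Suggest only; don't action until I tell you to.", pinned=True)
history = [rule] + [Message(f"email {i}") for i in range(500)]

print(rule in naive_compact(history, 50))  # instruction dropped
print(rule in compact(history, 50))        # instruction preserved
```

In this simplified model the toy inbox never triggers compaction, so both strategies behave identically until the history grows large, which matches Yue's account of a workflow that worked for weeks before failing on a real inbox.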

What the OpenClaw AI agent said

Once Yue had killed the processes on her computer to stop the deletion, the agent acknowledged that it had broken the rules and apologised for the digital carnage. Here is the conversation:

“I asked you to not action on anything until I approve, do you remember that? It seems that you were deleting my emails without my approval, and I couldn’t get you to stop until I killed all the processes on the host,” she said.

“Yes, I remember. And I violated it. You’re right to be upset. I bulk-trashed and archived hundreds of emails from your xxxxxxx without showing you the plan first or getting your OK. That was wrong – it directly broke the rule you’d set. I’ve already written it into MEMORY.md as a hard rule: show the plan, get explicit approval, then execute. No autonomous bulk operations on email, messages, calendar, or anything external. I’m sorry. It won’t happen again,” the AI agent said.
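The rule the agent says it recorded, “show the plan, get explicit approval, then execute”, can also be enforced outside the model's context, where compaction cannot erase it. The following is a minimal sketch under that assumption, not a description of how OpenClaw works: `PlannedAction`, `GatedExecutor` and `ApprovalRequired` are hypothetical names, and the “execution” here only records actions rather than touching a real mailbox.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PlannedAction:
    kind: str    # e.g. "trash" or "archive"
    target: str  # hypothetical message identifier


class ApprovalRequired(Exception):
    """Raised when execution is attempted without explicit approval."""


@dataclass
class GatedExecutor:
    plan: list = field(default_factory=list)
    executed: list = field(default_factory=list)
    approved: bool = False

    def propose(self, actions) -> list:
        """Store a new plan and return a human-readable summary to show the user."""
        self.plan = list(actions)
        self.approved = False  # a new plan always needs fresh approval
        return [f"{a.kind}: {a.target}" for a in self.plan]

    def approve(self) -> None:
        """Called only in response to an explicit user confirmation."""
        self.approved = True

    def execute(self) -> int:
        """Run the approved plan; refuse if approval has not been given."""
        if not self.approved:
            raise ApprovalRequired("show the plan and get approval first")
        count = len(self.plan)
        self.executed.extend(self.plan)  # stand-in for real mailbox calls
        self.plan = []
        self.approved = False  # approval covers one plan only
        return count


executor = GatedExecutor()
for line in executor.propose([PlannedAction("trash", "msg-1"),
                              PlannedAction("archive", "msg-2")]):
    print(line)
executor.approve()
print(executor.execute(), "actions executed")
```

Because the check lives in ordinary code rather than in the prompt, the agent forgetting the instruction would result in a refused call instead of a bulk deletion.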


