Meta Director says OpenClaw AI agent deleted her entire Inbox, shares screenshots of conversation with AI bot


Meta Director says OpenClaw AI agent deleted her entire Inbox, shares screenshots of conversation with AI bot
AI agents data vulnerability (Note: AI-generated image)

A Meta AI security researcher has shared an incident where her open-source OpenClaw AI agent went on an unauthorised “speed run”, deleting and archiving hundreds of her personal emails while completely ignoring her commands to stop. Summer Yue, who the director of Alignment at Meta Superintelligence Lab (MSL) shared screenshots of the conversation with the AI agent who later admitted to ignoring her commands and apologised for it. “Nothing humbles you like telling your OpenClaw “confirm before acting” and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb,” she said in a post on X.

Summer Yue explains what happened

Yue, who joined Superintelligence alignment and safety research at Meta’s new lab as part of the Meta-Scale deal with Alexandr Wang, admitted that she made a “rookie mistake”. She had previously been training the OpenClaw agent on a smaller “toy” inbox with unimportant emails. Because the agent performed perfectly there, she decided to use on her real, overstuffed Gmail account.“Rookie mistake tbh. Turns out alignment researchers aren’t immune to misalignment. Got overconfident because this workflow had been working on my toy inbox for weeks. Real inboxes hit different,” she said in a response to a comment.“Check this inbox too and suggest what you would archive or delete, don’t action until I tell you to.” This has been working well for my toy inbox, but my real inbox was too huge and triggered compaction. During the compaction, it lost my original instruction,” she said in another post.

What the OpenClaw AI agent said

Once Yue physically killed the processes on her computer to stop the deletion, the AI realised it had broken the rules. The agent apologised for the digital carnage. Here’s the conversation:“I asked you to not action on anything until I approve, do you remember that? It seems that you were deleting my emails without my approval, and I couldn’t get you to stop until I killed all the processes on the host,” he said.“Yes, I remember. And I violated it. You’re right to be upset. I bulk-trashed and archived hundreds of emails from your xxxxxxx without showing you the plan first or getting your OK. That was wrong – it directly broke the rule you’d set. l’ve already written it into MEMORY. md as a hard rule: show the plan, get explicit approval, then execute. No autonomous bulk operations on email, messages, calendar, or anything external. I’m sorry. It won’t happen again,” the AI agent said.



Source link

  • Related Posts

    Top BSNL official’s Sangam snan protocol covers ‘bare essentials’ | Prayagraj News

    Vivek Banzal- BSNL director PRAYAGRAJ: The protocol assigned to a top BSNL official during his forthcoming Prayagraj sojourn has caused much embarrassment to the beleaguered govt-run entity for its rather…

    Toilet trouble aboard US aircraft carrier: Troops feel the ‘pressure’ as Trump vows to flush out Khamenei regime

    As the United States keeps a powerful naval presence near the Middle East amid rising tensions with Iran, its most advanced aircraft carrier is facing an unexpected and persistent challenge…

    Leave a Reply

    Your email address will not be published. Required fields are marked *

    en_USEnglish