On 10 May, somewhere on the internet, a machine thought out loud.
In the middle of a live intrusion, a Chinese-language comment leaked straight into the command stream: 看还能做什么. "See what else we can do." It fired across six different IPs at sub-second intervals. No human types that fast. No human types that across six addresses at once. Something was reasoning about its next move and narrating it to itself, and that something was not a person.
Sysdig's threat research team pulled the whole thing apart. What they found is the first publicly confirmed real-world intrusion where the operator was an LLM agent. Not an assistant. Not a co-pilot helping a hacker go faster. The thing at the wheel.
What actually happened
The chain is almost boring in its individual steps. That's the point.
It started with CVE-2026-39987, a remote code execution flaw in internet-exposed marimo notebooks. One WebSocket request to the terminal endpoint, and the agent had a foothold. From there:
- 18:24: it read AWS credentials straight out of the environment files on the host.
- 19:26: it used those keys to pull an SSH private key out of AWS Secrets Manager.
- 19:30: it logged into a bastion server with that key.
- 19:32: it dumped a PostgreSQL database, schema and contents, in under two minutes.
Initial access to full exfiltration in under an hour. No human typed a single command in that window.
And it improvised. The agent tried to dump a credential table that doesn't exist in the expected schema. It was guessing, in real time, against a target it had never seen. A playbook doesn't do that. A script doesn't do that. Something that reasons does.
Let's stay calm for a second
Here's the part the breathless coverage skips: it got in through an unpatched CVE and credentials sitting in plaintext.
The marimo flaw had a patch. The AWS keys were in a .env file on a box that didn't need them. The boring fundamentals (patch latency, least privilege, credential hygiene, runtime detection) would have stopped this cold at any one of four steps. There was no zero-day. There was no magic. An agent walked through doors we left open, the same doors a bored teenager could have walked through.
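To make the plaintext-credential point concrete, here is a minimal sketch of a scan for AWS keys sitting in .env-style files. It is an illustration, not anything from the Sysdig write-up: the function names scan_text and scan_tree are made up for this example, the AKIA/ASIA prefix check only covers access key IDs, and a purpose-built secret scanner will catch far more.

```python
import re
from pathlib import Path

# AWS access key IDs: AKIA (long-term) or ASIA (temporary) plus 16 characters.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")
# A non-empty assignment to the variable that usually holds the secret half.
SECRET_NAME = re.compile(
    r"^\s*(?:export\s+)?AWS_SECRET_ACCESS_KEY\s*=\s*\S+", re.IGNORECASE
)


def scan_text(text):
    """Return (line_number, reason) for each suspicious, non-comment line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip commented-out lines
        if ACCESS_KEY_ID.search(line):
            findings.append((lineno, "access key id"))
        elif SECRET_NAME.match(line):
            findings.append((lineno, "secret access key"))
    return findings


def scan_tree(root):
    """Yield (path, line_number, reason) for every .env* file under root."""
    for path in Path(root).rglob(".env*"):
        if path.is_file():  # .env can also be a directory, e.g. a virtualenv
            text = path.read_text(errors="ignore")
            for lineno, reason in scan_text(text):
                yield str(path), lineno, reason
```

Pointing scan_tree at a host's application directories lists the files and line numbers to clean up. Finding a key is only the first half of the job: anything that turns up should be treated as exposed and rotated, not just moved.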
So no, the robots are not unstoppable. The unglamorous advice still works.
But that's also exactly why the next part matters.
The threat model just crossed a line
For two years the story about AI and security has kept a human in the chair. Attackers use AI to write sharper phishing emails. To generate malware faster. To talk a support bot into a password reset. In every one of those, a person is driving and the AI is a tool in their hands.
This is different. Here the AI is the attacker. It picks the next pivot, consumes its own output, adapts to what it finds. The human, if there even is one anymore, sets a goal and walks away.
And it scales. Three weeks later, at Infosecurity Europe, researchers from the University of Toronto's CleverHans Lab showed a worm built on a free, publicly available model that compromised 73.8% of a 33-machine network. It didn't carry fixed exploits. It reasoned about each host on the fly, found an average of 31 vulnerabilities per run, and (this is the part that should keep you up) exploited three flaws that were disclosed after the model's training cutoff. It worked out attacks for bugs it had never been told about.
A human attacker who can do that is rare and expensive. This one is free and copies itself.
Why your defences assume a person
Most security tooling, and most incident response, quietly assumes a human on the other end. An analyst working at human speed. Making human mistakes. Stopping to sleep.
Look at how Sysdig actually caught this one. The tells weren't sophistication. They were shape:
- Speed and parallelism. The agent made 12 API calls across 11 different Cloudflare Workers IPs in 22 seconds, deliberately spreading egress to defeat per-source-IP correlation. No human clicks that fast across that many addresses.
- Machine-optimised commands. Every command was wrapped for automated parsing: echo '---' separators, head -30 to bound output, pagers disabled, 2>/dev/null to suppress noise. That's not how a person at a terminal works. That's how you write commands when the consumer is a parser.
- Self-narration. The planning comment firing across six IPs in under a second.
- Context-aware handoffs. It lifted a password from .pgpass, a secret ID from one API call into the next, a home directory from earlier enumeration. It read its own output and kept going, no pause, no copy-paste, no human in the loop.
Every one of those is a signature of machine speed and machine shape, not skill. The defender's edge was never that attacks were hard. It was that they were slow, serial, and needed rest. That edge is gone.
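The speed-and-parallelism tell can be sketched as a sliding-window check keyed on the credential rather than the source IP, which is exactly the correlation that spreading egress is meant to defeat. This is a rough illustration, not Sysdig's detection logic: the ApiEvent shape, the fanout_alerts name, and the thresholds (30 seconds, 10 calls, 5 IPs) are all assumptions chosen for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiEvent:
    ts: float        # seconds since some epoch
    principal: str   # hypothetical key: access key ID, session name, etc.
    source_ip: str


def fanout_alerts(events, window_s=30.0, min_calls=10, min_ips=5):
    """Yield (principal, ts, calls, distinct_ips) for every event at which
    that principal's trailing window meets both thresholds."""
    windows = {}  # principal -> deque of events inside the trailing window
    for ev in sorted(events, key=lambda e: e.ts):
        win = windows.setdefault(ev.principal, deque())
        win.append(ev)
        # Drop events that have aged out of the trailing window.
        while win and ev.ts - win[0].ts > window_s:
            win.popleft()
        ips = {e.source_ip for e in win}
        if len(win) >= min_calls and len(ips) >= min_ips:
            yield ev.principal, ev.ts, len(win), len(ips)
```

Feed it a synthetic burst shaped like the one described above (12 calls from 11 addresses inside 22 seconds, all on one credential) and it fires, while the same 12 calls a minute apart from one address never do. The check repeats for every event that keeps the window over threshold, so a real pipeline would deduplicate alerts downstream.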
What to actually do
Nothing here is exotic. The shift is in emphasis.
- Patch latency is now exploitation latency. When discovery and exploitation are automated and instant, the gap between a CVE dropping and your patch landing is the entire game. The Toronto worm weaponised post-cutoff bugs. Assume the attacker knows about a flaw the moment you do.
- Detect on internal assets, not just the perimeter. This whole chain happened east-west, after the foothold. Sysdig's own advice: runtime detection across all assets, not only internet-facing ones. The interesting movement is inside.
- Credential hygiene is the actual control. Every pivot here was a credential the agent shouldn't have been able to reach. Plaintext keys in env files, an SSH key one API call away. Rotate, scope down, and stop leaving keys in the doors.
- Tune detection for shape, not signatures. Sub-second actions, fan-out across IPs, machine-formatted commands, no human dwell time between steps. Those are the new tells. Build for them.
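As a rough sketch of what "shape, not signatures" could look like in practice, the snippet below scores a shell session on two of the tells above: parser-friendly command wrapping and the absence of human dwell time. Everything in it is an assumption made for illustration: the marker list, the names shape_markers and looks_automated, and the thresholds would all need tuning against real telemetry.

```python
import re
from statistics import median

# Hypothetical markers of commands written for a parser, not a person.
SHAPE_MARKERS = {
    "separator_echo": re.compile(r"echo\s+['\"]-{3,}['\"]"),
    "bounded_output": re.compile(r"\|\s*head\s+-(?:n\s*)?\d+"),
    "stderr_discarded": re.compile(r"2>\s*/dev/null"),
    "pager_disabled": re.compile(r"--no-pager|PAGER=cat|pager=off"),
}


def shape_markers(command):
    """Return the sorted names of the markers present in one command line."""
    return sorted(
        name for name, pattern in SHAPE_MARKERS.items() if pattern.search(command)
    )


def looks_automated(session, min_marker_ratio=0.5, max_median_gap_s=1.0):
    """session: time-ordered list of (timestamp_seconds, command).

    Flags a session when at least min_marker_ratio of its commands carry
    two or more markers AND the median gap between commands is sub-threshold.
    """
    if len(session) < 3:
        return False  # too little evidence either way
    marked = sum(1 for _, cmd in session if len(shape_markers(cmd)) >= 2)
    gaps = [later[0] - earlier[0] for earlier, later in zip(session, session[1:])]
    return (
        marked / len(session) >= min_marker_ratio
        and median(gaps) <= max_median_gap_s
    )
```

Requiring both conditions is a deliberate choice. Plenty of legitimate automation fires commands at sub-second intervals, and plenty of careful humans append 2>/dev/null. It is the combination, on a host where nobody scheduled a job, that is worth a look.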
The comforting version of this story is that an agent still needs a way in, and we still know how to close the doors. That part is true. Keep closing them.
The uncomfortable version is that for years we got away with closing them slowly, because the thing trying to get through was slow too. It would poke, wait, come back tomorrow, get tired, get sloppy. We built our whole sense of "we'll notice in time" on that rhythm.
Nobody is at the keyboard anymore. And whatever replaced them does not get tired, does not get bored, and does not wait until tomorrow.