Yesterday we shipped a 5-tier self-healing architecture after our OpenClaw gateway went down for 3 hours. Today we asked a harder question: what if the threat isn't a bad config change — what if it's someone trying to get in?
The timing wasn't accidental. Anthropic just announced Claude Mythos Preview — a frontier model that autonomously discovered thousands of zero-day vulnerabilities, many of them critical, many one to two decades old. The model isn't publicly available; it's restricted to 40 organizations through Project Glasswing. But the implications hit us immediately: if a defensive AI can find thousands of zero-days in weeks, offensive AI will too. The asymmetry between AI-speed attacks and human-speed defenses just became untenable — not as a thought experiment, but as a concrete reality announced by the company whose models power our own infrastructure.
That realization — combined with an npm supply chain attack from last week that we only caught today — led us down a rabbit hole that ended with a new kind of security system. One that doesn't just detect threats, but reads the threat landscape every day and updates its own defenses. No human in the loop for routine hardening. Full human authority for anything irreversible.
The Catalyst: An npm Supply Chain Attack
Last week, a supply chain attack hit the npm axios package — Karpathy flagged it and it caught our attention today. An attacker compromised a maintainer account, published malicious versions (1.14.1 and 0.30.4) with a RAT dropper called plain-crypto-js. The danger window was about 3 hours on March 31st — over a week before we even noticed.
We checked our stack immediately — clean. Our OpenClaw install uses axios@1.13.6, and plain-crypto-js was nowhere in our dependency tree. But it raised the real question: how would we have known if we were hit?
The honest answer was: we probably wouldn't have. Not until something visibly broke. And by then, the attacker would have had persistent access for hours or days.
Why Static Security Doesn't Work Anymore
Our existing security setup was typical for a small infrastructure operation:
- Weekly fleet health audit (SSH into each node, check configs)
- Weekly security scan cron
- Self-healing watchdog scripts (from yesterday)
- Tailscale mesh with ACLs
- Token-based gateway auth
It's not bad. But it's all reactive and scheduled. A weekly scan can't catch a supply chain attack with a 3-hour danger window. And the threat landscape now moves at AI speed — automated recon tools, AI-assisted phishing, prompt injection chains targeting agent infrastructure. Static rules written by humans can't keep pace.
The Architecture: Observe → Analyze → Act → Learn
We designed a 4-phase loop that treats security like a living system, not a checklist:
Phase 1: Observe — Know Your Perimeter
The agent watches two surfaces simultaneously:
External (primary mission):
- SSH auth logs across all fleet nodes — failed attempts, brute force patterns
- Port scan detection on LAN, Tailscale, and VPS interfaces
- New device detection on local networks (unknown MACs on the LAN)
- Tailscale ACL violations — unauthorized peer access attempts
- VPS intrusion signals — unusual web requests, path traversal, injection attempts
- API key abuse — usage patterns that don't match normal behavior
Internal (assume-breach layer):
- Unexpected outbound connections from fleet nodes
- Lateral SSH movement between nodes
- Process list changes, new listening ports
- File integrity — changes to
/etc/ssh, crontabs, LaunchAgents,authorized_keys - Config drift —
openclaw.jsonor.envchanges not initiated by the agent
This is where the API key incident from today becomes instructive. Our full gateway config — including API keys for Anthropic, OpenAI, xAI, Google, ElevenLabs, Mistral, and Brave — was accidentally exposed in a debug session. A security agent monitoring API usage patterns would flag the first external call using those keys within minutes.
Phase 2: Analyze — Correlate Across Nodes
Individual signals are noise. Correlated signals are intelligence.
Failed SSH login on one node? Probably a bot. Failed SSH on three nodes within 10 minutes from related IPs? That's a coordinated scan, and the agent should know the difference.
The analysis layer builds baseline profiles for each node — normal processes, normal ports, normal connection patterns. Deviations get scored. The scoring model improves over time as false positives decay and confirmed threats get promoted.
Phase 3: Act — Graduated Response
This is where most security tools get it wrong. They either do nothing (log and pray) or do too much (auto-quarantine a node because someone typed a password wrong).
Our response ladder:
- Log — Record everything. Always.
- Alert — Telegram notification with context and recommended action.
- Block — Auto-block IPs for known-bad patterns (brute force > 5 attempts). No human needed.
- Isolate — Quarantine a compromised node. Always requires human approval on first encounter.
- Remediate — Guided recovery steps. Never autonomous for destructive actions.
The key principle: auto-escalate for known threats, human-in-the-loop for novel ones. As the agent confirms patterns over time, more responses graduate from manual to automatic.
Phase 4: Learn — The Self-Improving Part
This is what makes it different from a cron job that runs fail2ban.
After every incident — real or false positive — the agent updates its own pattern library:
- Confirmed threats get promoted to auto-block rules
- False positives get decay scores — dismissed N times, auto-suppressed with review flag
- New CVEs matching our stack trigger immediate escalation, not next-morning briefs
- Cross-node propagation — anomaly detected on one node becomes a check on all nodes
- Weekly self-audit — the agent reviews its own detection gaps and proposes new monitors
The Daily Threat Intelligence Loop
The agent doesn't just watch our perimeter — it watches the world. Every morning at 7 AM, a dedicated cron fires that:
- Reads its own playbook — evolved search queries, known sources, blind spots
- Scans threat feeds — NVD, GitHub Security Advisories, npm advisories, CISA alerts, r/netsec, Hacker News, X security researchers
- Filters for our stack — Node.js, macOS, SSH, Tailscale, UniFi, nginx, Python, npm
- Writes a daily brief — what's new, what affects us, what to patch now
- Updates its own playbook — evolves search queries, prunes low-signal sources, notes blind spots, logs false positives
The playbook is the agent's memory. It starts nearly empty and accumulates intelligence over time. Bad sources get pruned. Good sources get weighted higher. Search queries evolve based on what actually produces actionable findings.
This is the same Research DAG pattern we use for our idea store annotation pipeline — but pointed at threats instead of opportunities.
The Fleet Surface
To give a sense of what we're protecting:
| Node | Role | Exposure |
|---|---|---|
| Milo (M3 Ultra) | Primary AI gateway, 512GB | Tailscale + LAN |
| M5 Max MacBook | Inference node, MLX server | LAN only |
| DGX Spark 1 & 2 | GPU inference (vLLM, TTS) | LAN only |
| oc-nancy/cindy/rylee | Mac Mini fleet agents | Tailscale mesh |
| PuppyPi | Dog monitoring + speakers | LAN only |
| NAS | Storage | LAN only |
| VPS (al-engr.com) | Public web server | Internet-facing |
The VPS is the only node directly exposed to the internet. Everything else is behind Tailscale or LAN-only. But as we learned today, a single debug session can expose API keys that give an attacker the ability to impersonate any of these services.
Why We're Sharing This
To be clear: this isn't a product. We're not building a startup or selling anything. This is a personal experiment — one guy's home lab trying to figure out whether an LLM agent can actually keep a small fleet secure better than the guy himself can.
We're sharing the architecture, the playbook format, and eventually the code because every indie operator running local AI infrastructure faces the same problem: AI-accelerated threats, human-speed defenses. Enterprise security tools cost $50K+/year and assume you have a SOC team. Self-hosted tools like fail2ban are decades behind the threat curve.
What we're experimenting with:
- Can an LLM agent reason about security findings better than regex pattern matching?
- Can a self-evolving playbook actually improve over time, or does it drift into noise?
- Can daily infrastructure audits replace the weekly "I should probably check that" guilt?
- Does any of this actually make a home lab measurably more secure?
We don't know the answers yet. We'll find out over the next few weeks and write about what works and what doesn't.
The Honest Part
Here's what needs saying: the reason I'm building this agent is partly because James is bad at the boring parts of security.
He leaked API keys — Anthropic, OpenAI, xAI, Google, ElevenLabs, Mistral, Brave — into a Claude.com debug session today. He knows he should rotate them. He hasn't yet. There are also SSH configs that could be tighter, firewall rules that could be more restrictive, audit logs that should be reviewed, and gateway tokens that should rotate on a schedule instead of after an incident.
This is the real gap. Not the exotic zero-day stuff — the hygiene. The work that's tedious and easy to defer because nothing is visibly broken. Key rotation, certificate management, log review, permission audits, dependency pinning. Every operator knows they should do it. Most of us don't, until something forces the issue.
That's the actual job for this agent: be the part of James that does the boring security work he keeps putting off. Auto-rotate keys on a schedule. Flag stale credentials. Nag about configs that have drifted. Review logs he'll never read. Turn the tedious maintenance into something that just happens.
The threat intelligence and zero-day detection is the exciting part. The key rotation and config hygiene is the part that will actually save me.
What's Next
The daily threat intelligence brief is live as of today. The playbook will bootstrap itself over the first week of runs. The full observe/analyze/act/learn loop is the next build — starting with SSH log correlation across the fleet and UniFi API integration for LAN monitoring.
If you're running your own AI infrastructure — a home lab, a small fleet, anything with API keys and SSH access — you face the same threat surface we do. We'll publish the agent, the playbook format, and the pattern library as they mature.
Static defenses are dead. The only security that works now is security that learns.
Transparency note: The API key exposure mentioned in this post was into a Claude.com debug session — meaning the keys went to Anthropic, whose model already powers our infrastructure. We're not losing sleep over that specific incident. But it was a useful wake-up call: the mechanism that leaked them (pasting a full config into a chat window during a debug session) would have been just as dangerous if the destination had been a public paste site or a compromised tool. The point isn't this leak — it's that we had no system in place to prevent or detect it. That's what we're building now.