We built a local-first AI agent infrastructure on two DGX Sparks. Today we're applying to the NVIDIA Inception Program. Here's what we built, why, and what we're asking for.
The problem we kept running into
Cloud AI costs are fine when you're running a single assistant. They become a problem when you're running six always-on agents — one for each household member and collaborator in the fleet — each generating continuous inference traffic. At that scale, the math doesn't work. Neither does the privacy story. Personal AI agents process intimate data: health metrics, family schedules, financial patterns. That data shouldn't live in third-party cloud infrastructure.
There's also a latency problem. Always-on agents need to respond in under 200ms. Cloud round-trips add 400–1200ms depending on load. And there's lock-in: proprietary APIs constrain what you can build, and rate limits punish the workloads that matter most.
We needed to own the stack.
What we built
MetaClaw is an intelligent routing layer that sits between local inference (Nemotron-3-Super-120B on DGX Spark 1) and cloud Claude. A complexity scorer (0.0–1.0) evaluates every incoming query and decides where it goes. Simple queries stay local. Complex ones escalate to the cloud. We're at 40–50% local routing now; the long-term target is 88.7% — the benchmark from Stanford's OpenJarvis study for "what can stay on-device."
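To make the routing decision concrete, here's a minimal sketch. The threshold, the scoring heuristics, and the backend labels are all illustrative assumptions — the real MetaClaw scorer is not lexical and its cutoff is tuned, not hardcoded.

```python
# Hypothetical sketch of complexity-scored routing. The threshold value,
# scoring signals, and backend names are assumptions for illustration,
# not MetaClaw's actual implementation.
from dataclasses import dataclass

LOCAL_THRESHOLD = 0.6  # assumed cutoff between local and cloud

@dataclass
class RoutingDecision:
    backend: str   # "local" (DGX Spark) or "cloud" (Claude)
    score: float   # complexity in [0.0, 1.0]

def score_complexity(query: str) -> float:
    """Toy stand-in for the complexity scorer: longer, multi-question,
    multi-step queries score higher."""
    signals = [
        len(query) > 200,                  # long prompt
        query.count("?") > 1,              # multiple questions
        any(w in query.lower() for w in ("plan", "analyze", "compare")),
    ]
    return min(1.0, 0.2 + 0.3 * sum(signals))

def route(query: str) -> RoutingDecision:
    s = score_complexity(query)
    backend = "local" if s < LOCAL_THRESHOLD else "cloud"
    return RoutingDecision(backend, s)
```

The point of the design is that the cheap decision (scoring) happens on every query, while the expensive resource (cloud tokens) is only spent when the score demands it.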
On top of the router:
- Multi-agent delegation framework — 6 agents across 3 households and 2 collaborators, with shared task routing and context passing
- Research DAG — cross-domain knowledge propagation between agents. An insight in the Tesla API subagent propagates as a hypothesis to the fleet manager. Closed, trusted, no prompt injection risk from external agents.
- Self-improving skill extraction pipeline — post-session pattern extraction → human review → weekly deployment. Human-gated throughout.
- OpenViking memory layer — L0/L1/L2 hierarchical semantic knowledge graph, 713 indexed items, 1.27M VLM tokens
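The Research DAG's propagation step can be sketched as a small fan-out over a trusted graph. The agent names and the insight/hypothesis shapes below are illustrative assumptions, not the actual schema:

```python
# Minimal sketch of cross-agent insight propagation over a closed DAG.
# Agent names and message format are assumptions for illustration.
from collections import defaultdict

class ResearchDAG:
    def __init__(self):
        # source agent -> list of downstream agents
        self.edges = defaultdict(list)

    def link(self, src: str, dst: str):
        self.edges[src].append(dst)

    def propagate(self, src: str, insight: str) -> list[tuple[str, str]]:
        """Fan an insight out to downstream agents as hypotheses.
        Only trusted internal agents appear in the graph, which is
        why there is no prompt-injection surface from external agents."""
        return [(dst, f"hypothesis from {src}: {insight}")
                for dst in self.edges[src]]

dag = ResearchDAG()
dag.link("tesla_api", "fleet_manager")
hypotheses = dag.propagate("tesla_api", "charging off-peak saves 30%")
```

The closed-graph property is the security argument: because edges are declared at build time rather than discovered at runtime, an external agent can never inject itself into the propagation path.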
The numbers so far
- 67% token cost reduction from dynamic context assembly vs. static loading
- 6 agents deployed and running
- 108,000+ emails processed by an autonomous pipeline
- Integrations live: Tesla Fleet API, Eight Sleep, Whoop biometrics
- 14/14 routing test cases correct
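The dynamic context assembly behind the 67% token reduction can be sketched as budget-constrained selection: instead of loading a static context, rank memory items by relevance to the query and load only what fits. The relevance function and item format here are crude stand-ins, assumed for illustration:

```python
# Hedged sketch of dynamic context assembly: greedily pick the most
# relevant memory items under a token budget, instead of loading a
# static context. The lexical-overlap relevance score is a toy
# stand-in for whatever the real system uses.
def assemble_context(query: str, items: list[dict],
                     budget_tokens: int) -> list[dict]:
    def relevance(item: dict) -> float:
        q = set(query.lower().split())
        t = set(item["text"].lower().split())
        return len(q & t) / (len(q) or 1)

    picked, used = [], 0
    for item in sorted(items, key=relevance, reverse=True):
        if used + item["tokens"] <= budget_tokens:
            picked.append(item)
            used += item["tokens"]
    return picked

memory = [
    {"text": "tesla charging schedule", "tokens": 50},
    {"text": "sleep data weekly summary", "tokens": 50},
    {"text": "email triage rules", "tokens": 50},
]
ctx = assemble_context("when should the tesla charge", memory, budget_tokens=60)
```

Savings come from the budget: irrelevant items never enter the prompt, so the token bill scales with the query rather than with the size of the memory store.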
The hardware
Two DGX Spark units, 128GB each, 256GB pooled via NVLink-C2C. Spark 1 runs Nemotron-3-Super-120B-A12B in NVFP4 (~80GB) via TRT-LLM for live inference. Spark 2 handles async response scoring and, eventually, RL fine-tuning. The RL pipeline is gated — it only runs if 500+ samples show clear improvement and we approve it explicitly. Model drift is a real risk and we're not rushing it.
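The RL gate described above reduces to three checks in sequence. The 500-sample floor comes from the post; the improvement metric and approval flag are assumptions about how such a gate might be wired:

```python
# Sketch of the gated RL pipeline: training runs only when enough
# scored samples show clear improvement AND a human approves.
# The mean-vs-baseline comparison is an assumed improvement metric.
MIN_SAMPLES = 500  # floor stated in the post

def rl_gate(samples: list[float], baseline: float,
            human_approved: bool) -> bool:
    """samples: per-sample quality scores from the async scorer on Spark 2."""
    if len(samples) < MIN_SAMPLES:
        return False                  # not enough evidence yet
    mean = sum(samples) / len(samples)
    if mean <= baseline:
        return False                  # no clear improvement -> don't train
    return human_approved             # explicit approval is the last gate
```

Ordering matters: the cheap count check runs first, and human approval is deliberately last, so a person only ever reviews runs that already cleared the statistical bar.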
(The story of getting them powered on is its own post.)
What we're asking NVIDIA for
- Early NIM container access — Nemotron, Parakeet ASR, Fish Audio TTS
- TRT-LLM technical support for Blackwell/NVFP4 deployment
- DGX Cloud credits for RL training workloads
- GTC featured demo consideration — we're already aligned with the OpenClaw Playbook use case on the GTC Park floor
The vision
MetaClaw puts sovereign AI in every home. Not a product yet — a working system, deployed across a real family fleet, with real integrations, real memory, and real routing intelligence. The DGX Sparks are the backbone. Nemotron is the brain. And GTC is this week.
Good timing.