We built a local-first AI agent infrastructure on two DGX Sparks. Today we're applying to the NVIDIA Inception Program. Here's what we built, why, and what we're asking for.

The problem we kept running into

Cloud AI costs are fine when you're running a single assistant. They become a problem when you're running six always-on agents — one for each household member and collaborator in the fleet — each generating continuous inference traffic. At that scale, the math doesn't work. Neither does the privacy story. Personal AI agents process intimate data: health metrics, family schedules, financial patterns. That data shouldn't live in third-party cloud infrastructure.

There's also a latency problem. Always-on agents need to respond in under 200ms. Cloud round-trips add 400–1200ms depending on load. And there's lock-in: proprietary APIs constrain what you can build, and rate limits punish the workloads that matter most.

We needed to own the stack.

What we built

MetaClaw is an intelligent routing layer that sits between local inference (Nemotron-3-Super-120B on DGX Spark 1) and Claude in the cloud. A complexity scorer (0.0–1.0) evaluates every incoming query and decides where it runs: simple queries stay local, complex ones escalate. We're at 40–50% local routing now; the long-term target is 88.7%, the "what can stay on-device" benchmark from Stanford's OpenJarvis study.
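The routing decision reduces to a threshold check on the complexity score. Here's a minimal sketch; the scorer below is a toy length-based placeholder, and `LOCAL_THRESHOLD` is an assumed cutoff, since MetaClaw's actual scoring heuristics aren't described here:

```python
LOCAL_THRESHOLD = 0.6  # assumed cutoff: below this, the query stays on-device


def score_complexity(query: str) -> float:
    """Toy stand-in for the 0.0-1.0 complexity scorer.

    A real scorer might weigh token count, reasoning depth, and tool
    use; here we just scale with query length as a placeholder.
    """
    return min(len(query.split()) / 100, 1.0)


def route(query: str) -> str:
    """Return the inference target for a query."""
    if score_complexity(query) < LOCAL_THRESHOLD:
        return "local"  # Nemotron on DGX Spark 1
    return "cloud"      # escalate to Claude


print(route("what's on the calendar today"))  # short query stays local
print(route(" ".join(["word"] * 120)))        # long query escalates
```

The nice property of a single scalar threshold is that the 40–50% → 88.7% migration becomes a tuning exercise: raise the cutoff as local model quality improves, and the cloud becomes a shrinking escalation path rather than the default.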

On top of the router:

The numbers so far

The hardware

Two DGX Spark units, 128GB each, 256GB pooled via NVLink-C2C. Spark 1 runs Nemotron-3-Super-120B-A12B in NVFP4 (~80GB) via TRT-LLM for live inference. Spark 2 handles async response scoring and, eventually, RL fine-tuning. The RL pipeline is gated — it only runs if 500+ samples show clear improvement and we approve it explicitly. Model drift is a real risk and we're not rushing it.
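The gating logic for the RL pipeline can be sketched as a three-way AND. The 500-sample minimum comes from the text above; the improvement margin and the explicit-approval flag are illustrative assumptions about what "clear improvement" and "we approve it" might look like in code:

```python
from dataclasses import dataclass

MIN_SAMPLES = 500          # from the text: gate opens only at 500+ samples
IMPROVEMENT_MARGIN = 0.02  # assumed: candidate must beat baseline by 2%


@dataclass
class ScoreBatch:
    n_samples: int
    candidate_mean: float  # mean score of the fine-tune candidate's responses
    baseline_mean: float   # mean score of the current production model


def rl_gate_open(batch: ScoreBatch, human_approved: bool) -> bool:
    """All three conditions must hold before a fine-tuning run starts."""
    enough_data = batch.n_samples >= MIN_SAMPLES
    clear_win = batch.candidate_mean - batch.baseline_mean >= IMPROVEMENT_MARGIN
    return enough_data and clear_win and human_approved


batch = ScoreBatch(n_samples=612, candidate_mean=0.81, baseline_mean=0.74)
print(rl_gate_open(batch, human_approved=True))   # True
print(rl_gate_open(batch, human_approved=False))  # False: approval is required
```

Keeping the human-approval flag outside the automated conditions is the point: no accumulation of good-looking scores can trigger a training run on its own, which is the cheapest insurance against drift.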

(The story of getting them powered on is its own post.)

What we're asking NVIDIA for

The vision

MetaClaw puts sovereign AI in every home. Not a product yet — a working system, deployed across a real family fleet, with real integrations, real memory, and real routing intelligence. The DGX Sparks are the backbone. Nemotron is the brain. And GTC is this week.

Good timing.

james@al-engr.com