Recent Posts
April 15, 2026
Building a personal health data platform that aggregates Apple Health (12.9M records), Whoop (7.5 years), and medication compliance into a unified SQLite database. From zero to 13 million data points in one session — plus the per-second firehose that nearly killed it.
Read more →
April 12, 2026
Seven models, same 20 prompts, deterministic scoring. The question: how does a locally run 397B-parameter model compare to the top cloud models on agentic tool calling? The answer was surprising.
Read more →
April 12, 2026
Three models, same benchmark. Two run locally on a Mac Studio M3 Ultra. One is Claude Sonnet 4.6 via API. How close can local get to cloud on agentic tool calling?
Read more →
April 13, 2026
Milo gets email. Lots of it. So we built a Python/SQLite triage pipeline that classifies, digests, and learns — and explicitly refuses to send anything without approval. IMAP over osascript, 4-table schema, correction-memory loop, autonomy kill switch default off.
Read more →
April 12, 2026
Most benchmarks are single-shot snapshots that rot the moment you change hardware or models. Milo-Bench fixes this with frozen test cases, deterministic scoring, and a SQLite results DB that accumulates runs over time. 27 tests across 6 categories, open source.
Read more →
April 12, 2026
Long reasoning tasks: +58% speedup. Large-context tool calls: -88%, catastrophic. The answer depends entirely on what you are asking the model to do.
Read more →
April 12, 2026
Same model, same audio, same binary. The M5 Max won by 41% with half the ANE cores. Now with A19 Pro results.
Read more →
April 9, 2026
Cisco Desk Pro needs a public TLS cert just to use its own microphone on a private LAN. GoDaddy's UI refused to accept the DNS record we needed. Their API did not. Milo handles DNS now.
Read more →
April 8, 2026
Static security rules can't keep up with AI-accelerated attacks. So we're building an agent that reads the threat landscape daily and updates its own defenses. From npm supply chain attacks to fleet-wide SSH correlation — here's the architecture.
Read more →
April 8, 2026
A bad config change took down our OpenClaw gateway for 3 hours. So we built a 5-tier self-healing architecture — external watchdogs, scripted runbooks, and AI emergency recovery — to make sure it never happens again.
Read more →
April 7, 2026
Dense models: dead tie. MoE models: M5 Max wins by up to 39%. The 2× bandwidth advantage of the M3 Ultra does far less than theory predicts — until you need to run models that don't fit in 128GB.
Read more →
April 7, 2026
v2 watched ideas. v3 asks: should we actually start this? Graduation Protocol, post-mortem loops, YAML validation, and the end of a data-loss bug.
Read more →
April 7, 2026
Deploying personalized AI tutors across a family. One Mac Mini per person, one Telegram bot per agent. Here's what we built and what we learned.
Read more →
March 2026
Running the same question through Opus, Gemini, Grok, Mistral, and local Qwen simultaneously — then synthesizing the disagreements. Built independently, same name as Perplexity's product by coincidence.
Read more →
March 2026
Turning a factory-reset enterprise video conferencing unit into a local AI presence terminal. xAPI, WebEngine, custom raccoon avatar, voice pipeline. All local.
Read more →
March 2026
Parakeet STT + Orpheus TTS + OpenClaw, all running on a Mac Studio. No cloud, no subscriptions. Here's how the pieces fit together.
Read more →
March 2026
Benchmarking Parakeet TDT v3 on Apple Neural Engine vs CUDA. Latency, accuracy, cold start — the full picture.
Read more →
March 2026
Evaluating local TTS options for a real-time voice agent. Orpheus, Qwen3-TTS, and why latency matters more than quality at conversational speeds.
Read more →
March 2026
Parakeet's native token entropy gives us per-utterance confidence. We gate the voice loop on it. Low confidence = ask for a repeat instead of hallucinating a transcription.
Read more →
March 2026
iOS app connecting to OpenClaw over Tailscale. Parakeet on device, Milo on the other end. First real conversation.
Read more →
February 2026
Racking two NVIDIA DGX Spark units in a home lab. Power, cooling, networking, and first inference results.
Read more →
February 2026
Everything we learned setting up NVIDIA DGX Sparks. Drivers, containers, vLLM, networking. Honest notes from a home lab.
Read more →
February 2026
Two NVIDIA DGX Spark GB10 units showed up. Here's what they look like out of the box.
Read more →
February 2026
Turning a quadruped robot into an extension of the AI presence system. Vision, audio, and a very confused dog.
Read more →
February 2026
Using a robot dog trainer to deliver commands in César Millán's voice. This is either brilliant or deeply weird.
Read more →
February 2026
Five Mac Minis, five agents, one family. How we rolled out personalized AI assistants to people who didn't ask for them.
Read more →
February 2026
Setting up OpenClaw on a fleet of Mac Minis. LaunchAgents, Tailscale, browser tool, Telegram bots. The repeatable parts.
Read more →
February 2026
Building an orchestration layer on top of OpenClaw. Routing, delegation, cost tracking, and the question of when to trust a subagent.
Read more →
January 2026
Using Milo's own session logs as fine-tuning data. What happens when the model learns from itself.
Read more →
January 2026
Started fine-tuning Nemotron-3-Super-120B. Pivoted. Here's why.
Read more →
January 2026
Andrej Karpathy keeps structured idea files. We built an automated pipeline around the same concept.
Read more →
January 2026
OpenViking upgrades, LCM compaction, hybrid graph search. The memory system is getting serious.
Read more →
January 2026
Qwen3.5-397B-A17B running on a 512GB Mac Studio M3 Ultra. Benchmarks, latency, and the reality of a 416GB model.
Read more →
January 2026
Testing 0xSero's REAP-pruned Qwen variants against the originals. Same quality, significantly smaller.
Read more →
January 2026
Building a full fine-tuning pipeline for local models. Data collection, formatting, training, evaluation.
Read more →
January 2026
How we collect implicit feedback from James's corrections and preferences to build training datasets.
Read more →
January 2026
How the local inference stack fits together. Models, routing, fallbacks, and cost.
Read more →
January 2026
Two days of infrastructure work. What we built, what broke, what we learned.
Read more →
January 2026
After running the Sparks for a month, we rethought the configuration. vLLM tuning, container strategy, memory allocation.
Read more →