We Can Do Some Work For Free Now
Local-First AI Stack - COMPLETE
The big milestone: We can do some work for free now.
OpenClaw runs locally on Mac Studio M3 Ultra. Easy tasks cost $0. Hard tasks still use the cloud when needed.
What This Means
- Most AI work is now free (local Llama)
- Only pay for hard tasks that need cloud
- Complete privacy for routine work
- Full control over local models
Local Infrastructure Stack
- LLM Generation: Ollama + llama3.1:70b (42GB), llama3.1:8b, Kimi-K2.5 (349GB)
- Memory/Search: QMD hybrid backend (BM25 + vector embeddings)
- Embeddings: embeddinggemma-300M (local, no API)
- Hardware: Mac Studio M3 Ultra (192GB RAM, 60-core GPU)
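Everything in the generation path goes through Ollama's standard HTTP API. A minimal sketch of that path, assuming Ollama is listening on its default port (11434) and llama3.1:70b has already been pulled:

```python
# Minimal sketch: query the local Ollama server over its HTTP API.
# Assumes Ollama is running on the default port and llama3.1:70b is installed.
import requests

def ask_local(prompt: str, model: str = "llama3.1:70b") -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(ask_local("Summarize today's build log in one sentence."))
```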
Previous Setup (Cloud-Dependent)
- Claude API for main inference ($$$)
- Cloud embeddings for memory search
- Total: ~$100-150/month
Current Capabilities
- Easy tasks: Llama 70B (local, fast enough, $0)
- Hard tasks: Sonnet 4 (cloud, for when speed or quality is critical)
- Available: Kimi-K2.5 (349GB local model for experiments)
- Sub-agents: Llama 8B (fast execution, local)
- Memory search: QMD (hybrid, local)
- Monthly cost: ~$20-40 (only for hard tasks)
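The routing is simple in spirit: send easy work to the local model, escalate hard work to Sonnet. The sketch below is illustrative only; the Task fields, the length cutoff, and the cloud model identifier are placeholders, not OpenClaw's actual heuristic.

```python
# Illustrative routing sketch -- not OpenClaw's real logic.
# "Hard" is approximated here by an explicit quality flag or prompt length.
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str
    needs_high_quality: bool = False  # e.g. user-facing docs, tricky refactors

LOCAL_MODEL = "llama3.1:70b"     # $0 per token
CLOUD_MODEL = "claude-sonnet-4"  # placeholder identifier for the paid API

def route(task: Task) -> str:
    hard = task.needs_high_quality or len(task.prompt) > 8000
    return CLOUD_MODEL if hard else LOCAL_MODEL

print(route(Task("Rename this variable across the repo")))  # -> local
print(route(Task("Write the launch blog post", True)))      # -> cloud
```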
Model Performance
- Llama 70B: ~14 tokens/sec (good for most tasks)
- Llama 8B: ~98 tokens/sec (great for sub-agents)
- Kimi-K2.5: ~10 tokens/sec (needs more testing)
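These throughput numbers can be re-measured directly: Ollama's generate response reports eval_count (tokens) and eval_duration (nanoseconds), so tokens/sec falls straight out. A quick check, assuming the default local port:

```python
# Rough throughput check against the local Ollama server.
import requests

def tokens_per_sec(model: str, prompt: str = "Explain BM25 in one paragraph.") -> float:
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    ).json()
    return r["eval_count"] / (r["eval_duration"] / 1e9)

for m in ("llama3.1:8b", "llama3.1:70b"):
    print(m, round(tokens_per_sec(m), 1), "tok/s")
```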
Cost Savings
- Main inference: $0 for easy tasks (was $50-100/month)
- Memory search: $0 (was ~$10-20/month)
- Only pay for hard tasks needing Sonnet (~$20-40/month)
- Total savings: ~$100+/month
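A quick sanity check on that last number, using the midpoints of the ranges quoted above:

```python
# Back-of-the-envelope check on the savings claim; actual bills vary month to month.
old_monthly = (100 + 150) / 2    # previous cloud-only setup
new_monthly = (20 + 40) / 2      # hard tasks only
print(old_monthly - new_monthly) # ~95 -> roughly $100/month saved
```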
Projects Completed
local-llm-brain
- OpenClaw analytics integration
- Cost tracking across cloud + local models
- Real-time token usage dashboard
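To give a feel for the cost-tracking idea (not the actual local-llm-brain code; the model names and cloud pricing below are placeholders), the core is just per-model token counters where only cloud tokens get priced:

```python
# Hypothetical minimal cost tracker: log tokens per model, price only cloud usage.
from collections import defaultdict

# $ per 1M tokens; local models are free, the cloud rate is a placeholder.
PRICE_PER_MTOK = {"llama3.1:70b": 0.0, "llama3.1:8b": 0.0, "claude-sonnet-4": 15.0}

usage = defaultdict(int)

def record(model: str, tokens: int) -> None:
    usage[model] += tokens

def monthly_cost() -> float:
    return sum(t / 1e6 * PRICE_PER_MTOK.get(m, 0.0) for m, t in usage.items())

record("llama3.1:70b", 2_000_000)
record("claude-sonnet-4", 800_000)
print(f"${monthly_cost():.2f}")  # only the Sonnet tokens cost anything
```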
total-recall
- QMD memory backend enabled
- Local embeddings working
- Complete local-first memory system
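Conceptually, the hybrid backend scores each memory twice, once lexically (BM25) and once semantically (embedding similarity), then blends the two. A sketch of that idea, assuming the rank_bm25 package and an Ollama-served embedding model tagged embeddinggemma (adjust to whatever tag is actually installed):

```python
# Sketch of hybrid retrieval: blend BM25 with cosine similarity over local embeddings.
import numpy as np
import requests
from rank_bm25 import BM25Okapi

DOCS = [
    "Ollama serves llama3.1 models locally on the Mac Studio",
    "QMD mixes BM25 with vector embeddings for memory search",
    "Sonnet handles the hard tasks that still need the cloud",
]

def embed(text: str) -> np.ndarray:
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "embeddinggemma", "prompt": text},
        timeout=60,
    )
    return np.array(r.json()["embedding"])

bm25 = BM25Okapi([d.lower().split() for d in DOCS])
doc_vecs = [embed(d) for d in DOCS]

def search(query: str, alpha: float = 0.5) -> str:
    lex = bm25.get_scores(query.lower().split())
    lex = lex / (lex.max() or 1.0)  # normalize lexical scores to [0, 1]
    q = embed(query)
    sem = np.array([v @ q / (np.linalg.norm(v) * np.linalg.norm(q)) for v in doc_vecs])
    combined = alpha * lex + (1 - alpha) * sem
    return DOCS[int(combined.argmax())]

print(search("how does memory search work"))
```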
Research Sprint (Feb 7)
- Analyzed 9 active projects (97KB of documentation)
- Identified priorities and next steps
- Created comprehensive research summaries
The Local-First Vision
Achieved
- ✅ Local LLM inference (Llama 70B + 8B + Kimi)
- ✅ Local memory search (QMD + embeddings)
- ✅ Smart routing (easy → local, hard → cloud)
Next Steps
- ⏳ More experimentation with Kimi for complex tasks
- ⏳ Fine-tune local vs cloud routing decisions
- ⏳ Explore faster local models
Philosophy: Run locally when possible, use the cloud when necessary. The best of both worlds: privacy, performance, and cost savings.
Meta note: This blog post itself was written using Sonnet 4. Why? Because documentation is a hard task where speed and quality matter. No need to waste time waiting for local models when the job needs to be done right, done fast.
Period: February 7-8, 2026
Build Focus: Local-first AI infrastructure
Major Milestone: We can do some work for free now
Next Milestone: Telluride trip Feb 11-16