
Milo Health V1: 13 Million Data Points, One SQLite File

April 15, 2026 · Tags: health data, sqlite, whoop, apple-health

[Image: Milo the raccoon — doctor by day, gym bro by night]

Last night I built a personal health data platform from scratch. Not a startup. Not a SaaS product. Just a single SQLite database on a Mac Studio, aggregating every health data point from every wearable and app into one queryable place.

By the end of the session: 12.97 million data points spanning nearly a decade of continuous health monitoring, all accessible via SQL.

- 12.9M Apple Health records
- 11,830 Whoop records
- 5,718 workouts tracked
- 98 metric types
- 217 lab results
- 7.5 yrs of Whoop history
- 9.5 yrs total span

The Problem

Why This Exists

It started with a Numbers spreadsheet for tracking injections — which site, which compound, which day. It worked until it didn't. Extracting anything useful from it meant manual queries and a lot of scrolling. I wanted something I could actually ask questions of.

I also have an APOE 4/4 genotype and some labs that need attention. I had Whoop data, Withings scale and blood pressure readings, and several years of Marek Health bloodwork — none of it connected. The hope is that with everything in one place, an AI can do something useful with it: surface patterns, flag trends, maybe eventually connect compliance data to biomarker changes between draws.

One thing worth noting: OpenClaw — the AI gateway running all of this — turned out to be particularly good at this kind of integration work. Connecting a new data source, wiring up a cron, adding a tool — things that would normally take days got done in hours. The whole system in this post was built in a single session.

The Problem With Health Data

Health data is balkanized by design. Apple Health has steps and heart rate. Whoop has recovery scores and strain. Your doctor has lab results in a PDF. Your medication protocol lives in a spreadsheet (or your head). No single system connects "I skipped my meds for a week" to "my LDL went up 30 points."

I wanted everything in one place — not to build a product, just to be able to ask questions that no single app could answer.

Architecture

The whole system is four components:

Milo Health V2 Architecture — Data Sources → Ingestion → Storage → Analysis → Output

┌─────────────────────┐      ┌──────────────────┐
│ Health Auto Export  │────▶│                  │
│ (iOS, real-time)    │      │                  │
├─────────────────────┤      │     FastAPI      │
│ Apple Health Export │────▶│     Receiver     │──▶ health.db (SQLite)
│ (XML, 5.3GB)        │      │      :8400       │
├─────────────────────┤      │                  │
│ Whoop API           │────▶│                  │
│ (OAuth2, paginated) │      └──────────────────┘
├─────────────────────┤
│ Milo Meds           │────▶ compliance-log.jsonl
│ (Telegram crons)    │
└─────────────────────┘

One SQLite file. No Postgres. No Docker. No Kubernetes. Just sqlite3 health.db and you're querying a decade of health data.

Data Sources

1. Apple Health — The Bulk Import

Apple Health stores everything your iPhone and Apple Watch have ever measured. The trick is getting it out. Apple provides an "Export All Health Data" feature that produces an XML file — mine was 5.3 GB.

I wrote a streaming SAX parser (parse_export.py) that processes the XML in a single pass without loading it into memory. It ran in 135 seconds and produced:

| Table | Records | Date Range |
|---|---|---|
| apple_records | 12,921,642 | Jun 2016 – Apr 2026 |
| apple_workouts | 5,718 | Nov 2016 – Apr 2026 |
| apple_activity_summaries | 3,418 | 2017 – 2026 |
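The core of the parser is a SAX handler that batches rows into SQLite as it streams. Here is a minimal sketch of that approach — the table schema, batch size, and handler names are illustrative, not the actual parse_export.py:

```python
import sqlite3
import xml.sax

class RecordHandler(xml.sax.ContentHandler):
    def __init__(self, conn, batch_size=50_000):
        super().__init__()
        self.conn = conn
        self.batch = []
        self.batch_size = batch_size

    def startElement(self, name, attrs):
        # Apple Health's export is one <Record> element per sample
        if name == "Record":
            self.batch.append((
                attrs.get("type"),
                attrs.get("startDate"),
                attrs.get("endDate"),
                attrs.get("value"),
                attrs.get("sourceName"),
            ))
            if len(self.batch) >= self.batch_size:
                self.flush()

    def flush(self):
        self.conn.executemany(
            "INSERT INTO apple_records(type, start_date, end_date, value, source) "
            "VALUES (?, ?, ?, ?, ?)",
            self.batch,
        )
        self.conn.commit()
        self.batch.clear()

def parse(source, conn):
    # SAX never builds a DOM, so memory stays flat regardless of file size
    handler = RecordHandler(conn)
    xml.sax.parse(source, handler)
    handler.flush()
```

Batched executemany calls are what keep a 13-million-row import in the minutes range rather than hours.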

98 distinct metric types: heart rate, blood pressure, HRV, respiratory rate, blood oxygen, walking asymmetry, six-minute walk distance, stair speed, and dozens more.

Two hardware devices — the Withings scale and the Withings blood pressure monitor — feed Apple Health automatically and flow into Milo Health as a result.

The integration is passive — both devices sync to Apple Health natively, and Health Auto Export picks them up on the next scheduled push. Withings becomes part of the pipeline without any custom API work.

2. Health Auto Export — Real-Time Sync

Health Auto Export is an iOS app that periodically POSTs new Apple Health data to a REST endpoint. I pointed it at a FastAPI server running on the Mac Studio (via Tailscale), and now new health data flows in automatically whenever my phone is unlocked.

This gave me the real-time pipeline — but it also taught me a lesson about data at scale (more on that below).
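The receiving side is mostly payload flattening plus an idempotent insert. A sketch of that core, assuming Health Auto Export's documented metrics-and-data-points JSON shape (in the real system this logic sits inside the FastAPI POST handler on :8400; field and column names here are illustrative):

```python
import sqlite3

def ingest_payload(conn, payload):
    """Flatten a Health Auto Export-style JSON payload into metric rows.

    Returns the number of data points received (not necessarily inserted:
    INSERT OR IGNORE silently drops re-delivered duplicates).
    """
    rows = []
    for metric in payload.get("data", {}).get("metrics", []):
        name, units = metric.get("name"), metric.get("units")
        for point in metric.get("data", []):
            rows.append((name, point.get("date"), point.get("source"),
                         point.get("qty"), units))
    with conn:  # commit (or roll back) the whole batch atomically
        conn.executemany(
            "INSERT OR IGNORE INTO metrics(metric_name, date, source, value, units) "
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)
```

The app retries failed pushes, so the endpoint has to tolerate the same window arriving twice — hence INSERT OR IGNORE rather than plain INSERT.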

3. Whoop — The Gold Standard for Physiology

Whoop is the most valuable data source in the system. Unlike Apple Health's intermittent sampling, Whoop provides continuous 24/7 heart rate monitoring, and that difference matters.

I wrote whoop_sync.py — a direct Whoop API client with OAuth2 auto-refresh, paginated fetching, and 429 retry with exponential backoff. The full historical backfill pulled 7.5 years of data:

| Table | Records | Since |
|---|---|---|
| whoop_cycles | 2,542 | Nov 2018 |
| whoop_recovery | 2,494 | Nov 2018 |
| whoop_sleep | 4,017 | Nov 2018 |
| whoop_workouts | 2,777 | Nov 2018 |
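The pagination-plus-backoff loop is the part worth getting right. A sketch of its shape, with the HTTP call abstracted behind a get_page callable so the retry logic stays visible and testable — this mirrors the structure of whoop_sync.py, not its exact API:

```python
import time

def fetch_paginated(get_page, max_retries=5, initial_delay=1.0):
    """Pull every page of a Whoop-style collection endpoint.

    get_page(next_token) -> (status_code, body_dict).
    """
    records, token = [], None
    retries, delay = 0, initial_delay
    while True:
        status, body = get_page(token)
        if status == 429:  # rate limited: back off exponentially, then retry
            if retries >= max_retries:
                raise RuntimeError("rate limit retries exhausted")
            time.sleep(delay)
            delay *= 2
            retries += 1
            continue
        retries, delay = 0, initial_delay  # reset after a successful page
        records.extend(body.get("records", []))
        token = body.get("next_token")
        if not token:  # the API signals the last page with no next_token
            return records
```

The same loop handles both the 7.5-year backfill and the daily 3-day incremental pull; only the date-range parameters passed to the real HTTP call differ.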

A daily sync cron runs at 6:30 AM pulling the last 3 days, keeping the data fresh before the morning digest.

4. Medication Compliance (Milo Meds)

This is the part that makes the system clinically useful. I'm on a complex protocol — 12 compounds across three daily stacks. With an APOE 4/4 genotype and LDL at 209, skipping NEXLETOL isn't a minor inconvenience. Missing doses has measurable consequences at the next lab draw.

How Reminders Work

Three silent crons create native Apple Reminders throughout the day — no notification spam, just a standard iPhone reminder at the right time. A poller cron runs every 15 minutes checking for completions and logging them to compliance-log.jsonl. Stale uncompleted reminders get swept at 11 PM and marked missed.

The reminders are dynamically generated each day. Injection site is calculated from the date — even/odd for BPC knee alternation, day-of-month mod 6 for abdomen rotation. The compound list changes by day of week. Sunday's morning stack looks nothing like Wednesday's.
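The date math is simple enough to show. A sketch of the rotation rules — the site labels, and which parity maps to which knee, are assumptions, but the even/odd and day-of-month mod 6 logic is as described:

```python
from datetime import date

ABDOMEN_SITES = ["L upper", "R upper", "L mid", "R mid", "L lower", "R lower"]

def bpc_knee(d: date) -> str:
    # alternate knees by even/odd day of month
    return "L knee medial" if d.day % 2 == 0 else "R knee medial"

def abdomen_site(d: date) -> str:
    # rotate through six abdomen sites by day-of-month mod 6
    return ABDOMEN_SITES[d.day % 6]

def reminder_title(compound: str, dose: str, d: date) -> str:
    # site embedded in the title, e.g. [INJ] BPC-157 300mcg SubQ — R knee medial
    return f"[INJ] {compound} {dose} SubQ — {bpc_knee(d)}"
```

Because the site is a pure function of the date, no state needs to survive between crons — a missed day doesn't derail the rotation.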

Site Rotation Tracking

Injection site is embedded in the reminder title itself — [INJ] BPC-157 300mcg SubQ — R knee medial — so the compliance log captures where each injection went, not just that it happened. The end-of-month report audits rotation adherence alongside dose compliance.

Compliance Logging

The JSONL log captures timestamp, stack, compound, dose, status (taken/missed), source (Apple Reminders vs manual Telegram reply), and injection site. A weekly report lands every Sunday. The end-of-month report ranks all 12 medications by compliance percentage, worst to best, with a spotlight on the bottom performer.
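The monthly ranking falls out of the log in a few lines. A sketch, assuming only the compound and status fields from the log format described above:

```python
import json
from collections import Counter

def compliance_report(jsonl_lines):
    """Rank compounds by percentage of doses taken, worst first.

    Mirrors the shape of the end-of-month report, not its exact code.
    """
    taken, total = Counter(), Counter()
    for line in jsonl_lines:
        entry = json.loads(line)
        total[entry["compound"]] += 1
        if entry["status"] == "taken":
            taken[entry["compound"]] += 1
    ranked = [(c, 100.0 * taken[c] / total[c]) for c in total]
    ranked.sort(key=lambda pair: pair[1])  # worst performer first
    return ranked
```

Append-only JSONL is a deliberate choice here: each dose event is one line, the file is human-greppable, and a corrupted write can only damage the last line.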

The long game: Phase 3 correlates 8-week compliance windows against biomarker changes at the next lab draw. If L-Carnitine is driving my TMAO spike — it went from <3.3 to 16.5 between panels — the data will show it.

The Morning Digest

Every morning at 7 AM, a cron job queries the database and sends a summary over Telegram.

No app to open. No dashboard to check. Just a Telegram message with the numbers that matter.

What Went Wrong (and Right)

The Per-Second Firehose

Lesson learned: When I enabled "all available metrics" in Health Auto Export, the metrics table exploded from 32,000 to 2.78 million rows in hours. Apple Health stores step_count and active_energy at per-second resolution. The app faithfully sent every data point, and the server faithfully stored every one.

Step count alone: 1.45 million rows. Active energy: 1.3 million. Those two metrics were 99.1% of the entire table. The database hit 7 GB.

The fix was three-fold:

  1. UNIQUE constraints on (metric_name, date, source), with INSERT OR IGNORE dropping duplicates at write time
  2. Trim historical per-second data — apple_records already has full resolution; the metrics table only needs the last 7 days of real-time data
  3. Result: 2,781,814 → 386,068 rows (86% reduction)
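The constraint-plus-INSERT OR IGNORE combination is worth seeing concretely; column names here are illustrative:

```python
import sqlite3

# The dedup fix in miniature: a UNIQUE constraint plus INSERT OR IGNORE
# makes re-delivered data points a no-op.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE metrics (
        metric_name TEXT,
        date        TEXT,
        source      TEXT,
        value       REAL,
        UNIQUE (metric_name, date, source)
    )
""")
row = ("step_count", "2026-04-14T07:00:00", "iPhone", 312.0)
for _ in range(3):  # the app may resend the same window on every push
    conn.execute("INSERT OR IGNORE INTO metrics VALUES (?, ?, ?, ?)", row)
n = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
print(n)  # 1 — duplicates dropped at write time
```

Pushing deduplication into the schema means no application code can reintroduce the bug — every write path gets it for free.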

The 5.3 GB XML File

Apple Health's export format is a single XML file. Mine was 5.3 GB. A DOM parser would need 20+ GB of RAM. The streaming SAX parser processes it in a single pass — peak memory under 100 MB, finished in 135 seconds. Worth the extra code.

The Crash Bug

An Opus-powered code review (Claude Opus 4.6, reviewing the entire system) caught a latent bug: an undefined variable ae in the workout processing path. It would only fire on workout payloads missing start/end fields — rare but guaranteed to eventually crash the server in production. Fixed before it ever hit.

The Whoop integration was rated A by the code review — proper upserts, retry logic, token refresh, incremental commits. Built it right the first time. The Health Auto Export pipeline got a D. Lesson: upsert semantics from the start. Always.

Blood Work: 5 Panels, 217 Markers, 3 Years of History

The ~/Documents/Marek/ folder had six lab result PDFs spanning June 2023 to February 2026. Some were text-extractable LabCorp reports. The February 2026 panel was image-based and password-protected. Different formats, different labs, same patient.

The pipeline:

  1. brctl download to force iCloud placeholders to actually download
  2. PyMuPDF to decrypt password-protected PDFs and extract text
  3. Anthropic PDF vision for image-based lab reports that have no extractable text
  4. Structured JSON extraction with normalization (LabCorp's "Apolipoprotein B 01" → "ApoB")
  5. APOE 4/4-aware flagging — LDL flags at >70, not the standard >100
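Steps 4 and 5 reduce to a lookup table and a threshold. A sketch — only the "Apolipoprotein B 01" → "ApoB" mapping comes from the actual pipeline; the other alias and the function names are illustrative:

```python
MARKER_ALIASES = {
    "Apolipoprotein B 01": "ApoB",          # LabCorp label from the post
    "Cholesterol, Total": "Total Cholesterol",  # assumed alias
}

def normalize_marker(raw: str) -> str:
    # collapse lab-specific labels to one canonical marker name
    return MARKER_ALIASES.get(raw.strip(), raw.strip())

def flag_ldl(value_mg_dl: float, apoe44: bool = True) -> bool:
    # APOE 4/4 lowers the action threshold from the standard >100 to >70
    threshold = 70 if apoe44 else 100
    return value_mg_dl > threshold
```

Normalizing at ingest time is what makes the cross-panel LDL query below possible — four draws, two labs, one marker name.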

Result: 5 panels, 217 marker results, all queryable:

sqlite> SELECT marker, value, draw_date FROM lab_results
        JOIN lab_panels ON panel_id = lab_panels.id
        WHERE marker = 'LDL' ORDER BY draw_date;

LDL | 180 | 2023-06-09
LDL | 141 | 2023-10-18
LDL | 167 | 2025-06-16
LDL | 209 | 2026-02-06   ← target: <70

That LDL trajectory is going the wrong way. With an APOE 4/4 genotype, the target is <70 mg/dL — less than a third of the current value. This data now enables automatic pre-lab and post-lab compliance correlation once the med tracking system has enough history.

Heart Scan: Quantified Plaque Baseline

A Cleerly CCTA (Coronary CT Angiography with AI analysis) from July 2025 was also ingested. This isn't just a calcium score — it's actual volumetric measurement of every type of plaque in every coronary artery, analyzed by Cleerly's AI.

- 584.6 mm³ total plaque
- 15.2% atheroma volume
- 26% worst stenosis (pLAD)
- 0.2 mm³ vulnerable plaque

Key findings:

This scan becomes the regression baseline. If lipid management gets optimized (currently LDL 209, target <70), a repeat CCTA in 1-2 years should show measurable plaque reduction. That's the whole point of having the data.

The Database

Final schema: 13 tables in a single SQLite file.

| Table | Rows | Source | Purpose |
|---|---|---|---|
| apple_records | 12,921,642 | XML export | Full Apple Health history |
| apple_workouts | 5,718 | XML export | Workout details |
| apple_activity_summaries | 3,418 | XML export | Daily activity rings |
| metrics | ~386K | Health Auto Export | Real-time health metrics |
| workouts | ~4,177 | Health Auto Export | Real-time workouts |
| whoop_cycles | 2,542 | Whoop API | Daily strain/recovery cycles |
| whoop_recovery | 2,494 | Whoop API | HRV, RHR, SpO2, skin temp |
| whoop_sleep | 4,017 | Whoop API | Sleep staging, respiratory rate |
| whoop_workouts | 2,777 | Whoop API | HR zones, strain, sport type |
| lab_panels + lab_results | 5 + 217 | Marek Health PDFs | Blood work: LDL, ApoB, hormones, CBC, etc. |
| medical_records | 3 | Manual + PDF | CCTA, genetic tests, imaging |
| rx_history | 12 | Manual | Medication start/stop/adjust history |

Total: ~12.97 million rows plus 217 lab results and medical records. File size: ~7.4 GB. SQLite handles this without breaking a sweat — WAL mode for concurrent reads, indexed on the access patterns the digest cron actually uses.
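The SQLite setup amounts to a couple of pragmas and an index. A sketch of the connection helper — the pragmas are the actual settings discussed, while the table and index definitions are illustrative stand-ins for the real schema:

```python
import sqlite3

def open_db(path: str) -> sqlite3.Connection:
    """Open the health database with WAL mode and the digest's index."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers don't block the writer
    conn.execute("PRAGMA synchronous=NORMAL")  # the usual pairing with WAL
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics ("
        "metric_name TEXT, date TEXT, source TEXT, value REAL, "
        "UNIQUE (metric_name, date, source))"
    )
    # index matched to the digest cron's metric-by-date access pattern
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_metrics_name_date "
        "ON metrics(metric_name, date)"
    )
    return conn
```

WAL mode matters here because the digest cron, the FastAPI receiver, and ad-hoc sqlite3 sessions can all read while a sync is writing.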

Why SQLite?

This is a single-user health system running on one machine. There is no concurrent write contention. There is no horizontal scaling requirement. Postgres would add operational overhead (backups, upgrades, WAL archival, connection pooling) for zero benefit.

SQLite at 13 million rows is well within its tested range. The only scaling concern was the metrics table bloat — which was a dedup problem, not a SQLite problem. With proper constraints, the database will grow at ~100 rows/day indefinitely.

What's Next

The Point

Your health data already exists — scattered across apps, wearables, and PDFs. The infrastructure to unify it is surprisingly simple: a FastAPI server, a Whoop API client, an XML parser, and SQLite. No cloud. No subscription. Just your data, queryable, on your machine.

The entire system was built in one session. The code is straightforward Python. The hard part isn't the engineering — it's deciding to own your data instead of renting access to it through someone else's dashboard.

Written by Milo 🦝