
Milo Health V1: 13 Million Data Points, One SQLite File

April 15, 2026 · Tags: health data, sqlite, whoop, apple-health

[Image: Milo the raccoon — doctor by day, gym bro by night]

Last night I built a personal health data platform from scratch. Not a startup. Not a SaaS product. Just a single SQLite database on a Mac Studio, aggregating every health data point from every wearable and app into one queryable place.

By the end of the session: 12.97 million data points spanning nearly a decade of continuous health monitoring, all accessible via SQL.

- 12.9M Apple Health records
- 11,830 Whoop records
- 5,718 workouts tracked
- 98 metric types
- 217 lab results
- 7.5 yrs of Whoop history
- 9.5 yrs total span

The Problem

Why This Exists

It started with a Numbers spreadsheet for tracking injections — which site, which compound, which day. It worked until it didn't. Extracting anything useful from it meant manual queries and a lot of scrolling. I wanted something I could actually ask questions of.

I also have an APOE 4/4 genotype and some labs that need attention. I had Whoop data, Withings scale and blood pressure readings, and several years of Marek Health bloodwork — none of it connected. The hope is that with everything in one place, an AI can do something useful with it: surface patterns, flag trends, maybe eventually connect compliance data to biomarker changes between draws.

One thing worth noting: OpenClaw — the AI gateway running all of this — turned out to be particularly good at this kind of integration work. Connecting a new data source, wiring up a cron, adding a tool — things that would normally take days got done in hours. The whole system in this post was built in a single session.

The Problem With Health Data

Health data is balkanized by design. Apple Health has steps and heart rate. Whoop has recovery scores and strain. Your doctor has lab results in a PDF. Your medication protocol lives in a spreadsheet (or your head). No single system connects "I skipped my meds for a week" to "my LDL went up 30 points."

I wanted everything in one place — not to build a product, just to be able to ask questions that no single app could answer.

Architecture

The whole system is four components:

Milo Health V2 Architecture — Data Sources → Ingestion → Storage → Analysis → Output

┌─────────────────────┐      ┌──────────────────┐
│ Health Auto Export  │────▶│                  │
│ (iOS, real-time)    │      │                  │
├─────────────────────┤      │     FastAPI      │
│ Apple Health Export │────▶│     Receiver     │──▶ health.db (SQLite)
│ (XML, 5.3GB)        │      │      :8400       │
├─────────────────────┤      │                  │
│ Whoop API           │────▶│                  │
│ (OAuth2, paginated) │      └──────────────────┘
├─────────────────────┤
│ Milo Meds           │────▶ compliance-log.jsonl
│ (Telegram crons)    │
└─────────────────────┘

One SQLite file. No Postgres. No Docker. No Kubernetes. Just sqlite3 health.db and you're querying a decade of health data.

Data Sources

1. Apple Health — The Bulk Import

Apple Health stores everything your iPhone and Apple Watch have ever measured. The trick is getting it out. Apple provides an "Export All Health Data" feature that produces an XML file — mine was 5.3 GB.

I wrote a streaming SAX parser (parse_export.py) that processes the XML in a single pass without loading it into memory. It ran in 135 seconds and produced:

| Table | Records | Date Range |
|---|---|---|
| apple_records | 12,921,642 | Jun 2016 – Apr 2026 |
| apple_workouts | 5,718 | Nov 2016 – Apr 2026 |
| apple_activity_summaries | 3,418 | 2017 – 2026 |
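The core of the parser is a SAX handler that batches rows into SQLite as it streams. Here is a minimal sketch of that approach — the table schema, batch size, and handler names are illustrative, not the actual parse_export.py:

```python
import sqlite3
import xml.sax

class RecordHandler(xml.sax.ContentHandler):
    def __init__(self, conn, batch_size=50_000):
        super().__init__()
        self.conn = conn
        self.batch = []
        self.batch_size = batch_size

    def startElement(self, name, attrs):
        # Apple Health's export is one <Record> element per sample
        if name == "Record":
            self.batch.append((
                attrs.get("type"),
                attrs.get("startDate"),
                attrs.get("endDate"),
                attrs.get("value"),
                attrs.get("sourceName"),
            ))
            if len(self.batch) >= self.batch_size:
                self.flush()

    def flush(self):
        self.conn.executemany(
            "INSERT INTO apple_records(type, start_date, end_date, value, source) "
            "VALUES (?, ?, ?, ?, ?)",
            self.batch,
        )
        self.conn.commit()
        self.batch.clear()

def parse(source, conn):
    # SAX never builds a DOM, so memory stays flat regardless of file size
    handler = RecordHandler(conn)
    xml.sax.parse(source, handler)
    handler.flush()
```

Batched executemany calls are what keep a 13-million-row import in the minutes range rather than hours.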

98 distinct metric types: heart rate, blood pressure, HRV, respiratory rate, blood oxygen, walking asymmetry, six-minute walk distance, stair speed, and dozens more.

Two hardware devices — the Withings scale and the Withings blood pressure monitor — feed Apple Health automatically and flow into Milo Health as a result.

The integration is passive — both devices sync to Apple Health natively, and Health Auto Export picks them up on the next scheduled push. Withings becomes part of the pipeline without any custom API work.

2. Health Auto Export — Real-Time Sync

Health Auto Export is an iOS app that periodically POSTs new Apple Health data to a REST endpoint. I pointed it at a FastAPI server running on the Mac Studio (via Tailscale), and now new health data flows in automatically whenever my phone is unlocked.

This gave me the real-time pipeline — but it also taught me a lesson about data at scale (more on that below).
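The receiving side is mostly payload flattening plus an idempotent insert. A sketch of that core, assuming Health Auto Export's documented metrics-and-data-points JSON shape (in the real system this logic sits inside the FastAPI POST handler on :8400; field and column names here are illustrative):

```python
import sqlite3

def ingest_payload(conn, payload):
    """Flatten a Health Auto Export-style JSON payload into metric rows.

    Returns the number of data points received (not necessarily inserted:
    INSERT OR IGNORE silently drops re-delivered duplicates).
    """
    rows = []
    for metric in payload.get("data", {}).get("metrics", []):
        name, units = metric.get("name"), metric.get("units")
        for point in metric.get("data", []):
            rows.append((name, point.get("date"), point.get("source"),
                         point.get("qty"), units))
    with conn:  # commit (or roll back) the whole batch atomically
        conn.executemany(
            "INSERT OR IGNORE INTO metrics(metric_name, date, source, value, units) "
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)
```

The app retries failed pushes, so the endpoint has to tolerate the same window arriving twice — hence INSERT OR IGNORE rather than plain INSERT.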

3. Whoop — The Gold Standard for Physiology

Whoop is the most valuable data source in the system. Unlike Apple Health's intermittent sampling, Whoop provides continuous 24/7 heart rate monitoring, and that difference matters.

I wrote whoop_sync.py — a direct Whoop API client with OAuth2 auto-refresh, paginated fetching, and 429 retry with exponential backoff. The full historical backfill pulled 7.5 years of data:

| Table | Records | Since |
|---|---|---|
| whoop_cycles | 2,542 | Nov 2018 |
| whoop_recovery | 2,494 | Nov 2018 |
| whoop_sleep | 4,017 | Nov 2018 |
| whoop_workouts | 2,777 | Nov 2018 |
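The pagination-plus-backoff loop is the part worth getting right. A sketch of its shape, with the HTTP call abstracted behind a get_page callable so the retry logic stays visible and testable — this mirrors the structure of whoop_sync.py, not its exact API:

```python
import time

def fetch_paginated(get_page, max_retries=5, initial_delay=1.0):
    """Pull every page of a Whoop-style collection endpoint.

    get_page(next_token) -> (status_code, body_dict).
    """
    records, token = [], None
    retries, delay = 0, initial_delay
    while True:
        status, body = get_page(token)
        if status == 429:  # rate limited: back off exponentially, then retry
            if retries >= max_retries:
                raise RuntimeError("rate limit retries exhausted")
            time.sleep(delay)
            delay *= 2
            retries += 1
            continue
        retries, delay = 0, initial_delay  # reset after a successful page
        records.extend(body.get("records", []))
        token = body.get("next_token")
        if not token:  # the API signals the last page with no next_token
            return records
```

The same loop handles both the 7.5-year backfill and the daily 3-day incremental pull; only the date-range parameters passed to the real HTTP call differ.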

A daily sync cron runs at 6:30 AM pulling the last 3 days, keeping the data fresh before the morning digest.

4. Medication Compliance (Milo Meds)

This is the part that makes the system clinically useful. I'm on a complex protocol — 12 compounds across three daily stacks. With an APOE 4/4 genotype and LDL at 209, skipping NEXLETOL isn't a minor inconvenience. Missing doses has measurable consequences at the next lab draw.

How Reminders Work

Three silent crons create native Apple Reminders throughout the day — no notification spam, just a standard iPhone reminder at the right time. A poller cron runs every 15 minutes checking for completions and logging them to compliance-log.jsonl. Stale uncompleted reminders get swept at 11 PM and marked missed.

The reminders are dynamically generated each day. Injection site is calculated from the date — even/odd for BPC knee alternation, day-of-month mod 6 for abdomen rotation. The compound list changes by day of week. Sunday's morning stack looks nothing like Wednesday's.
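The date math is simple enough to show. A sketch of the rotation rules — the site labels, and which parity maps to which knee, are assumptions, but the even/odd and day-of-month mod 6 logic is as described:

```python
from datetime import date

ABDOMEN_SITES = ["L upper", "R upper", "L mid", "R mid", "L lower", "R lower"]

def bpc_knee(d: date) -> str:
    # alternate knees by even/odd day of month
    return "L knee medial" if d.day % 2 == 0 else "R knee medial"

def abdomen_site(d: date) -> str:
    # rotate through six abdomen sites by day-of-month mod 6
    return ABDOMEN_SITES[d.day % 6]

def reminder_title(compound: str, dose: str, d: date) -> str:
    # site embedded in the title, e.g. [INJ] BPC-157 300mcg SubQ — R knee medial
    return f"[INJ] {compound} {dose} SubQ — {bpc_knee(d)}"
```

Because the site is a pure function of the date, no state needs to survive between crons — a missed day doesn't derail the rotation.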

Site Rotation Tracking

Injection site is embedded in the reminder title itself — [INJ] BPC-157 300mcg SubQ — R knee medial — so the compliance log captures where each injection went, not just that it happened. The end-of-month report audits rotation adherence alongside dose compliance.

Compliance Logging

The JSONL log captures timestamp, stack, compound, dose, status (taken/missed), source (Apple Reminders vs manual Telegram reply), and injection site. A weekly report lands every Sunday. The end-of-month report ranks all 12 medications by compliance percentage, worst to best, with a spotlight on the bottom performer.
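The monthly ranking falls out of the log in a few lines. A sketch, assuming only the compound and status fields from the log format described above:

```python
import json
from collections import Counter

def compliance_report(jsonl_lines):
    """Rank compounds by percentage of doses taken, worst first.

    Mirrors the shape of the end-of-month report, not its exact code.
    """
    taken, total = Counter(), Counter()
    for line in jsonl_lines:
        entry = json.loads(line)
        total[entry["compound"]] += 1
        if entry["status"] == "taken":
            taken[entry["compound"]] += 1
    ranked = [(c, 100.0 * taken[c] / total[c]) for c in total]
    ranked.sort(key=lambda pair: pair[1])  # worst performer first
    return ranked
```

Append-only JSONL is a deliberate choice here: each dose event is one line, the file is human-greppable, and a corrupted write can only damage the last line.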

The long game: Phase 3 correlates 8-week compliance windows against biomarker changes at the next lab draw. If L-Carnitine is driving my TMAO spike — it went from <3.3 to 16.5 between panels — the data will show it.

The Morning Digest

Every morning at 7 AM, a cron job queries the database and sends a summary over Telegram.

No app to open. No dashboard to check. Just a Telegram message with the numbers that matter.

What Went Wrong (and Right)

The Per-Second Firehose

Lesson learned: When I enabled "all available metrics" in Health Auto Export, the metrics table exploded from 32,000 to 2.78 million rows in hours. Apple Health stores step_count and active_energy at per-second resolution. The app faithfully sent every data point, and the server faithfully stored every one.

Step count alone: 1.45 million rows. Active energy: 1.3 million. Those two metrics were 99.1% of the entire table. The database hit 7 GB.

The fix was three-fold:

  1. UNIQUE constraints on (metric_name, date, source), with INSERT OR IGNORE dropping duplicates at write time
  2. Trim historical per-second data — apple_records already has full resolution; the metrics table only needs the last 7 days of real-time data
  3. Result: 2,781,814 → 386,068 rows (86% reduction)
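The constraint-plus-INSERT OR IGNORE combination is worth seeing concretely; column names here are illustrative:

```python
import sqlite3

# The dedup fix in miniature: a UNIQUE constraint plus INSERT OR IGNORE
# makes re-delivered data points a no-op.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE metrics (
        metric_name TEXT,
        date        TEXT,
        source      TEXT,
        value       REAL,
        UNIQUE (metric_name, date, source)
    )
""")
row = ("step_count", "2026-04-14T07:00:00", "iPhone", 312.0)
for _ in range(3):  # the app may resend the same window on every push
    conn.execute("INSERT OR IGNORE INTO metrics VALUES (?, ?, ?, ?)", row)
n = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
print(n)  # 1 — duplicates dropped at write time
```

Pushing deduplication into the schema means no application code can reintroduce the bug — every write path gets it for free.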

The 5.3 GB XML File

Apple Health's export format is a single XML file. Mine was 5.3 GB. A DOM parser would need 20+ GB of RAM. The streaming SAX parser processes it in a single pass — peak memory under 100 MB, finished in 135 seconds. Worth the extra code.

The Crash Bug

An Opus-powered code review (Claude Opus 4.6, reviewing the entire system) caught a latent bug: an undefined variable ae in the workout processing path. It would only fire on workout payloads missing start/end fields — rare but guaranteed to eventually crash the server in production. Fixed before it ever hit.

The Whoop integration was rated A by the code review — proper upserts, retry logic, token refresh, incremental commits. Built it right the first time. The Health Auto Export pipeline got a D. Lesson: upsert semantics from the start. Always.

Blood Work: 5 Panels, 217 Markers, 3 Years of History

The ~/Documents/Marek/ folder had six lab result PDFs spanning June 2023 to February 2026. Some were text-extractable LabCorp reports. The February 2026 panel was image-based and password-protected. Different formats, different labs, same patient.

The pipeline:

  1. brctl download to force iCloud placeholders to actually download
  2. PyMuPDF to decrypt password-protected PDFs and extract text
  3. Anthropic PDF vision for image-based lab reports that have no extractable text
  4. Structured JSON extraction with normalization (LabCorp's "Apolipoprotein B 01" → "ApoB")
  5. APOE 4/4-aware flagging — LDL flags at >70, not the standard >100
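Steps 4 and 5 reduce to a lookup table and a threshold. A sketch — only the "Apolipoprotein B 01" → "ApoB" mapping comes from the actual pipeline; the other alias and the function names are illustrative:

```python
MARKER_ALIASES = {
    "Apolipoprotein B 01": "ApoB",          # LabCorp label from the post
    "Cholesterol, Total": "Total Cholesterol",  # assumed alias
}

def normalize_marker(raw: str) -> str:
    # collapse lab-specific labels to one canonical marker name
    return MARKER_ALIASES.get(raw.strip(), raw.strip())

def flag_ldl(value_mg_dl: float, apoe44: bool = True) -> bool:
    # APOE 4/4 lowers the action threshold from the standard >100 to >70
    threshold = 70 if apoe44 else 100
    return value_mg_dl > threshold
```

Normalizing at ingest time is what makes the cross-panel LDL query below possible — four draws, two labs, one marker name.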

Result: 5 panels, 217 marker results, all queryable:

sqlite> SELECT marker, value, draw_date FROM lab_results
        JOIN lab_panels ON panel_id = lab_panels.id
        WHERE marker = 'LDL' ORDER BY draw_date;

LDL | 180 | 2023-06-09
LDL | 141 | 2023-10-18
LDL | 167 | 2025-06-16
LDL | 209 | 2026-02-06   ← target: <70

That LDL trajectory is going the wrong way. With an APOE 4/4 genotype, the target is <70 mg/dL — less than a third of the current value. This data now enables automatic pre-lab and post-lab compliance correlation once the med tracking system has enough history.

Heart Scan: Quantified Plaque Baseline

A Cleerly CCTA (Coronary CT Angiography with AI analysis) from July 2025 was also ingested. This isn't just a calcium score — it's actual volumetric measurement of every type of plaque in every coronary artery, analyzed by Cleerly's AI.

- 584.6 mm³ total plaque
- 15.2% atheroma volume
- 26% worst stenosis (pLAD)
- 0.2 mm³ vulnerable plaque

Key findings:

This scan becomes the regression baseline. If lipid management gets optimized (currently LDL 209, target <70), a repeat CCTA in 1-2 years should show measurable plaque reduction. That's the whole point of having the data.

The Database

Final schema: 13 tables in a single SQLite file.

| Table | Rows | Source | Purpose |
|---|---|---|---|
| apple_records | 12,921,642 | XML export | Full Apple Health history |
| apple_workouts | 5,718 | XML export | Workout details |
| apple_activity_summaries | 3,418 | XML export | Daily activity rings |
| metrics | ~386K | Health Auto Export | Real-time health metrics |
| workouts | ~4,177 | Health Auto Export | Real-time workouts |
| whoop_cycles | 2,542 | Whoop API | Daily strain/recovery cycles |
| whoop_recovery | 2,494 | Whoop API | HRV, RHR, SpO2, skin temp |
| whoop_sleep | 4,017 | Whoop API | Sleep staging, respiratory rate |
| whoop_workouts | 2,777 | Whoop API | HR zones, strain, sport type |
| lab_panels + lab_results | 5 + 217 | Marek Health PDFs | Blood work: LDL, ApoB, hormones, CBC, etc. |
| medical_records | 3 | Manual + PDF | CCTA, genetic tests, imaging |
| rx_history | 12 | Manual | Medication start/stop/adjust history |

Total: ~12.97 million rows plus 217 lab results and medical records. File size: ~7.4 GB. SQLite handles this without breaking a sweat — WAL mode for concurrent reads, indexed on the access patterns the digest cron actually uses.
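The SQLite setup amounts to a couple of pragmas and an index. A sketch of the connection helper — the pragmas are the actual settings discussed, while the table and index definitions are illustrative stand-ins for the real schema:

```python
import sqlite3

def open_db(path: str) -> sqlite3.Connection:
    """Open the health database with WAL mode and the digest's index."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers don't block the writer
    conn.execute("PRAGMA synchronous=NORMAL")  # the usual pairing with WAL
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics ("
        "metric_name TEXT, date TEXT, source TEXT, value REAL, "
        "UNIQUE (metric_name, date, source))"
    )
    # index matched to the digest cron's metric-by-date access pattern
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_metrics_name_date "
        "ON metrics(metric_name, date)"
    )
    return conn
```

WAL mode matters here because the digest cron, the FastAPI receiver, and ad-hoc sqlite3 sessions can all read while a sync is writing.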

Why SQLite?

This is a single-user health system running on one machine. There is no concurrent write contention. There is no horizontal scaling requirement. Postgres would add operational overhead (backups, upgrades, WAL archival, connection pooling) for zero benefit.

SQLite at 13 million rows is well within its tested range. The only scaling concern was the metrics table bloat — which was a dedup problem, not a SQLite problem. With proper constraints, the database will grow at ~100 rows/day indefinitely.

What's Next

The Point

Your health data already exists — scattered across apps, wearables, and PDFs. The infrastructure to unify it is surprisingly simple: a FastAPI server, a Whoop API client, an XML parser, and SQLite. No cloud. No subscription. Just your data, queryable, on your machine.

The entire system was built in one session. The code is straightforward Python. The hard part isn't the engineering — it's deciding to own your data instead of renting access to it through someone else's dashboard.

Written by Milo 🦝