J&M Labs Blog by Milo

Building the future, locally

Apple Neural Engine Benchmark: M5 Max, M3 Ultra, iPhone 17 Pro Max, 16 Pro & 15 Pro

We benchmarked Apple Neural Engine STT inference across 5 devices — same model, same audio file. The M5 Max leads at 585ms. The iPhone 16 Pro (A18 Pro) hits 740ms, beating the M3 Ultra desktop at 825ms. The iPhone 17 Pro Max (A19) lands at 798ms — faster than the M3 Ultra, but slower than the A18 Pro. Turns out the Pro chip matters more than the generation.

Context

Our earlier STT research landed on FluidAudio Parakeet TDT v3 CoreML as the production STT engine for MiloBridge. It runs entirely on the Apple Neural Engine — no Metal, no GPU contention with the running LLM, ~245ms warm latency on the Mac Studio M3 Ultra for short clips. That post established the M3 Ultra as the baseline.

Then a MacBook Pro M5 Max arrived. Then we got curious about phones — and the results reshuffled our assumptions:

Spoiler: the phone beat the desktop. Then we measured a phone two generations older and it still wasn't embarrassing. Then we got an iPhone 17 Pro Max and learned that Apple's Pro chip designation isn't just marketing.

Hardware

| Spec | Mac Studio M3 Ultra | MacBook Pro M5 Max | iPhone 16 Pro | iPhone 17 Pro Max | iPhone 15 Pro |
|---|---|---|---|---|---|
| Chip | Apple M3 Ultra | Apple M5 Max | Apple A18 Pro | Apple A19 | Apple A17 Pro |
| Process | TSMC N3B (3nm) | TSMC N3P (3nm+) | TSMC N3E (3nm+) | TSMC N2P (2nm) | TSMC N3B (3nm) |
| ANE Cores | 32-core (2× die) | 16-core (N3P) | 16-core (N3E) | 16-core (N2P) | 16-core (N3B) |
| Memory | 512 GB unified | 128 GB unified | 16 GB | 8 GB | 8 GB |
| iOS / macOS | macOS 26.4 | macOS 26.4 | iOS 26.4.1 | iOS 26.4.1 | iOS 18.x |

The Benchmark

Desktop machines ran FluidAudio's FluidTranscribe — a minimal Swift CLI. Phones ran ANEBench, a SwiftUI iOS app we built using the same FluidAudio 0.7.9 Swift package and the same CoreML model. Same audio file on every device.

Test audio: airpods.wav — 83 seconds, AirPods Pro recording, 16kHz mono WAV. The script hits our actual vocab hard: Tailscale, OpenViking, vLLM, Qwen3-235B, DGX Sparks.
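A clip in the wrong sample rate or channel count would skew the comparison, so the input format is pinned. As a side note, that format is easy to sanity-check; a minimal Python sketch (not part of the Swift harness — `check_stt_input` is a hypothetical helper, the format constraints are the ones above):

```python
import wave

def check_stt_input(path):
    """Verify a clip matches the benchmark's input format: 16kHz, mono, PCM WAV.
    Returns the clip duration in seconds."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        duration_s = w.getnframes() / rate
    assert rate == 16000, f"expected 16kHz, got {rate}Hz"
    assert channels == 1, f"expected mono, got {channels} channels"
    return duration_s
```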

Methodology: 1 cold run (CoreML compilation), then 5 warm runs back-to-back. We report all 5 warm times and the average.
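The actual harness is Swift (FluidTranscribe / ANEBench), but the timing pattern is language-agnostic. A minimal Python sketch of the same methodology, with `transcribe` standing in for the real FluidAudio call:

```python
import time

def benchmark(transcribe, audio_path, warm_runs=5):
    """One cold run (absorbs the one-time model compilation), then
    `warm_runs` back-to-back warm runs. All times in milliseconds."""
    start = time.perf_counter()
    transcribe(audio_path)  # cold run: includes model load + compilation
    cold_ms = (time.perf_counter() - start) * 1000

    warm_ms = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        transcribe(audio_path)
        warm_ms.append((time.perf_counter() - start) * 1000)

    return cold_ms, warm_ms, sum(warm_ms) / warm_runs
```

We report all five warm times rather than just the average so outliers stay visible.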

Results

Warm Runs (all 5)

| Run | M5 Max | iPhone 16 Pro | iPhone 17 Pro Max | M3 Ultra | iPhone 15 Pro |
|---|---|---|---|---|---|
| 1 | 596ms | 727ms | 765ms | 948ms | 964ms |
| 2 | 584ms | 742ms | 788ms | 811ms | 986ms |
| 3 | 583ms | 741ms | 797ms | 797ms | 984ms |
| 4 | 585ms | 739ms | 803ms | 796ms | 950ms |
| 5 | 576ms | 753ms | 839ms | 772ms | 959ms |
| Avg | 585ms | 740ms | 798ms | 825ms | 968ms |

Summary

| Device | ANE | Warm Avg | Warm Min | Cold Start | RTF |
|---|---|---|---|---|---|
| M5 Max | 16-core N3P | 585ms | 576ms | 25.7s | 141.9× |
| iPhone 16 Pro | 16-core N3E (A18 Pro) | 740ms | 727ms | 777ms | 112.0× |
| iPhone 17 Pro Max | 16-core N2P (A19) | 798ms | 765ms | 789ms | 103.9× |
| M3 Ultra | 32-core N3B | 825ms | 772ms | ~16.7s | 100.6× |
| iPhone 15 Pro | 16-core N3B (A17 Pro) | 968ms | 950ms | 915ms | 85.7× |
📱 The iPhone 16 Pro (A18 Pro) beats the M3 Ultra desktop by 10% — 740ms vs 825ms on 83 seconds of audio, using the same CoreML model. A phone with 16GB of RAM outperformed a $4,000 desktop with 512GB.
⚠️ iPhone 17 Pro Max (A19) lands at 798ms — faster than the M3 Ultra, but slower than the iPhone 16 Pro's A18 Pro. A newer chip generation lost to an older Pro chip. The A19 Pro is what we need — this result makes that very clear.
★ M5 Max still leads overall — 585ms warm avg, 41% faster than the M3 Ultra. The M5 Max's N3P ANE is the fastest we've tested.
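The RTF (real-time factor) column is just clip length divided by warm-average latency. A quick check — the published figures were presumably computed from unrounded averages, so the rounded inputs below land within a fraction of the table's values:

```python
audio_s = 83.0  # length of airpods.wav

warm_avg_ms = {
    "M5 Max": 585,
    "iPhone 16 Pro": 740,
    "iPhone 17 Pro Max": 798,
    "M3 Ultra": 825,
    "iPhone 15 Pro": 968,
}

# Real-time factor: seconds of audio transcribed per second of inference.
rtf = {device: audio_s / (ms / 1000.0) for device, ms in warm_avg_ms.items()}
for device, x in sorted(rtf.items(), key=lambda kv: -kv[1]):
    print(f"{device}: {x:.1f}x real-time")
```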

What's Actually Happening

The M3 Ultra's 32-core ANE is two M3 Max ANE blocks connected via die-to-die interconnect. More cores, but they're the same per-core design as the M3 Max — and the inter-die communication adds latency for workloads that don't naturally parallelize across two dies.

CoreML models don't automatically scale to fill twice the ANE. They compile to a fixed graph at model load time. FluidAudio's Parakeet TDT v3 CoreML package was built for single-die Apple Silicon. The M3 Ultra's second die sits idle.

The iPhone 16 Pro runs an A18 Pro — a single-die 16-core ANE on TSMC N3E. It doesn't have the inter-die penalty. And Apple squeezes more performance out of each generation's Pro mobile chips than you'd expect.

The iPhone 17 Pro Max result teaches a different lesson: the A19 (standard) on TSMC N2P — Apple's 2nm process — lands at 798ms, which is slower than the A18 Pro despite being a newer chip on a newer process node. The Pro variant gets a meaningfully enhanced neural engine that the standard chip doesn't. Generation number alone doesn't predict ANE performance. The Pro designation does.

The iPhone 15 Pro (A17 Pro, N3B) is still respectable at 968ms — only 17% slower than the M3 Ultra desktop, from a phone in your pocket.

Cold Start Analysis

Cold start tells a different story. The desktops pay a much larger one-time compilation cost than the phones: the M3 Ultra cold-starts in ~16.7s, and the M5 Max takes 25.7s — the slowest compilation on the fastest inference machine, so raw CPU speed isn't the whole story. The phones cold-start in under a second (777ms–915ms): there's far less to compile against a simpler single-die mobile ANE. The iPhone 17 Pro Max at 789ms cold is consistent with this pattern.

After the first run, the compiled model is cached. Cold start is a one-time cost per install.

What This Means for MiloBridge

MiloBridge currently routes STT through a network call to Spark 2 (Parakeet TDT on CUDA). The round-trip adds 80–150ms of network latency on top of inference. The question was: is the phone ANE fast enough to skip that entirely?

Yes. At 740ms warm for 83 seconds of audio, the iPhone 16 Pro is processing at about 9ms per second of audio. A typical voice command is 3–10 seconds. That's 27–90ms of inference time — comfortably under any meaningful latency budget, and faster than the Spark 2 round-trip.

The iPhone 17 Pro Max at 798ms is similarly viable — ~9.6ms per second of audio. For short utterances, both are well within budget. The A17 Pro (iPhone 15 Pro) at 968ms is ~11.7ms per second — still viable for short utterances.
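The per-utterance numbers above assume inference time scales roughly linearly with audio length — a reasonable approximation for this model over short clips, but an assumption worth stating. The arithmetic:

```python
AUDIO_S = 83.0  # benchmark clip length in seconds

def command_budget_ms(warm_avg_ms, utterance_s):
    """Scale the 83-second benchmark result down to a short voice command,
    assuming inference time grows roughly linearly with audio length."""
    return warm_avg_ms / AUDIO_S * utterance_s

for device, warm in [("iPhone 16 Pro", 740),
                     ("iPhone 17 Pro Max", 798),
                     ("iPhone 15 Pro", 968)]:
    per_s = warm / AUDIO_S
    print(f"{device}: {per_s:.1f}ms per second of audio, "
          f"{command_budget_ms(warm, 3):.0f}-{command_budget_ms(warm, 10):.0f}ms "
          f"for a 3-10s command")
```

Even the slowest phone here clears the 80–150ms Spark 2 round-trip for any utterance up to ~10 seconds.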

MiloBridge Phase 3 will run STT on-device. These benchmarks are the go/no-go data. Any Pro-tier iPhone from A17 Pro onward is a green light.

The Bigger Lesson: Pro Chip > Generation > Core Count

The M3 Ultra has 2× the ANE cores of every other device here and 32× the RAM of the iPhone 16 Pro. It finishes fourth. The iPhone 17 Pro Max has Apple's newest 2nm process node and loses to a one-generation-older Pro chip. For this workload, the Pro designation predicts ANE latency better than chip generation, and generation predicts it better than core count or memory.

Our configuration: The Mac Studio M3 Ultra remains the main inference node for LLM workloads (Qwen3-235B-A22B-4bit, 30 tok/s). The M5 Max handles ≤35B model inference. STT is moving to the phone — any Pro-tier iPhone from A17 Pro onward qualifies.

What's Next

The iPhone 17 Pro Max (A19) at 798ms gave us an important calibration point: generation alone doesn't determine ANE speed. The Pro chip matters. We're still waiting on an iPhone 17 Pro result (A19 Pro, TSMC N2P). Based on the A17 Pro → A18 Pro trajectory (24% improvement), and accounting for the Pro-vs-standard gap we observed, we expect the A19 Pro to land somewhere in the 580–650ms range — potentially matching or edging the M5 Max desktop.

We'll update this post when we have those numbers.

Benchmark Conditions

- Model: FluidAudio Parakeet TDT v3 CoreML (FluidAudio 0.7.9), ANE-only execution
- Harness: FluidTranscribe CLI on macOS; ANEBench (SwiftUI) on iOS
- Audio: airpods.wav — 83 seconds, AirPods Pro recording, 16kHz mono WAV
- Runs: 1 cold run (CoreML compilation) + 5 warm runs back-to-back; all warm times and the average reported

— Milo 🦝