March 2, 2026
Black box, meet pry bar
Inside the M4 Apple Neural Engine, Part 1: Reverse Engineering
Hackers crack Apple’s secret AI chip; commenters split on “wow” vs “why”
TLDR: A researcher + AI team says they cracked Apple’s M4 Neural Engine to run it directly and even start training, challenging Apple’s glossy numbers. Commenters are split: skeptics question real‑world value, optimists praise AI‑boosted engineering, and everyone’s waiting to see if training on this chip pays off.
Apple’s hush‑hush AI chip just got cracked open — and the comments section is the real show. A human–AI duo claims they bypassed Apple’s official tools to talk directly to the M4’s Neural Engine, Apple’s on‑device AI hardware. They say they reverse‑engineered private bits, measured real performance (calling Apple’s “38 TOPS” brag misleading), and even nudged the chip to do training — a job it wasn’t built for. Translation: they peeled back the black box.
Cue the crowd drama. One camp is side‑eyeing the hype: “Are these 16 cores even useful with Apple’s AI as it is?” asks one skeptic, tapping into a wider “cool demo, but why should I care?” vibe. Another faction is hyped about the method, not just the chip: “This is the present, not the future — engineers + AI are a power combo,” cheers an optimist. Meanwhile, the grammar police spotted a few “LLMisms” in the write‑up and had a laugh, but still called it “highly informative.”
The receipts are dropping fast: Part 2’s benchmarks claim 6.6 TFLOPS per watt and a 0 W idle mode, while the code lives on GitHub. Now everyone’s waiting for Part 3 to see if training on this thing is actually worth the squeeze. TL;DR: Apple’s secrecy met hacker energy, and the community is torn between practical payoff and pure geek glory.
Key Points
- The authors reverse engineered Apple’s M4 ANE to bypass CoreML and interact directly via private frameworks and IOKit.
- ANE is a fixed-function graph execution engine that runs compiled neural network graphs as atomic operations, not instruction streams.
- They achieved direct _ANEClient API access on M4, cracked the in-memory MIL compilation path, and measured true peak throughput, calling Apple’s “38 TOPS” misleading.
- M4’s ANE (H16G) has 16 cores, a queue depth of 127, independent DVFS, and hard power gating to 0 mW when idle.
- Methodology included dyld_info -objc class discovery, method swizzling, binary analysis of E5 bundles, and scaling analyses to infer hardware topology.
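
For readers wondering what "method swizzling" means in practice: you swap a method's implementation at runtime for a wrapper that records each call and then hands off to the original. The sketch below is a Python analogy only, not the Objective-C runtime technique the authors would have used on Apple's private frameworks. `FakeClient`, `evaluate`, and `swizzle` are hypothetical names invented for illustration.

```python
import functools


class FakeClient:
    """Hypothetical stand-in for an opaque class whose calls we want to observe."""

    def evaluate(self, request):
        return f"ran:{request}"


def swizzle(cls, name, log):
    """Replace cls.<name> with a wrapper that logs arguments, then calls the original."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def traced(self, *args, **kwargs):
        log.append((name, args, kwargs))  # record what the caller passed in
        return original(self, *args, **kwargs)

    setattr(cls, name, traced)
    return original  # returned so the swap can be undone later


calls = []
saved = swizzle(FakeClient, "evaluate", calls)
result = FakeClient().evaluate("graph0")  # goes through the traced wrapper
FakeClient.evaluate = saved  # restore the original implementation
```

After the traced call, `calls` holds the method name and arguments that were observed. That is the kind of information a reverse engineer reads off to work out how an undocumented API expects to be driven.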