May 11, 2026
Fast code, hotter comments
Training an LLM in Swift, Part 1: Taking matrix mult from Gflop/s to Tflop/s
Swift tries to outrun C, and the comments are absolutely eating it up
TLDR: A developer hand-tuned Swift code to speed up AI training on Apple hardware, aiming for eye-popping gains without relying on big software libraries. Readers loved the ambition, argued over what counts as “real” speed, and instantly turned the comment section into a nerdy showdown over benchmarks, hidden hardware, and risky compiler tricks.
A Swift developer decided to do something delightfully chaotic: build the number-crunching heart of an AI model by hand, skip the usual machine-learning toolkits, and then try to make it faster than C, the old-school language many programmers treat like sacred scripture. The project is about speeding up matrix multiplication — basically the repeated arithmetic that powers AI training — on Apple chips, from the regular processor to the graphics hardware. But the real party is in the replies, where readers are treating this like a comeback tour, a hardware gossip thread, and a tiny programming flame war all at once.
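The "repeated arithmetic" in question is matrix multiplication, which at its simplest is just three nested loops of multiply-adds. A minimal, unoptimized sketch in Swift (the function name, row-major flat-array layout, and dimension parameters here are illustrative assumptions, not the article's actual API):

```swift
// Naive C = A × B for an m×k matrix A and a k×n matrix B,
// stored row-major as flat arrays. This is the baseline the
// article's hand-tuned kernels are built to beat.
func matmul(_ a: [Float], _ b: [Float], m: Int, n: Int, k: Int) -> [Float] {
    var c = [Float](repeating: 0, count: m * n)
    for i in 0..<m {
        for j in 0..<n {
            var sum: Float = 0
            for p in 0..<k {
                sum += a[i * k + p] * b[p * n + j]  // one multiply + one add
            }
            c[i * n + j] = sum
        }
    }
    return c
}
```

Getting from this triple loop to Tflop/s territory is the whole game: the inner loop as written is memory-bound and leaves the chip's SIMD and cache hierarchy almost entirely idle.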
The biggest mood? Respect mixed with nerdy disbelief. One commenter called the piece "phenomenal," while others loved seeing serious Swift performance writing in a world where that kind of deep-dive is rare. There's also a big nostalgia wave: longtime readers were thrilled to see Cocoa with Love creator Matt Gallagher still dropping dense, high-quality posts like it's a greatest-hits album.

But of course, this is the internet, so the applause quickly turned into debate. One crowd jumped on the author's claim of about 1.1 trillion calculations per second and basically said, "Cute number, but real graphics-chip speed is way messier than the marketing stickers suggest." Another commenter zoomed in on a compiler setting and delivered the classic programmer scolding: yes, AI math can get away with looser rules, but please do not normalize dangerous shortcuts for everyone else. Even the semi-secret Apple math hardware got dragged into the drama, with commenters wondering whether the company is hiding special tricks in plain sight. In other words: one man optimized Swift, and the community turned it into a full-on performance, purity, and prestige discourse.
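For context on what the benchmark crowd was arguing about: the headline number comes from the standard FLOP count for matrix multiplication. An m×k by k×n multiply performs 2·m·n·k floating-point operations (one multiply and one add per inner-loop step), so the achieved rate is that count divided by elapsed time. A sketch of the arithmetic (this is the conventional accounting, not the author's actual benchmark harness, and the example sizes are made up):

```swift
// Conventional matmul throughput accounting: an m×k by k×n multiply
// does 2·m·n·k floating-point operations. Rate = ops / seconds.
func gflops(m: Int, n: Int, k: Int, seconds: Double) -> Double {
    let flop = 2.0 * Double(m) * Double(n) * Double(k)
    return flop / seconds / 1e9
}

// Hypothetical example: a 4096×4096×4096 multiply finishing in 0.125 s
// sustains 2 · 4096³ / 0.125 ≈ 1.1e12 flop/s, i.e. ~1.1 Tflop/s.
let rate = gflops(m: 4096, n: 4096, k: 4096, seconds: 0.125)
```

The skeptics' point is that this measured rate depends heavily on matrix size, precision, and memory layout, so a single number rarely tells you how close you actually are to the hardware's peak.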
Key Points
- The article focuses on optimizing handwritten matrix multiplication in Swift for LLM training on Apple Silicon.
- It is the first part of a series about training neural networks in Swift and comparing Apple Silicon compute units such as CPU, SIMD, AMX, and GPU.
- The project deliberately avoids machine learning libraries and frameworks, even though the author acknowledges established frameworks are the practical choice for production matrix multiplication.
- The sample app is intended to use the matrix kernels inside a full LLM implementation, with performance measured over complete forward and backward training iterations.
- The Swift implementation is based on a rewrite of Andrej Karpathy's llm.c, a plain C GPT-2-compatible reference implementation.