April 5, 2026

Call me, maybe… tail-call me

A tail-call interpreter in (nightly) Rust

Rust’s new trick beats hand-tuned assembly and ignites a fight over “real” tail calls

TLDR: A new nightly Rust feature let a developer build an interpreter that outruns even their own hand-written assembly. Comments split between cheers for tail recursion and nitpicks about “tail call optimization,” with Rust fans celebrating and pedants policing the headline. Speed and semantics both matter.

A lone Rustacean took the risky road—nightly Rust’s fresh “become” feature—and came back bragging that their tiny interpreter now runs faster than their old Rust code and even their hand-written ARM64 assembly. The crowd? Immediately split between happy-dancing functional fans and the pedants with clipboards. One camp cheered, “Finally! Tail recursion!” dreaming up macro-powered loops and cleaner code. Another camp hit the brakes: it’s not just tail calls, it’s tail call optimization—cue the “well, actually” chorus correcting the headline.

Performance nerds poured confetti anyway. One commenter marveled at how “highly specialized” virtual machines keep dunking on expectations, pointing to a growing pattern: when you build a tiny computer inside your program and tune it to the moon, it flies. Meanwhile, the vibe check got extra spicy when the author reminded everyone their earlier experiments with AI code were controversial—this new stuff is proudly human-written. The crowd read that as a wink and a flex.

And of course, Rust stans did their thing. The most upvoted joke was basically: “I like it because it’s in Rust.” Between the nitpicks, the hype, and the memes, the takeaway is loud and clear: Rust’s tail-call moment is here, and it’s fast enough to start an argument.

Key Points

  • A tail-call interpreter for Uxn was implemented in nightly Rust using the become keyword, reportedly outperforming prior Rust and hand-written ARM64 versions.
  • The work builds on previous backends: original Rust (Raven), a faster ARM64 assembly version, improved testing/CI, and an x86-64 backend ported with Claude Code’s help.
  • The Uxn CPU is a stack machine with 256 instructions, two 256-byte stacks, 65,536 bytes of RAM, a 2-byte program counter, and 256 bytes of device memory.
  • A basic Rust interpreter loop fetches opcodes and dispatches via a large match; instructions can be parameterized via const generic flags (e.g., INC).
  • The assembly backend employs token-threaded code: it keeps interpreter state in registers, and each instruction handler ends by jumping straight to the next instruction's handler, so dispatch is not funneled through one shared, hard-to-predict branch.
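
To make the two dispatch styles in the points above concrete, here is a minimal sketch in stable Rust. It is not the author's code. The names (Uxn, inc, run_match, next, TABLE) are hypothetical, the machine is cut down to one stack and three instructions, and the opcode values (BRK 0x00, INC 0x01, LIT 0x80, short flag 0x20, keep flag 0x80) and the 0x0100 load address are taken from the Uxn reference as best understood here, so treat them as assumptions. On nightly, the handler-to-handler calls in the second style are where become would go (behind what should be the explicit_tail_calls feature gate); on stable they are ordinary calls that may grow the native stack.

```rust
// Reduced, hypothetical model of the Uxn state: the return stack and device
// memory are left out, and only BRK, LIT, and four INC variants exist.
struct Uxn {
    ram: Vec<u8>,     // 65,536 bytes of RAM
    stack: [u8; 256], // working stack
    sp: u8,           // stack depth, wraps at 256
}

impl Uxn {
    fn new(program: &[u8]) -> Self {
        let mut ram = vec![0u8; 65536];
        // Assumption: the program is loaded at 0x0100.
        ram[0x100..0x100 + program.len()].copy_from_slice(program);
        Uxn { ram, stack: [0; 256], sp: 0 }
    }
    fn push(&mut self, v: u8) {
        self.stack[self.sp as usize] = v;
        self.sp = self.sp.wrapping_add(1);
    }
    fn pop(&mut self) -> u8 {
        self.sp = self.sp.wrapping_sub(1);
        self.stack[self.sp as usize]
    }
}

// INC, specialized at compile time by two const generic flags.
fn inc<const SHORT: bool, const KEEP: bool>(vm: &mut Uxn) {
    if SHORT {
        let lo = vm.pop();
        let hi = vm.pop();
        if KEEP {
            vm.push(hi);
            vm.push(lo);
        }
        let [hi, lo] = u16::from_be_bytes([hi, lo]).wrapping_add(1).to_be_bytes();
        vm.push(hi);
        vm.push(lo);
    } else {
        let v = vm.pop();
        if KEEP {
            vm.push(v);
        }
        vm.push(v.wrapping_add(1));
    }
}

// Style 1: fetch an opcode, dispatch through one large match, repeat.
fn run_match(vm: &mut Uxn, mut pc: u16) -> u16 {
    loop {
        let op = vm.ram[pc as usize];
        pc = pc.wrapping_add(1);
        match op {
            0x00 => return pc, // BRK: stop and report the program counter
            0x01 => inc::<false, false>(vm),
            0x21 => inc::<true, false>(vm),
            0x81 => inc::<false, true>(vm),
            0xa1 => inc::<true, true>(vm),
            0x80 => {
                // LIT: push the byte that follows the opcode
                let v = vm.ram[pc as usize];
                pc = pc.wrapping_add(1);
                vm.push(v);
            }
            _ => panic!("opcode {op:#04x} is not part of this sketch"),
        }
    }
}

// Style 2: every handler ends by calling the next handler through a table.
type Handler = fn(&mut Uxn, u16) -> u16;

fn next(vm: &mut Uxn, pc: u16) -> u16 {
    let op = vm.ram[pc as usize];
    // Ordinary call on stable; a guaranteed tail call would go here on nightly.
    TABLE[op as usize](vm, pc.wrapping_add(1))
}
fn brk(_vm: &mut Uxn, pc: u16) -> u16 { pc }
fn lit(vm: &mut Uxn, pc: u16) -> u16 {
    let v = vm.ram[pc as usize];
    vm.push(v);
    next(vm, pc.wrapping_add(1))
}
fn inc_op<const SHORT: bool, const KEEP: bool>(vm: &mut Uxn, pc: u16) -> u16 {
    inc::<SHORT, KEEP>(vm);
    next(vm, pc)
}
fn unknown(_vm: &mut Uxn, _pc: u16) -> u16 { panic!("opcode is not part of this sketch") }

static TABLE: [Handler; 256] = {
    let mut t = [unknown as Handler; 256];
    t[0x00] = brk;
    t[0x80] = lit;
    t[0x01] = inc_op::<false, false>;
    t[0x21] = inc_op::<true, false>;
    t[0x81] = inc_op::<false, true>;
    t[0xa1] = inc_op::<true, true>;
    t
};
```

Running the byte program 80 01 81 80 ff 21 00 (LIT 01, INCk, LIT ff, INC2, BRK) from 0x0100 through either run_match or next leaves 01 03 00 on the working stack. The const generic flags mean each INC variant is compiled as its own function with the mode checks already resolved, which is what lets both the match arms and the table entries stay branch-free inside the handler.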

Hottest takes

“absurdly efficient ‘highly specialized VM/instruction interpreters’” — dathinab
“Tail recursion opens up for people to write really really neat looping facilities” — bjoli
“Tail calls alone aren’t special” — measurablefunc