Emacs internals: Tagged pointers vs. C++ std::variant and LLVM (Part 3)

Emacs’ pointer-tag magic vs C++ safety talk — the comments go feral

TLDR: Emacs shows how it squeezes many data types into one value using pointer tags, sparking a showdown with C++ fans of std::variant and LLVM’s template tricks. Commenters argue speed and memory vs safety and clarity—an essential trade-off for anyone building fast, reliable tools.

Emacs just showed off its old-school party trick: stuffing different kinds of data into one 64-bit value by using the spare low bits of a pointer as a type tag. The article compares that to C++’s std::variant (a type-safe union that holds exactly one of a fixed list of types and checks every access) and even nods to LLVM’s template wizardry. The crowd? Split between team performance and team safety. One side cheers the elegance and memory savings of tagged pointers; the other clutches seatbelts, warning that C++ compilers won’t save you if you fly too close to the undefined-behavior sun.
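For readers who want to see the party trick rather than just hear about it, here is a minimal C++ sketch of a tagged pointer. Every name in it (Tag, Value, make_value, tag_of, untag) is made up for illustration and none of it is Emacs' actual source. It also assumes a mainstream platform where converting a pointer to std::uintptr_t and back behaves like plain integer arithmetic on the address, which the C++ standard leaves implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tag values; Emacs' real tag assignments are not reproduced here.
enum class Tag : std::uintptr_t { Symbol = 0, Cons = 1, String = 2, Float = 3 };

constexpr std::uintptr_t kTagBits = 3;
constexpr std::uintptr_t kTagMask = (std::uintptr_t{1} << kTagBits) - 1;  // 0b111

// alignas(8) guarantees the low three address bits of any Cons are zero.
struct alignas(8) Cons {
    std::uintptr_t car;
    std::uintptr_t cdr;
};

// Stand-in for a single tagged machine word.
using Value = std::uintptr_t;

// Pack an 8-byte-aligned pointer and a tag into one word.
inline Value make_value(void* p, Tag t) {
    auto bits = reinterpret_cast<std::uintptr_t>(p);
    assert((bits & kTagMask) == 0 && "pointer must be 8-byte aligned");
    return bits | static_cast<std::uintptr_t>(t);
}

// Read the tag back out of the low three bits.
inline Tag tag_of(Value v) { return static_cast<Tag>(v & kTagMask); }

// Clear the tag bits to recover the original pointer.
// The caller is responsible for checking tag_of(v) first; nothing here
// stops a wrong T, which is exactly the safety gap the thread argues about.
template <class T>
inline T* untag(Value v) {
    return reinterpret_cast<T*>(v & ~kTagMask);
}
```

A caller would write something like `Value v = make_value(&cell, Tag::Cons);` and later branch on `tag_of(v)` before calling `untag<Cons>(v)`. The whole value stays one word wide, which is the memory win team performance is cheering.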

A cautious voice asks how you even “spell” this trick legally in C++, while another fan waves the LLVM flag and drops receipts on CRTP (the curiously recurring template pattern), a way to get polymorphism (many shapes of code) resolved at compile time, with no virtual-call overhead at runtime. Meanwhile, someone resurrects the thread from the prequel with a link, proving this saga is officially a series. Jokes land hard: “Emacs can run your toaster, of course it can tag pointers,” one quips, as C++ purists gasp, “not on my watch.” Another meme: boxed vs unboxed becomes “Team Moving Boxes vs Team Carry-On Only.”
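Since CRTP gets name-dropped without a picture, here is a small C++ sketch of the general pattern. The Shape, Square, Rect, and doubled_area names are hypothetical and are not taken from LLVM; the point is only to show how a base class template can call into its derived class without a virtual function.

```cpp
#include <cassert>

// The base class is a template parameterized on the class that derives from it.
template <class Derived>
struct Shape {
    // The cast is resolved at compile time: no vtable, no virtual dispatch.
    double area() const {
        return static_cast<const Derived&>(*this).area_impl();
    }
};

struct Square : Shape<Square> {
    double side;
    explicit Square(double s) : side(s) {}
    double area_impl() const { return side * side; }
};

struct Rect : Shape<Rect> {
    double w, h;
    Rect(double w_, double h_) : w(w_), h(h_) {}
    double area_impl() const { return w * h; }
};

// Generic code works with any Shape<D>; the compiler stamps out one copy per D.
template <class D>
double doubled_area(const Shape<D>& s) {
    return 2.0 * s.area();
}
```

The trade-off is that Shape<Square> and Shape<Rect> are unrelated types, so they cannot share a single container without some extra layer, such as a tag or a variant.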

Bottom line: Emacs’ tagged pointers are fast and tiny; std::variant is safer but chunky; LLVM’s approach is nerd-chic. The comments are the real show—equal parts fascinated, frightened, and funny.

Key Points

  • Emacs stores every Lisp value in a 64-bit Lisp_Object. Because heap objects are 8-byte aligned, the lowest three bits of a pointer are always zero, so Emacs uses them as a type tag.
  • Dynamic polymorphism can be implemented with tagged unions; C++17’s std::variant/std::visit provide checks to prevent invalid casts.
  • Unboxed tagged unions (like std::variant) reserve at least as much space as the largest alternative, plus room for the discriminator, which wastes memory when the alternatives differ widely in size.
  • Boxed representations keep data on the heap and can use tagged pointers to encode the type in the low pointer bits; three bits allow up to 8 (2^3) types without extra inline space.
  • When more than eight type categories are needed, fat pointers (a data pointer paired with a second word of type information) are a common solution, as seen in Go interfaces and Rust trait objects.
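The key points above can be made concrete with a short C++17 sketch. All type and function names here (Small, Big, Unboxed, Datum, describe, TypeInfo, FatPtr, type_name) are invented for illustration. The fat pointer at the end is a deliberately simplified stand-in and does not mirror the actual layout of Go interfaces or Rust trait objects.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// Unboxed: the variant's storage must fit its largest alternative,
// so holding a one-byte Small still costs at least sizeof(Big).
struct Small { char c; };
struct Big   { double data[16]; };
using Unboxed = std::variant<Small, Big>;
static_assert(sizeof(Unboxed) >= sizeof(Big),
              "a variant is never smaller than its largest alternative");

// Checked access: std::visit dispatches on the active alternative,
// so there is no way to read a double out of a value holding a string.
using Datum = std::variant<long, double, std::string>;

inline std::string describe(const Datum& v) {
    return std::visit([](const auto& x) -> std::string {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, long>)        return "integer";
        else if constexpr (std::is_same_v<T, double>) return "float";
        else                                          return "string";
    }, v);
}

// Fat pointer: a data pointer plus a second word pointing at per-type
// information. The number of types is no longer capped by spare tag bits.
struct TypeInfo { const char* name; };  // hypothetical per-type table
struct FatPtr {
    void*           data;
    const TypeInfo* type;
};

inline constexpr TypeInfo kLongInfo{"integer"};
inline constexpr TypeInfo kDoubleInfo{"float"};

inline const char* type_name(FatPtr p) { return p.type->name; }
```

For example, `describe(Datum{2.5})` yields "float", while `std::get_if<long>` on the same value returns a null pointer instead of reinterpreting the bytes. The fat pointer buys an unlimited number of types at the price of a second word per reference.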

Hottest takes

It's not clear to me (and as an unsafe language it's not called out by your compiler if you do something illegal) what the correct way to spell this kind of trick is in C++ — tialaramex
Happy to see discussion of LLVM's interesting implementation of Static Polymorphism using CRTP — ndesaulniers
Emacs internal part 2 HN link: https://news.ycombinator.com/item?id=47259961 — thecloudlet