March 22, 2026
When code counts calories
Linking Smaller Haskell Binaries
Haskell tries a crash diet: fans cheer, skeptics say it’s still chunky
TLDR: New build settings trim a Haskell app from 113MB to 64MB, with one risky step that merges duplicate code. Commenters applauded the shrink but said it’s still hefty—one saw 199MB—and blamed the language’s runtime, sparking a safety-vs-size debate and calls for better defaults.
Haskell apps are infamous for being chonky, and today’s headline diet plan has the timeline buzzing. A new post shows that with a few special build settings, a stripped test binary of the popular Pandoc tool went from 113MB to 83MB by trimming unused code, and then down to 64MB using an extra, experimental step called identical code folding (ICF), which merges duplicate bits of code. Translation for non-nerds: clever settings tell the builder to throw away leftovers and combine lookalike pieces. The catch? ICF can be unsafe in weird edge cases, mainly when C code relies on two functions or pieces of data having different addresses. Even the post warns it’s experimental: think “results may vary.”
Cue the comments section drama. One user deadpanned that their system’s Pandoc is 199MB (“not bad”), which became the meme of the thread: yes, it’s slimmer, but it’s still a big lunch. Another voice chimed in: Haskell’s size problem is a long-running joke, and the runtime (the engine under the hood) is the real weight. Fans celebrated real, measurable progress; skeptics asked if risking stability is worth it for another 23% off. The chaos escalated into “make it default!” versus “don’t brick my build,” while others name-dropped LLD, the linker doing the trimming, like a brand-new gym membership. The vibe: Haskell summer body incoming, but the bulk is not gone yet.
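For anyone who wants to check the math on the diet, here is a minimal Python sketch that recomputes the savings from the three sizes quoted above (113MB, 83MB, 64MB). The function and variable names are made up for illustration; only the three sizes come from the post.

```python
def percent_reduction(before_mb: float, after_mb: float) -> float:
    """Return the size reduction from before_mb to after_mb, as a percentage."""
    return (before_mb - after_mb) / before_mb * 100.0


# Sizes in MB as quoted in the summary above.
baseline_mb, after_gc_mb, after_icf_mb = 113, 83, 64

# Step 1: unused-code removal (section garbage collection).
gc_step = percent_reduction(baseline_mb, after_gc_mb)
# Step 2: identical code folding, measured against the already-trimmed binary.
icf_step = percent_reduction(after_gc_mb, after_icf_mb)
# Both steps together, measured against the original binary.
overall = percent_reduction(baseline_mb, after_icf_mb)

print(f"unused-code removal: {gc_step:.1f}%")   # prints 26.5%
print(f"ICF on top of that:  {icf_step:.1f}%")  # prints 22.9%
print(f"overall:             {overall:.1f}%")   # prints 43.4%
```

So the skeptics’ “23%” is the ICF step measured against the 83MB binary, while the whole diet adds up to roughly 43% off the original.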
Key Points
- Enabling GHC -split-sections and linker --gc-sections (via lld) reduced a stripped test-pandoc binary from 113 MB to 83 MB.
- Adding gcc -fdata-sections and -ffunction-sections helps C code participate in section-level garbage collection.
- Using lld’s identical code folding (ICF) with --icf=all and ignoring address-equality further reduced the binary to 64 MB.
- ICF is experimental and may be unsafe, especially if C code depends on function or data address equality.
- lld logs and DWARF/objdump analysis showed repeated identical sections across pandoc modules, likely from inlining/specialization.