March 31, 2026

Cores on fire, comments hotter

Show HN: Forkrun – NUMA-aware shell parallelizer (50×–400× faster than parallel)

New “forkrun” claims up to 400× speed-ups: devs cheer, Go loyalists say “just write it in Go”

TLDR: A new shell tool, “forkrun,” claims massive speed-ups over GNU Parallel by keeping work close to each CPU and cutting coordination overhead; it targets workloads made of lots of tiny tasks. Commenters split between excited speed-chasers and skeptics who say long-running jobs won’t benefit, and some argue you should just write it in Go instead.

Hacker News lit up after a dev dropped forkrun, a “turbo switch” for your terminal that promises batch jobs 50×–400× faster than GNU Parallel. The creator says it keeps work close to each CPU and avoids the usual coordination chatter, so your many cores finally do more than watch one core sweat. It ships as a single Bash script that unpacks a tiny C helper, with “verifiable builds” via GitHub Actions, a neat nod to the security crowd. Source and demo live at github.com/jkool702/forkrun.

Then came the split-screen reactions. Speed fans cheered the benchmarks and the “drop-in” promise — no Perl, no Python, no drama, just go faster. The dev’s rallying cry — why is one core melting while the rest nap? — hit a nerve. But the pragmatists rolled in: one top comment basically said, “I just write a small program in Go and let the scheduler handle it,” sparking a mini language war. Another user shrugged that their jobs (think long video encodes) are so heavy that dispatch speed doesn’t matter; their core complaint: cool tech, unclear payoff for real-world monsters like ffmpeg.
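For readers wondering what the “small program in Go” approach looks like, here is a minimal sketch of a fixed-size worker pool. It is an illustration only, not code from the thread: the function name processAll and the string-length task are hypothetical stand-ins for whatever per-item work you would otherwise hand to a shell parallelizer.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// processAll runs work on every item using a fixed pool of goroutines
// and returns the results in input order.
// (Hypothetical helper for illustration; not taken from the thread.)
func processAll(items []string, workers int, work func(string) int) []int {
	results := make([]int, len(items))
	jobs := make(chan int) // carries indexes into items

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is received by exactly one goroutine,
				// so writes to results never overlap.
				results[i] = work(items[i])
			}
		}()
	}

	for i := range items {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return results
}

func main() {
	items := []string{"a", "bb", "ccc", "dddd"}
	// Stand-in task: the length of each string.
	lengths := processAll(items, runtime.NumCPU(), func(s string) int { return len(s) })
	fmt.Println(lengths) // prints [1 2 3 4]
}
```

The Go runtime spreads the goroutines across cores, which is the “let the scheduler handle it” part; the trade-off is that you write and compile a program instead of piping lines into a shell command.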

And yes, the jokes. Someone teasingly asked whether this was “vibe-coded,” and others riffed on the tool’s “born-local” tagline like it was a craft coffee. The vibe: blazing demos vs. practical skeptics. If your workload is lots of tiny tasks, this looks like rocket fuel. If it’s big chonky jobs, some say you won’t feel the lift.
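The tiny-tasks versus heavy-jobs argument can be made concrete with a rough back-of-the-envelope model. The sketch below assumes each task costs one dispatch plus its own run time and asks what share of that total is dispatch. The 200,000 dispatches/sec figure is the one reported for forkrun; the 500 dispatches/sec “slow” dispatcher is a made-up number for contrast, not a measurement of any real tool.

```go
package main

import "fmt"

// dispatchShare returns the fraction of one task's total time spent on
// dispatch, under a simplified model: total = 1/dispatchPerSec + taskSeconds.
func dispatchShare(dispatchPerSec, taskSeconds float64) float64 {
	perDispatch := 1.0 / dispatchPerSec
	return perDispatch / (perDispatch + taskSeconds)
}

func main() {
	const fast = 200000.0 // dispatches/sec reported for forkrun
	const slow = 500.0    // hypothetical slower dispatcher, for contrast only

	tasks := []struct {
		name    string
		seconds float64
	}{
		{"1 ms tiny task", 0.001},
		{"10 min video encode", 600},
	}

	for _, t := range tasks {
		fmt.Printf("%-20s slow: %.6f%%  fast: %.6f%%\n",
			t.name,
			100*dispatchShare(slow, t.seconds),
			100*dispatchShare(fast, t.seconds))
	}
}
```

Under these assumptions the 1 ms task spends about two thirds of its time on dispatch with the slow dispatcher and about half a percent with the fast one, while the 10-minute encode spends a negligible share either way. That is the skeptics' point in numbers.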

Key Points

  • forkrun is a NUMA-aware, contention-free shell parallelizer designed as a drop-in replacement for GNU Parallel and xargs -P.
  • It reports 50×–400× speedups, 200,000+ dispatches/sec, ~95–99% CPU utilization, and near-zero cross-socket memory traffic on modern multi-socket CPUs.
  • Distribution is a single Bash file with an embedded C extension; users source it to load builtins and use the frun command for parallelization.
  • Benchmarks on a 14-core/28-thread i9-7940X over 100M lines show up to ~1.54B lines/s (-b mode) and substantial gains across multiple modes versus GNU Parallel.
  • Architecture features include born-local NUMA ingest, SIMD-based per-node indexing, atomic batch claiming, background memory reclamation, and automatic PID-based tuning; builds are verifiable via GitHub Actions.
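To give a feel for what “atomic batch claiming” means in general, here is a minimal sketch in which workers grab the next slice of work with a single atomic add instead of going through a central dispatcher. This is not forkrun's implementation (which the post describes as a C extension loaded into Bash); the function name sumWithBatchClaims, the batch size, and the summing task are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// sumWithBatchClaims sums items using several workers. Each worker claims
// the next batch of indexes with one atomic add, so no lock or coordinator
// goroutine is needed to hand out work. batch must be greater than zero.
// (Illustrative sketch only; not forkrun's actual code.)
func sumWithBatchClaims(items []int, workers, batch int) int64 {
	var next atomic.Int64  // index of the first unclaimed item
	var total atomic.Int64 // running sum across all workers
	n := int64(len(items))

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				// Claim [start, end) in one atomic step.
				end := next.Add(int64(batch))
				start := end - int64(batch)
				if start >= n {
					return // nothing left to claim
				}
				if end > n {
					end = n // last batch may be short
				}
				var local int64
				for _, v := range items[start:end] {
					local += int64(v)
				}
				total.Add(local)
			}
		}()
	}
	wg.Wait()
	return total.Load()
}

func main() {
	items := make([]int, 1000)
	for i := range items {
		items[i] = i + 1 // 1..1000
	}
	fmt.Println(sumWithBatchClaims(items, 4, 64)) // prints 500500
}
```

The design point is that the only shared state touched per batch is one counter, so workers rarely wait on each other; larger batches mean fewer atomic operations at the cost of coarser load balancing.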

Hottest takes

"Have you ever run GNU Parallel... one core pegged at 100% while the rest sit mostly idle?" — jkool702
"Generally when I want to run something with so much parallelism I just write a small Go program instead" — nasretdinov
"I’ve never really used parallel for anything that was bound by the dispatch speed" — tombert