January 13, 2026

Java went from dial-up to fiber

A 40-Line Fix Eliminated a 400x Performance Gap

Tiny code change makes Java scream; commenters lose it over a 400x speedup

TLDR: A small code change replaced slow file parsing with a direct time call, making Java’s thread timing 30–400x faster. Commenters split between celebrating raw speed, debating measurement accuracy, and flexing even wilder 8ns hacks—important because faster, smoother apps under load mean fewer stalls and happier users.

A 40‑line tweak just turned Java’s thread time check from a clunky file-reading ritual into a single “tell me the time” call—and the crowd went wild. The author, jerrinot, dropped the bomb: measuring “How much CPU did this thread use?” was way pricier than anyone expected. Cue cheers like ee99ee’s “great writeup,” and then the speed nerds arrived with receipts. higherhalf schooled everyone on vDSO—basically a fast‑lane trick that keeps you out of the kernel—saying it shows up on flamegraphs (those colorful charts of time spent) as pure win. Then ot swaggered in with an even hotter take: skip the syscall entirely and use perf events—special counters exposed via a shared page—to hit around 8 nanoseconds with a single CPU tick read. Translation: “We can go faster, baby.”
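Want to see the gap on your own machine? Here is a minimal sketch, assuming a JVM that supports per-thread CPU time. It times the two standard `ThreadMXBean` calls with a crude loop. The class name `ThreadTimeCost` and the helper `nanosPerCall` are made up for illustration, and a plain loop like this is far less rigorous than a proper JMH benchmark, so treat the numbers as a rough hint only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.function.LongSupplier;

// Hypothetical helper class, not part of OpenJDK.
class ThreadTimeCost {

    // Average wall-clock cost, in nanoseconds, of one invocation of `call`.
    static double nanosPerCall(LongSupplier call, int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += call.getAsLong();
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated value so the JIT cannot discard the calls.
        if (sink == Long.MIN_VALUE) {
            System.out.println("unlikely");
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            System.out.println("Per-thread CPU time is not supported on this JVM.");
            return;
        }
        int iterations = 100_000;
        // Warm up both paths before measuring.
        nanosPerCall(bean::getCurrentThreadUserTime, iterations);
        nanosPerCall(bean::getCurrentThreadCpuTime, iterations);

        double userNs = nanosPerCall(bean::getCurrentThreadUserTime, iterations);
        double cpuNs = nanosPerCall(bean::getCurrentThreadCpuTime, iterations);
        System.out.printf("getCurrentThreadUserTime: %.0f ns/call%n", userNs);
        System.out.printf("getCurrentThreadCpuTime:  %.0f ns/call%n", cpuNs);
    }
}
```

On a JDK that still has the old Linux path, the first number should be noticeably larger than the second. On a build with the fix, the two should sit much closer together.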

Drama? Oh yes. Some asked why a slowdown reported back in 2018 took this long to fix, poking at standards politics: POSIX (the rules) only offers a portable way to read a thread's total CPU time, not user time alone, so the old code parsed Linux’s /proc files. Now the Linux‑specific path says “hold my beer.” Purists worried about measurement semantics, speed freaks yelled “ship it.” Meanwhile, goodroot turned it into a QuestDB love-fest, and memes flew: “10x per line ROI,” “Java speedrun any%,” and “sscanf is the real villain.”

Key Points

  • OpenJDK replaced Linux /proc-based parsing for thread user CPU time with a clock_gettime-based approach.
  • The prior implementation parsed /proc/self/task/<tid>/stat using file I/O and sscanf, converting clock ticks to nanoseconds.
  • A 2018 bug report measured getCurrentThreadUserTime as 30x–400x slower than getCurrentThreadCpuTime, with the gap growing under concurrent load.
  • clock_gettime(CLOCK_THREAD_CPUTIME_ID) uses a direct kernel call chain (posix_cpu_clock_get → cpu_clock_sample → task_sched_runtime), reducing overhead.
  • The original design used /proc because POSIX has no portable API for user-only thread CPU time; Linux-specific features enable a faster method.
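To make the old approach concrete, here is a rough Java sketch of the parse-and-convert step described in the key points. It is not the HotSpot code (which is C++ and uses sscanf). The class `ProcStatUserTime`, both method names, and the sample line are invented for illustration, and the tick rate of 100 per second is an assumption: it is a common Linux value, but the real figure comes from `sysconf(_SC_CLK_TCK)`. The field position follows the proc(5) layout, where utime is field 14 of the stat line.

```java
// Hypothetical illustration of parsing a /proc/self/task/<tid>/stat line.
class ProcStatUserTime {

    // Extracts utime (field 14 in proc(5)) from a stat line.
    // The thread name in field 2 may itself contain spaces or ')',
    // so parsing starts after the LAST closing parenthesis.
    static long parseUserTicks(String statLine) {
        int close = statLine.lastIndexOf(')');
        String[] fields = statLine.substring(close + 1).trim().split("\\s+");
        // fields[0] is field 3 (state), so field 14 sits at index 11.
        return Long.parseLong(fields[11]);
    }

    // Converts clock ticks to nanoseconds for a given tick rate.
    static long ticksToNanos(long ticks, long ticksPerSecond) {
        return ticks * (1_000_000_000L / ticksPerSecond);
    }

    public static void main(String[] args) {
        // Made-up sample line; only the first 15 fields are shown.
        String sample = "1234 (my (weird) thread) S 1 1234 1234 0 -1 4194304 100 0 0 0 250 30";
        long ticks = parseUserTicks(sample);
        // Assumes 100 ticks per second.
        System.out.println(ticks + " ticks = " + ticksToNanos(ticks, 100) + " ns");
        // prints "250 ticks = 2500000000 ns"
    }
}
```

Note the coarse resolution: at 100 ticks per second, each tick is 10 milliseconds, and that is before counting the cost of opening, reading, and parsing a file on every call.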

Hottest takes

“much more expensive than it should be” — jerrinot
“vDSO, avoiding a context switch” — higherhalf
“about 8ns… by using software perf events” — ot