First per-image PCA decomposition of Kodak suite reveals deliberate curation

Kodak test pics decoded: blue goes rogue, title cops pounce, everyone asks so what

TLDR: A new breakdown of Kodak’s test photos shows color behavior varies widely—especially blue—suggesting the set was carefully chosen and giving researchers a benchmark for compression. Commenters split between “explain it simply,” “why it matters,” and title purists, highlighting a tug-of-war between mathy rigor and practical payoff.

A researcher just ran a deep color “x-ray” on the 24 famous Kodak test photos, breaking each image into its core color ingredients to see how much the channels (red, green, blue) overlap. Translation: it’s a lab test for how pictures store color. The surprise? The set looks intentionally picked to cover a wide range—some images are easy to untangle, others are chaos—and the blue channel swings wildly from barely independent (2.3%) to doing its own thing (52%). It’s a tidy, per-image map that sets a best-case baseline for future image compression and quality tests.
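For anyone wondering what channel "overlap" looks like in practice, here is a minimal Python sketch using NumPy. It runs on a synthetic image rather than a real Kodak photo, and the function name channel_correlation is made up for illustration; nothing here is taken from the researcher's own code.

```python
import numpy as np

def channel_correlation(image):
    """Return the 3x3 correlation matrix of the R, G, B channels (hypothetical helper)."""
    pixels = image.reshape(-1, 3).astype(np.float64)  # one row per pixel
    return np.corrcoef(pixels, rowvar=False)          # columns are the channels

# Synthetic stand-in for a photo (an assumption, not Kodak data):
# a shared brightness signal plus a little independent noise per channel.
rng = np.random.default_rng(0)
luma = rng.uniform(0, 255, size=(64, 64, 1))
noise = rng.normal(0, 10, size=(64, 64, 3))
image = np.clip(luma + noise, 0, 255)

corr = channel_correlation(image)
print(np.round(corr, 3))
```

Off-diagonal values near 1 mean the channels mostly carry the same information, which is the "overlap" a decorrelating transform tries to squeeze out.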

But the real show is the comments. One camp is blunt: “so what?” They want a plain-English payoff and a reason this matters in the real world. Another is waving a white flag, asking for an “explain it like I’m five” version. A helpful soul links to the actual photo set. Then the rules squad storms in: the thread gets scolded for not using the original title, citing HN guidelines. Meanwhile, jokesters latch onto “blue independence,” quipping that blue is freelancing while red and green are stuck in meetings. Beneath the memes sits a real debate: is this essential groundwork for better codecs and fairer benchmarks, or just gorgeous math with a missing punchline?

Key Points

  • Provides first per-image PCA decomposition for all 24 Kodak Lossless True Color images.
  • Positions results as KLT-based baselines for optimal linear decorrelation.
  • Computes per-image covariance matrices, eigenvalue spectra, eigenvector loadings, condition numbers, and blue channel independence.
  • Finds systematic variability across the suite, suggesting deliberate curation, with condition numbers from 7.55–1739.16 and blue independence from 2.3%–52.0%.
  • Proposes uses in benchmarking image compression, assessing data redistribution, and analyzing PSNR/SSIM correlations with eigenvectors.
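The quantities in the list above can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the researcher's method: the image is synthetic, the function pca_stats is hypothetical, the condition number is taken as the ratio of the largest to the smallest covariance eigenvalue, and "blue independence" is approximated as the share of blue variance that a linear fit from red and green cannot explain, since the summary does not give the exact definition.

```python
import numpy as np

def pca_stats(image):
    """Per-image PCA summary for an H x W x 3 RGB array (hypothetical helper)."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    centered = pixels - pixels.mean(axis=0)

    # 3x3 covariance of the channels, then its eigen-decomposition.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # sort largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # KLT: project pixels onto the eigenvectors to decorrelate the channels.
    decorrelated = centered @ eigvecs

    # ASSUMED definition of blue independence: residual variance of blue
    # after the best linear fit from red and green, over total blue variance.
    rg, blue = centered[:, :2], centered[:, 2]
    coef, *_ = np.linalg.lstsq(rg, blue, rcond=None)
    blue_independence = (blue - rg @ coef).var() / blue.var()

    return {
        "covariance": cov,
        "eigenvalues": eigvals,
        "loadings": eigvecs,                    # columns are eigenvectors
        "condition_number": eigvals[0] / eigvals[-1],
        "blue_independence": blue_independence,
        "decorrelated": decorrelated,
    }

# Synthetic image (not Kodak data): red and green track a shared signal,
# blue follows it only loosely.
rng = np.random.default_rng(0)
base = rng.uniform(0, 255, size=(64, 64, 1))
image = np.concatenate(
    [base + rng.normal(0, 5, base.shape),
     base + rng.normal(0, 5, base.shape),
     0.5 * base + rng.normal(0, 40, base.shape)],
    axis=2,
)

stats = pca_stats(image)
print("eigenvalues:", np.round(stats["eigenvalues"], 2))
print("condition number:", round(stats["condition_number"], 2))
print("blue independence:", round(stats["blue_independence"], 3))
```

After the projection, the covariance of the decorrelated data is diagonal, which is why the KLT serves as the best-case reference for any linear color transform on that image.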

Hottest takes

"I'd love some sort of 'so what?' explainer" — aaronbrethorst
"Can someone explain what this is like I'm the idiot I am?" — hypercube33
"Use the original title, unless it is misleading or linkbait" — thunderbong