November 11, 2025
Cache me outside
Cache-Friendly, Low-Memory Lanczos Algorithm in Rust
Rust dev shrinks big-math memory load; commenters spar over Rust, cheer thesis vibes
TLDR: New two-pass trick slashes memory for giant matrix math and may even run faster thanks to cache-friendly design. Comments split between praise for clear writing, skepticism about Rust in numerics, calls for cache metrics, and thesis-defense cheerleading—big deal for anyone with huge data.
A Rust dev just dropped a brainy hack: a “two-pass” trick that slices memory needs for massive math problems from brutal gigabytes to a lean, size-of-the-problem footprint—and it might even run faster thanks to CPU cache magic. Translation: re-walk the path twice, skip the giant backpack, maybe beat the clock. The community went full popcorn mode. One camp is all heart-eyes for the write-up’s clarity—“even non-math folks can follow,” cheered chrisweekly—while another side fired up the old flame war: Rust in numerics? sfpotter politely threw down, saying “not personally convinced,” and the thread immediately started sharpening its knives. Meanwhile, the performance nerds demanded receipts: manbash asked for proof on cache hit/miss rates like it’s CSI: CPU. Others turned the comments into a thesis pep rally—gigatexal called it practice for defense—and vatsachak dropped a wholesome “hope you get your pay day.” The project’s on GitHub: two-pass-lanczos with a full report, but the real show is in the reactions. Between cheering, skepticism, and requests for hard numbers, this post became a vibes test for Rust in heavy math: speed, memory, and drama in one tidy package.
Key Points
- Standard Lanczos requires O(nk) memory to store the growing basis; a 500,000-variable, 1,000-iteration case needs ~4 GB for the basis alone.
- A two-pass Lanczos variant reduces memory to O(n) by recomputing the basis vectors in a second sweep, doubling the number of matrix-vector products but potentially improving speed due to cache effects.
- Matrix functions f(A) are approximated via low-degree polynomials within a Krylov subspace, avoiding explicit formation of the dense matrix f(A).
- The Arnoldi process builds an orthonormal basis and a small projected matrix H_k; computation shifts to f(H_k) in the reduced k-dimensional space, where direct methods are affordable.
- For f(z) = z^(-1) (i.e., solving linear systems), the reduced problem is H_k y = ||b||_2 e_1, solved via LU factorization; for Hermitian A, Arnoldi simplifies to Lanczos and H_k becomes tridiagonal.
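The two-pass idea from the key points can be sketched in plain Rust. This is an illustrative toy, not code from the `two-pass-lanczos` repo: pass 1 runs the three-term recurrence and keeps only the scalar coefficients (alphas and betas); pass 2 replays the same recurrence with those stored scalars, regenerating each basis vector on demand while holding only two length-n vectors in memory. The test matrix and the `lanczos_store`/`lanczos_replay` names are invented for this sketch.

```rust
fn matvec(a: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    a.iter()
        .map(|row| row.iter().zip(x).map(|(r, v)| r * v).sum())
        .collect()
}

fn norm(v: &[f64]) -> f64 {
    v.iter().map(|x| x * x).sum::<f64>().sqrt()
}

/// Pass 1: standard Lanczos recurrence. The basis is stored here only so the
/// demo below can verify that pass 2 reproduces it; the two-pass scheme
/// itself would keep just `alphas` and `betas` (O(k) scalars).
fn lanczos_store(a: &[Vec<f64>], b: &[f64], k: usize) -> (Vec<f64>, Vec<f64>, Vec<Vec<f64>>) {
    let n = b.len();
    let nb = norm(b);
    let mut q: Vec<f64> = b.iter().map(|v| v / nb).collect();
    let mut q_prev = vec![0.0; n];
    let mut beta_prev = 0.0;
    let (mut alphas, mut betas, mut basis) = (Vec::new(), Vec::new(), Vec::new());
    for _ in 0..k {
        basis.push(q.clone());
        let mut w = matvec(a, &q);
        let alpha: f64 = w.iter().zip(&q).map(|(wi, qi)| wi * qi).sum();
        for i in 0..n {
            w[i] -= alpha * q[i] + beta_prev * q_prev[i];
        }
        let beta = norm(&w);
        alphas.push(alpha);
        betas.push(beta);
        q_prev = std::mem::replace(&mut q, w.iter().map(|v| v / beta).collect());
        beta_prev = beta;
    }
    (alphas, betas, basis)
}

/// Pass 2: replay the recurrence using the stored coefficients. Only two
/// length-n vectors are alive at any time (O(n) memory instead of O(nk));
/// the price is repeating each matrix-vector product once.
fn lanczos_replay<F: FnMut(usize, &[f64])>(
    a: &[Vec<f64>], b: &[f64], alphas: &[f64], betas: &[f64], mut visit: F,
) {
    let n = b.len();
    let nb = norm(b);
    let mut q: Vec<f64> = b.iter().map(|v| v / nb).collect();
    let mut q_prev = vec![0.0; n];
    let mut beta_prev = 0.0;
    for j in 0..alphas.len() {
        visit(j, &q); // consume q_j immediately, e.g. accumulate it into the solution
        let mut w = matvec(a, &q);
        for i in 0..n {
            w[i] -= alphas[j] * q[i] + beta_prev * q_prev[i];
        }
        q_prev = std::mem::replace(&mut q, w.iter().map(|v| v / betas[j]).collect());
        beta_prev = betas[j];
    }
}

fn main() {
    // Small symmetric test matrix; b chosen so the recurrence does not break down early.
    let a = vec![
        vec![4.0, 1.0, 0.0, 0.0],
        vec![1.0, 3.0, 1.0, 0.0],
        vec![0.0, 1.0, 2.0, 1.0],
        vec![0.0, 0.0, 1.0, 1.0],
    ];
    let b = vec![1.0, 0.0, 0.0, 0.0];
    let (alphas, betas, basis) = lanczos_store(&a, &b, 3);
    let mut max_err: f64 = 0.0;
    lanczos_replay(&a, &b, &alphas, &betas, |j, q| {
        for (x, y) in q.iter().zip(&basis[j]) {
            max_err = max_err.max((x - y).abs());
        }
    });
    println!("max replay error = {:e}", max_err);
    assert!(max_err < 1e-12);
}
```

Because the replay is deterministic floating-point arithmetic, the regenerated vectors match the stored ones bit-for-bit, which is why the trade is pure compute-for-memory.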
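For the linear-system case with Hermitian A, the projected matrix is tridiagonal (diagonal = Lanczos alphas, off-diagonals = betas), so the LU solve of the reduced system collapses to the classic Thomas algorithm. A minimal sketch, assuming the tridiagonal matrix is well-conditioned (no pivoting) — `solve_tridiag` is a name invented here, not the repo's API:

```rust
/// Solve the k-by-k symmetric tridiagonal system T y = ||b||_2 * e_1 with the
/// Thomas algorithm (an LU sweep specialized to tridiagonal matrices).
/// `diag` holds the Lanczos alphas; `off` holds the betas (length k - 1).
/// No pivoting: illustrative only, assumes T is well-conditioned.
fn solve_tridiag(diag: &[f64], off: &[f64], b_norm: f64) -> Vec<f64> {
    let k = diag.len();
    let mut d = diag.to_vec();
    let mut rhs = vec![0.0; k];
    rhs[0] = b_norm; // right-hand side is ||b||_2 * e_1
    // Forward elimination (the "L" sweep).
    for i in 1..k {
        let m = off[i - 1] / d[i - 1];
        d[i] -= m * off[i - 1];
        rhs[i] -= m * rhs[i - 1];
    }
    // Back substitution (the "U" sweep).
    let mut y = vec![0.0; k];
    y[k - 1] = rhs[k - 1] / d[k - 1];
    for i in (0..k - 1).rev() {
        y[i] = (rhs[i] - off[i] * y[i + 1]) / d[i];
    }
    y
}

fn main() {
    // T = [[4,1,0],[1,3,1],[0,1,2]], rhs = e_1 (||b||_2 = 1).
    let y = solve_tridiag(&[4.0, 3.0, 2.0], &[1.0, 1.0], 1.0);
    println!("y = {:?}", y);
}
```

The solve costs O(k) time and memory, so it is negligible next to the O(n) matrix-vector products; the approximate solution is then assembled as the basis vectors stream by in pass 2.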