March 17, 2026
Dialects, drama, and data: compile this tea
Enabling Efficient Sparse Computations Using Linear Algebra Aware Compilers
Super-compiler promises speed; crowd asks: SciPy, Julia, or Mojo déjà vu
TLDR: LAPIS is a new compiler framework trying to make big math run faster on any hardware, from GPUs to clusters. The crowd is split: some see “SciPy/Julia/Mojo déjà vu,” others worry about accuracy (hello, Kahan Summation), and a few cheer the promise of one tool to rule many machines.
A new research project called LAPIS wants to be the universal speed-boost button for heavy math, making sparse (aka mostly-zero) computations fly on everything from GPUs to big clusters. It uses MLIR (a fancy code middleman), a “Kokkos dialect” to translate to many chips, and a “partition dialect” to spread data across machines with less chatter. The pitch: faster science code, portable everywhere, from database tricks to complex graph matching.
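For the curious: "sparse" here just means a structure where most entries are zero, so formats like Compressed Sparse Row (CSR) store only the nonzeros. A minimal sketch of a CSR matrix–vector product (an illustration of the data layout, not LAPIS code):

```python
def csr_matvec(values, col_indices, row_ptr, x):
    """Compute y = A @ x with A in Compressed Sparse Row (CSR) form:
    values[k] is the k-th nonzero, col_indices[k] is its column,
    and row_ptr[i]:row_ptr[i+1] spans row i's nonzeros."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Only touch the stored nonzeros of row i -- the zeros are skipped.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_indices[k]]
    return y

# A = [[2, 0, 0],
#      [0, 0, 3],
#      [0, 4, 5]]  -- only 4 of the 9 entries are stored
values      = [2.0, 3.0, 4.0, 5.0]
col_indices = [0, 2, 1, 2]
row_ptr     = [0, 1, 2, 4]
print(csr_matvec(values, col_indices, row_ptr, [1.0, 1.0, 1.0]))  # [2.0, 3.0, 9.0]
```

Compilers like LAPIS aim to generate and optimize loops of exactly this shape (and much hairier ones) automatically, instead of hand-writing them per architecture.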
But the comments? Pure spice. One camp squints and says, “Isn’t this just SciPy but with MLIR?” They’re not wrong to ask—Python’s SciPy, MATLAB, and Julia already power tons of math. Another thread wonders if this overlaps with Mojo, the buzzy new language aiming to blend Python vibes with C-speed. Translation drama alert: are we getting progress, or just yet another dialect to learn?
Then the precision police roll in, invoking Kahan Summation—a classic trick to avoid small math errors piling up. Their worry: will all this compiler magic keep results accurate, or just make wrong answers arrive faster? Meanwhile, jokesters pull out “Dialect Bingo” cards and call the Kokkos translator a “universal remote for GPUs.” Love it or side-eye it, LAPIS has everyone arguing about speed, portability, and whether the future is a better compiler—or a better language.
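For readers who missed the reference, Kahan summation keeps a running "compensation" term that captures the low-order bits lost when a small number is added to a big running total, then folds them back in on the next step. A quick sketch of the classic algorithm:

```python
def kahan_sum(values):
    """Kahan (compensated) summation: track the rounding error lost
    at each addition and subtract it back out on the next one."""
    total = 0.0
    compensation = 0.0  # running estimate of the lost low-order bits
    for x in values:
        y = x - compensation            # correct the incoming value
        t = total + y                   # big + small: y's low bits get rounded off
        compensation = (t - total) - y  # algebraically zero; recovers the rounding error
        total = t
    return total

# Summing 0.1 a million times: naive summation drifts, Kahan stays
# within a rounding error of 100000.
vals = [0.1] * 1_000_000
print(sum(vals), kahan_sum(vals))
```

The naive sum's error grows with the number of terms, while Kahan's stays at roughly one unit in the last place, which is exactly why the precision police want to know whether aggressive compiler reordering preserves tricks like this.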
Key Points
- LAPIS is an MLIR-based compiler framework that optimizes sparse linear algebra and aims for performance portability across architectures.
- Its Kokkos dialect lowers high-productivity code, converting MLIR to Kokkos C++ and enabling integration of SciML models into applications.
- A partition dialect extends LAPIS to distributed memory by managing sparse tensor distribution, expressing communication explicitly, and minimizing it during distributed execution.
- The project shows MLIR can drive linear algebra–level optimizations that improve performance on various GPUs for both sparse and dense kernels.
- Applications include sparse linear algebra and graph kernels, TenSQL (built on GraphBLAS), and subgraph isomorphism/monomorphism; the article also outlines the four-index integral transform used in NWChem as sequential tensor contractions.