Solving the Problems of HBM-on-Logic

Hot AI chips spark a wild debate: slow them down to go faster

TLDR: A new roadmap says stacking memory right on top of AI chips could work—if we cool them down, even by running slower. Readers split between “slower clocks for faster AI throughput” believers and skeptics who say costs, yields, and real-world constraints could fry the plan before it ships.

Who knew the fastest way to speed up AI might be… slowing it down? That’s the chaotic energy after imec’s new roadmap claimed 3D-stacking memory directly on top of GPUs could work if we tame the heat—possibly by cutting clock speeds. In plain English: put memory towers on the chip, but don’t cook it. The More Than Moore post lays out the thermals, the tricks, and the catch: halving frequency cools things the most, and it might still be pricey.

Cue the comments. The pragmatists are loud: one reader argues that for giant chatbots (LLMs), slowing chips could actually boost throughput and save money, because cooler chips run longer, harder. “People will take that trade,” they insist, and they’re not wrong given today’s AI spending spree. On the other side, the realists call it a “roadmap,” not a product, warning that hyperscalers juggle power limits, floor space, and both throughput (tokens per second) and per-token latency. Translation: the math is messy, and Wall Street hates messy.
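
For anyone wondering how "slower" can mean "faster," here is a minimal roofline-style sketch. It assumes the workload is memory-bound and that stacking memory on the die raises bandwidth; every number in it is hypothetical and picked only to show the shape of the trade, not taken from imec or the commenters.

```python
def attainable_tflops(peak_tflops, bandwidth_tbps, flops_per_byte):
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak_tflops, bandwidth_tbps * flops_per_byte)


# Hypothetical workload: 50 FLOPs per byte moved, i.e. memory-bound on this chip.
INTENSITY = 50.0

# Hypothetical 2.5D baseline: 1000 TFLOP/s peak compute, 8 TB/s memory bandwidth.
baseline = attainable_tflops(peak_tflops=1000.0, bandwidth_tbps=8.0,
                             flops_per_byte=INTENSITY)

# Hypothetical 3D stack: half the clock halves peak compute,
# while bandwidth is assumed (not measured) to rise 1.5x.
stacked = attainable_tflops(peak_tflops=500.0, bandwidth_tbps=12.0,
                            flops_per_byte=INTENSITY)

speedup = stacked / baseline
print(f"baseline: {baseline:.0f} TFLOP/s, stacked: {stacked:.0f} TFLOP/s, "
      f"speedup: {speedup:.2f}x")
```

With these made-up inputs the half-clock chip still comes out ahead, because the baseline was starved for bandwidth rather than compute. Change the intensity or the bandwidth gain and the answer flips, which is roughly the skeptics' point.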

Meanwhile, the memes flew: “underclocking is the new overclocking,” and jokes about turning GPUs into luxury panini presses. Some winked at the irony of a former overclocker co-signing slower clocks. Bottom line? The crowd is split between “slow down to win” and “wake me when it’s cheap, cool, and real.” Drama level: blistering.

Key Points

  • imec presented a thermal roadmap for 3D HBM-on-logic (“HBM-on-GPU”) at IEDM 2025, claiming feasibility with major design changes.
  • Thermal constraints are the main bottleneck; imec outlines multiple mitigation steps, with halving frequency giving the largest thermal relief.
  • Baseline simulations model a ~400 W compute die with multiple 12-Hi HBM stacks under a liquid-cooled cold plate with a heat-transfer coefficient of 30 W/(cm²·K).
  • The 2.5D baseline with HBM4-like base dies showed a peak GPU temperature of 69.1°C and worst-case HBM silicon around 60°C.
  • Commercial concerns remain, including layout complexity, cost, and yield, which may affect viability despite thermal feasibility.
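
To put the cold-plate figure and the frequency cut in rough numbers, here is a back-of-the-envelope sketch. It assumes all 400 W is dynamic power that scales as C·V²·f with voltage held fixed, and it uses a hypothetical 8 cm² die area that the summary does not give. It covers only the temperature drop at the cold-plate interface, not conduction through the stacked dies, so it is nowhere near a substitute for imec's simulations.

```python
def dynamic_power_w(base_power_w, freq_scale, volt_scale=1.0):
    """Scale dynamic power with P ~ C * V^2 * f (leakage ignored)."""
    return base_power_w * freq_scale * volt_scale ** 2


def cold_plate_rise_k(power_w, area_cm2, h_w_per_cm2_k):
    """Temperature drop across the cold-plate interface: dT = (P / A) / h."""
    return (power_w / area_cm2) / h_w_per_cm2_k


BASE_POWER_W = 400.0  # compute die power from the baseline simulation
H_COLD_PLATE = 30.0   # W/(cm^2*K), cold-plate figure from the baseline simulation
DIE_AREA_CM2 = 8.0    # hypothetical die area, assumed for illustration only

full_clock_w = dynamic_power_w(BASE_POWER_W, freq_scale=1.0)
half_clock_w = dynamic_power_w(BASE_POWER_W, freq_scale=0.5)

rise_full = cold_plate_rise_k(full_clock_w, DIE_AREA_CM2, H_COLD_PLATE)
rise_half = cold_plate_rise_k(half_clock_w, DIE_AREA_CM2, H_COLD_PLATE)

print(f"full clock: {full_clock_w:.0f} W, interface rise {rise_full:.2f} K")
print(f"half clock: {half_clock_w:.0f} W, interface rise {rise_half:.2f} K")
```

Under these assumptions, halving the clock halves the heat to be removed, and every temperature rise that scales linearly with power shrinks by the same factor. Dropping voltage along with frequency (volt_scale below 1.0) would cut power further, though the summary does not say whether imec modeled that.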

Hottest takes

"cut clock frequency to boost LLM inference performance by 46%... People will certainly take that trade off if offered" — fancyfredbot
"3d HBM is still figuring out what it can be, and what it will look like - seems right" — vessenes
"Hyperscalers are dealing with a pretty complex Pareto envelope" — vessenes