Ask HN: Scheduling stateful nodes when MMAP makes memory accounting a lie

Server meltdown sparks 'just use Kubernetes' vs 'let nodes say no' brawl

TLDR: A coordinator misread “low rows” as low load and kept hammering a nearly full server, causing a retry loop. Commenters split between Kubernetes-style reservations, latency-based backpressure, OS signals like PSI, and cheeky calls for ML to babysit metrics, because naive measurements can crash real systems.

A spicy Ask HN confessional lit up the crowd: a coordinator kept shoving data onto a “quiet” server because it had fewer rows, but the box was actually stuffed to the brim with chunky data and near out-of-memory. Cue chaos: the coordinator ignored the “I’m full” signals and basically DDoS’d its own node. The community’s verdict? Row count is a lie, memory is a drama queen, and your scheduler needs therapy.

Team Kubernetes swaggered in first: declare memory reservations so the scheduler treats your capacity as hard facts, regardless of lazy-loaded trickery. The performance purists snapped back: let latency be the truth. If a node gets slow or jittery, feed it less and close the loop back to the balancer. The OS whisperers pulled out Pressure Stall Information (PSI), a Linux kernel metric that reports how long tasks are stalled waiting on CPU, memory, or I/O, with a “look at active pages” wink. Then the chaos agents piled on with the hottest take: “NP-hard? Perfect for machine learning,” turning the “God Equation” meme into “let AI parent your cluster.”
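The PSI suggestion is concrete enough to sketch. `/proc/pressure/memory` is a real Linux interface (kernel 4.20+); the parser below handles its documented two-line format, while the shed-load threshold is an assumption for illustration, not a tuned value.

```python
def parse_psi(text):
    """Parse PSI output such as:
    some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    full avg10=7.50 avg60=2.00 avg300=0.40 total=900
    Returns {"some": {...}, "full": {...}} with numeric values."""
    out = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        stats = {}
        for field in fields:
            key, value = field.split("=")
            # 'total' is cumulative microseconds (int); avgN are percentages
            stats[key] = int(value) if key == "total" else float(value)
        out[kind] = stats
    return out

def memory_is_stalled(psi, full_avg10_threshold=5.0):
    # "full" means *all* runnable tasks were stalled on memory; a sustained
    # nonzero avg10 indicates thrashing, not mere page-cache pressure.
    # The 5.0 threshold is a hypothetical knob, not a kernel default.
    return psi["full"]["avg10"] >= full_avg10_threshold
```

On a live node the input would come from `open("/proc/pressure/memory").read()`; the same format exists for `cpu` and `io`.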

Meanwhile, pragmatists cheered a “dumb coordinator, smart nodes” plan: assign segments by free disk space, let workers answer 429 (Too Many Requests) when stressed, and separate disk balancing from memory-heavy query work. Peak meme: “mmap is gaslighting your RAM.” Checkmate.
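The “smart nodes” half of that plan can be sketched in a few lines: the worker itself gates admission and returns an HTTP-style 429 when a new segment would eat its safety margin. The headroom figure and the function shape are assumptions for illustration; any local signal (PSI, a heap watermark) could drive the same check.

```python
# HTTP status codes the coordinator is expected to respect.
OK = 200
TOO_MANY_REQUESTS = 429

def admit_segment(free_ram_bytes, segment_bytes, headroom_bytes=8 << 30):
    """Worker-side admission control: accept a segment only if loading it
    still leaves `headroom_bytes` of slack (hypothetical 8 GiB default).
    Returning 429 tells the coordinator to back off and retry elsewhere."""
    if free_ram_bytes - segment_bytes < headroom_bytes:
        return TOO_MANY_REQUESTS
    return OK
```

The design point is that the node owns the refusal: the coordinator stays dumb, and the feedback loop closes at the only place that actually knows how full the box is.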

Key Points

  • A distributed stateful engine uses a Coordinator to assign data segments to Worker Nodes, with heavy reliance on mmap and lazy loading.
  • A failure occurred when the Coordinator misread Node A’s low logical row count as underutilization and repeatedly tried to load new segments.
  • Node A was near OOM (~197GB RAM) due to very wide rows and large blobs, making row count a poor proxy for resource usage.
  • OS page cache and lazy loading made application-level RSS and disk metrics unreliable for memory-aware scheduling.
  • The author proposes options: rely on node-enforced backpressure (HTTP 429), build per-segment cost models, or decouple storage balancing from query/memory balancing, and seeks references.
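The core failure in the points above, row count as a proxy for resource usage, can be shown in a tiny per-segment cost model. The node names, segment shapes, and byte sizes here are invented for illustration; the point is only that ranking by estimated bytes and ranking by rows disagree exactly when rows are wide.

```python
def estimated_bytes(segment):
    # Approximate resident cost as rows * average row width; wide rows
    # and large blobs dominate even at low row counts.
    return segment["rows"] * segment["avg_row_bytes"]

def pick_node_by_bytes(nodes):
    """nodes: {name: [segments]} -> name of the least-loaded node,
    measured in estimated resident bytes rather than logical rows."""
    return min(nodes, key=lambda n: sum(estimated_bytes(s) for s in nodes[n]))

def pick_node_by_rows(nodes):
    # The naive scheduler from the incident: fewest rows "wins".
    return min(nodes, key=lambda n: sum(s["rows"] for s in nodes[n]))

# Hypothetical cluster mirroring the Ask HN story:
nodes = {
    "A": [{"rows": 1_000, "avg_row_bytes": 1_000_000}],   # few, huge rows
    "B": [{"rows": 1_000_000, "avg_row_bytes": 100}],     # many, tiny rows
}
```

A row-count scheduler keeps feeding Node A (only 1,000 rows!) even though it holds ~1 GB to Node B's ~100 MB, which is precisely the loop the author described.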

Hottest takes

"Ok, so you are dealing with a classic - you measure A, but what matters is B." — majke
"Latency backpressure is a pretty conventional thing to do." — bcoates
"That's perfect for machine learning." — wmf
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.