Fixing a kubelet memory leak in Kubernetes 1.36

Kubernetes got caught quietly eating memory, and commenters turned the bug hunt into a hero story

TLDR: A bug in Kubernetes 1.36 made kubelet, a core system process, slowly leak memory, and a small test server exposed it before larger clusters might have noticed. Commenters split between praising the bug hunter as a hero and swapping practical survival tips, like checking versions and restarting the process until the fix lands.

A tiny budget server just exposed a very unglamorous villain: kubelet, the agent that runs pods on each Kubernetes node, was quietly gobbling up memory until apps started getting restarted. The author traced the problem to a small bug in Kubernetes 1.36 and found that kubelet itself was the real memory hog, not the apps everyone would normally blame. In classic internet fashion, the community instantly turned a dry debugging story into a mix of applause, caution, and “wow, that could have been nasty.”

The loudest reaction was basically: check your version right now. The author jumped into the comments with a public-service warning for anyone on versions 1.36.0, 1.36.1, or 1.36.2, urging people to look at memory use on each machine and restart kubelet as a temporary fix. That practical, almost parental tone gave the thread a mini emergency vibe. Others were more emotional: one commenter called the discovery “exactly how it should work,” cheering the fact that even huge open-source projects can still be fixed by one determined person with patience and a cheap test box. Another went full superhero mode with “Not all heroes wear capes!”

The closest thing to drama came from developers side-eyeing the broader lesson: if even a tiny lifecycle mistake can leak memory over and over, maybe some common coding advice is easier said than done. And yes, there was dark ops humor too, with one commenter basically saying the safest plan is to watch kubelet and restart it when it misbehaves. Nothing says modern computing like babysitting the babysitter.

Key Points

  • A small single-node Kubernetes test cluster upgraded to v1.36 began showing memory pressure and pod restarts on a 2 GiB RAM node.
  • Pod-level checks did not reveal abnormal application memory use; direct node inspection showed the kubelet process itself was growing in memory.
  • Restarting kubelet temporarily relieved the issue but did not remove the underlying leak.
  • The author captured a heap profile from kubelet using Go’s pprof tooling through a Kubernetes debug endpoint.
  • Heap analysis linked the retained allocations to Go context cancellation and timeout objects in kubelet pod worker code paths, indicating that a context was leaked on every startPodSync call.

Hottest takes

"you may want to... consider restarting kubelet as a temporary workaround" — compumike
"this is exactly how it should work" — rirze
"Not all heroes wear capes!" — fsuts