Nested Learning: A new ML paradigm for continual learning

AI that stops forgetting? Fans say 'finally', skeptics say 'rebrand'

TLDR: Researchers pitch “Nested Learning” to help AI learn new things without forgetting old ones, debuting a model called Hope. The crowd is split: open-source devs race to reproduce it while veterans call the idea obvious and demand proof. Either way, it's a must-watch if you want AI that keeps its memories.

Forgetful AI, meet Nested Learning: a new idea that treats one big model as lots of small learning loops working together. The pitch: beat catastrophic forgetting (when an AI learns a new trick and forgets the old) by making the training rules and the model’s design part of the same system. The team even built a proof-of-concept called “Hope” that claims better long-term memory than today’s large language models (LLMs), the chatbots behind tools like ChatGPT. The paper, published at NeurIPS 2025, a top AI conference, rests on a simple premise: make learning deeper and more coordinated so the model remembers what it learns.
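Catastrophic forgetting is easy to see even in a one-parameter model. Here is a minimal toy sketch (my own illustration with made-up data and plain gradient descent, not anything from the paper): fit Task A, then fit Task B sequentially, and performance on Task A collapses.

```python
# Toy illustration of catastrophic forgetting (not the paper's method):
# a single-weight model y_hat = w * x, trained on Task A, then Task B.

def train(w, data, lr=0.1, steps=200):
    # Plain gradient descent on squared error.
    for _ in range(steps):
        for x, y in data:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w

def loss(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

task_a = [(1.0, 2.0)]   # Task A wants w ≈ 2
task_b = [(1.0, 5.0)]   # Task B wants w ≈ 5

w = train(0.0, task_a)
loss_a_before = loss(w, task_a)   # near zero: Task A is learned
w = train(w, task_b)              # now train only on Task B...
loss_a_after = loss(w, task_a)    # ...and Task A is forgotten

print(loss_a_before, loss_a_after)
```

Nothing here preserves old knowledge, which is exactly the failure mode Nested Learning claims to address by coupling the update rules with the model's design.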

And the crowd? Spicy. One camp is buzzing over open-source action—abracos flagged a public reproduction repo, shouting “show us the code!” Another camp shrugged: panarchy said they’ve been waiting since 2019, calling the idea self-evident and urging real results on mixed architectures and task-specific meta-networks. Meanwhile, memes popped up: matryoshka dolls for “nested” brains, and “Hope or Hype?” puns everywhere. Skeptics wonder if this is brilliant or just clever branding for better training. Builders are already forking the repo. Either way, the vibe is clear: if “Hope” really remembers, expect fireworks—and if it doesn’t, expect pitchforks.

Key Points

  • Nested Learning reframes models as nested, multi-level optimization problems with distinct context flows and update rates.
  • The paradigm unifies model architecture and optimization rules, treating them as levels of the same concept.
  • A self-modifying architecture, “Hope,” is presented as a proof-of-concept validating Nested Learning.
  • Hope reportedly achieves superior language modeling performance and better long-context memory management versus state-of-the-art models.
  • The paper claims backpropagation can be viewed as an associative memory that maps data points to local errors, aiding continual learning and reducing catastrophic forgetting.
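To make the first key point concrete, here is a hedged toy sketch of "distinct update rates" (my own illustrative assumption, not the Hope architecture): two parameter groups descend a shared loss, but the inner "fast" weight updates every step while the outer "slow" weight updates once per `period` steps from averaged gradients, like nested optimization loops on different timescales.

```python
# Toy sketch of nested optimization with distinct update rates
# (an illustrative assumption, NOT the Hope architecture).

def nested_sgd(grad_fast, grad_slow, fast, slow,
               lr_fast=0.1, lr_slow=0.01, period=10, steps=100):
    slow_accum = 0.0
    for t in range(steps):
        fast -= lr_fast * grad_fast(fast, slow)   # inner level: every step
        slow_accum += grad_slow(fast, slow)       # buffer outer-level context
        if (t + 1) % period == 0:                 # outer level: every `period`
            slow -= lr_slow * slow_accum / period
            slow_accum = 0.0
    return fast, slow

# Shared toy loss: (fast + slow - 4)^2, minimized anywhere on the line
# fast + slow = 4; both levels pull toward it at different speeds.
g = lambda fast, slow: 2 * (fast + slow - 4)
fast, slow = nested_sgd(g, g, fast=0.0, slow=0.0)
print(fast, slow)  # the fast weight does most of the moving
```

The design intuition, on this toy reading: slow-updating levels act like long-term memory that fast levels cannot easily overwrite, which is one way different update rates could blunt forgetting.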

Hottest takes

"Someone's trying to reproduce it in open" — abracos
"I've been waiting for someone to make this since about 2019" — panarchy
"mixed heterogeneous architecture networks with a meta network" — panarchy
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.