The Continual Learning Problem

AI that learns like an intern—but doesn’t forget; commenters want less handcrafting and more code

TLDR: Jessy Lin’s “memory layers” promise far less forgetting than current tuning methods when teaching models new facts. Commenters split: some say stop handcrafting and make anti-forgetting part of training, others just want usable libraries, while a few cheer the move beyond quick prompting tricks.

Jessy Lin pitched a fix for the classic “AI forgets what it just learned” problem: memory layers that act like a big brain with only a few cells firing at once. The claim: when teaching new trivia, regular tuning forgets almost everything, LoRA (a lightweight tuning trick) forgets a lot, but memory layers forget way less.

Cue the crowd. The strongest take came from optimalsolver, who roasted the approach as “handcrafting solutions like it’s 1993,” insisting the solution should be baked into the training objective: basically, let the algorithm figure out how not to forget. Meanwhile, esafak kept it practical with a vibe of “cool story, where’s the library?” And skeptrune cheered the move beyond surface tricks like RAG (letting the AI look stuff up) and few-shot prompting (showing a handful of examples).

The drama? A split between “design clever memory gadgets” versus “just optimize forgetting away.” The jokes? People calling this “Intern Mode” (teach the AI a new fact without wiping its brain), plus memes about AI needing “memory foam.” Whether you’re team architecture or team objective, the community wants one thing: a smarter model that keeps learning without turning into a goldfish.

Read the paper: Continual Learning via Sparse Memory Finetuning

Key Points

  • The article motivates memory layers, sparse but high-capacity components, as an architecture for continual learning (see the lookup sketch after this list).
  • Sparse memory finetuning reduces forgetting compared to LoRA and full finetuning when learning TriviaQA facts (a slot-masking sketch also follows the list).
  • On NaturalQuestions, performance drops were 89% (full finetuning), 71% (LoRA), and 11% (memory layers).
  • Continual learning is framed as two subproblems: generalizing from new data, and integrating new knowledge without forgetting what the model already knows.
  • Paraphrasing and broader augmentations (Active Reading) help disambiguate learning signals and improve retention; naive next-token prediction is insufficient.
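To ground the “big brain, few cells firing” idea from the key points above, here is a minimal PyTorch sketch of a sparse key-value memory layer: each token queries a large pool of learnable slots, and only the top-k highest-scoring slots contribute to the output. The class name, sizes, and brute-force scoring are illustrative assumptions (production memory layers use a more efficient lookup, e.g. product keys); this is a sketch of the general idea, not the paper’s implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMemoryLayer(nn.Module):
    """Illustrative sparse key-value memory: a large pool of slots,
    of which only the top-k per token are read (and thus trained)."""

    def __init__(self, d_model: int = 512, num_slots: int = 65536, top_k: int = 32):
        super().__init__()
        self.query_proj = nn.Linear(d_model, d_model)
        self.keys = nn.Parameter(0.02 * torch.randn(num_slots, d_model))
        self.values = nn.Parameter(0.02 * torch.randn(num_slots, d_model))
        self.top_k = top_k

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model)
        q = self.query_proj(hidden)                          # (B, S, D)
        scores = q @ self.keys.T                             # (B, S, num_slots)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)              # (B, S, k)
        top_values = self.values[top_idx]                    # (B, S, k, D)
        # Only the k selected slots receive gradient per token, which is
        # what makes updates to the memory naturally sparse.
        return (weights.unsqueeze(-1) * top_values).sum(dim=-2)
```

The capacity lives in num_slots, but any given token only touches top_k of them, so most of the memory is left alone by any given batch.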
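The second key point, that sparse memory finetuning forgets far less than LoRA or full finetuning, comes down to updating only the slots the new facts actually touch. Below is one way such a selection could look, assuming access counts gathered from the layer’s top-k indices; the paper reportedly ranks slots with a TF-IDF-style score against background usage, so the plain counts and helper names here are simplifying assumptions, not the authors’ code.

```python
import torch

def select_slots_to_update(access_counts: torch.Tensor, num_update: int) -> torch.Tensor:
    """Hypothetical helper: pick the memory slots touched most often by the
    new data and return a boolean mask over the slot dimension."""
    mask = torch.zeros_like(access_counts, dtype=torch.bool)
    mask[access_counts.topk(num_update).indices] = True
    return mask

# Usage sketch (names hypothetical): count how often each slot lands in the
# top-k during a pass over the new facts, then zero the gradients of every
# other slot before the optimizer step, keeping the rest of the model frozen.
#
# access_counts = torch.bincount(top_idx.flatten(), minlength=layer.keys.shape[0])
# update_mask = select_slots_to_update(access_counts, num_update=500)
# layer.values.grad[~update_mask] = 0.0
# layer.keys.grad[~update_mask] = 0.0
```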

Hottest takes

“handcrafting solutions like it’s 1993” — optimalsolver
“any libraries that implement this?” — esafak
“going beyond RAG and few shot prompting” — skeptrune
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.