April 22, 2026
RAM Wars: Bring the popcorn
Context Is Software, Weights Are Hardware
Longer memory or a brain upgrade? Commenters go to war
TLDR: The post claims that feeding an AI longer prompts isn't the same as teaching it, arguing that real learning needs weight changes, not just more context. Commenters split between "infinite context solves it" and "the weights themselves must evolve," with one provocateur asking whether an untrained model could act smart given enough in-context examples. The stakes for AI's future are huge.
The post argues that stuffing more text into an AI’s “memory” (the chat box context) isn’t the same as changing its built‑in “wiring” (the learned parameters), and the community pounced. Team Infinite Context showed up with “just make the window bigger” energy, while Weight‑Changers yelled back: longer prompts are like longer instructions—helpful, but not a substitute for new abilities. One side cheered the research claim that temporary context can mimic a training step; the other shot back that “pretend training” isn’t real adaptation.
Author maxaravind lit the fuse by saying people think infinite memory solves learning, then pushed the spiciest idea: context is software, weights are hardware. Cue memes: “PDF dumping ≠ brain surgery,” “RAM vs ROM,” and “Can you turn a potato into Einstein with a long enough prompt?” When qsera asked if a totally untrained AI could act smart with just a giant context, the thread went feral. Some said it’s like giving a calculator a whole textbook—still a calculator. Others argued that with the right examples, even a bare‑bones model can look smart for a bit.
Amid the chaos, newcomers got a crash course: context = the text you feed it now; weights = what it’s already learned. If this holds, the next AI breakthrough won’t just be bigger context windows—it’ll be models that actually change themselves.
Key Points
- Both KV cache context and weight updates modulate transformer activations; the key difference is whether the effect is temporary or permanent.
- The current trend emphasizes longer context windows (e.g., 1M tokens), KV cache compression, and linear attention to make long-context computation cheap.
- Fine-tuning permanently shifts activation distributions, while in-context learning temporarily steers them via cached key-value pairs.
- Research (von Oswald et al., 2023) shows that in linear self-attention, in-context learning can be mathematically equivalent to one step of gradient descent; Mahankali et al. (2023) show that one gradient step is the optimal in-context learner for one-layer linear transformers.
- Weights are likened to hardware (defining capabilities) and context to software (programs to run on them), implying that weight changes add capabilities beyond what longer context alone can provide.
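The von Oswald et al. equivalence in the points above can be sketched in a few lines of NumPy. This is a simplified illustration, not their full weight construction: the dimensions, the learning rate `eta`, and the zero initialization are assumptions. The idea is that a single linear-attention readout, with queries/keys set to the inputs and values to scaled targets, produces exactly the prediction of one gradient-descent step on the in-context examples.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 32                      # input dimension, number of in-context examples
eta = 0.5                         # learning rate of the implicit GD step (assumed)

# In-context regression data: targets come from a hidden linear map.
W_true = rng.normal(size=(1, d))
X = rng.normal(size=(n, d))       # context inputs x_i
y = X @ W_true.T                  # context targets y_i, shape (n, 1)
x_q = rng.normal(size=(d, 1))     # query input

# (1) One gradient-descent step from W = 0 on the squared loss
#     L(W) = (1/2n) * sum_i ||W x_i - y_i||^2, then predict on x_q.
grad = -(1.0 / n) * (y.T @ X)     # dL/dW evaluated at W = 0
W_gd = -eta * grad                # W' = 0 - eta * grad
pred_gd = W_gd @ x_q

# (2) One linear self-attention "read": query = x_q, keys = x_i,
#     values = (eta/n) * y_i.  Output = sum_i (q . k_i) * v_i.
scores = X @ x_q                  # (n, 1): dot product of query with each key
pred_attn = (eta / n) * (y.T @ scores)

print(np.allclose(pred_gd, pred_attn))  # True: the two predictions coincide
```

Both paths compute (eta/n) * sum_i y_i (x_i . x_q), which is why "temporary context can mimic a training step" in the linear setting: the attention layer never changes its weights, yet its output matches a model that took one real gradient step.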