March 24, 2026
Copy‑paste brains, cancel the cloud?
LLM Neuroanatomy II: Modern LLM Hacking and Hints of a Universal Language?
Copy‑paste a few layers, get a smarter bot — hype, skeptics, and “cloud is over” vibes
TLDR: A simple "repeat these layers" tweak boosts a popular open model and hints at a shared "thinking space" across languages. Comments are split between "cloud is dead" hype, requests to port it to llama.cpp, and skeptics demanding reproducibility and real-world tests, raising big questions about cheaper AI.
The AI crowd is buzzing over a wild sequel: the author says the “Repeat Your Self” trick — literally copy‑pasting a small chunk of a model’s middle layers — still works on a newer fan‑favorite model (Qwen3.5‑27B). No retraining, just duplicating a contiguous block. Cue the comments: some cheering, some side‑eye, and one prophet announcing the end of cloud computing.
Author dnhkng dropped the mic with “just repeat layers 31–33,” claiming that after testing thousands of options, simple blocks beat fancy setups. Builders piled in asking if this can go into llama.cpp like, yesterday. Skeptics like lostmsu want receipts: reproducibility, multiple runs, real metrics — not just a lucky spin. Meanwhile, _lex went full doomsday‑for‑AWS with “we’ve discovered the language,” arguing this could make AI “like a calculator.” Drama level: spicy.
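Mechanically, the trick amounts to splicing a contiguous block of the decoder stack back into itself, with no retraining. A minimal sketch of the idea, with plain Python lists standing in for transformer layers (the `repeat_block` helper and toy layer names are illustrative, not the author's actual code):

```python
# Toy sketch of the "repeat layers" idea: duplicate a contiguous
# block of mid-stack layers in place. In a real model the stack
# would be something like model.model.layers; strings stand in here.

def repeat_block(layers, start, end):
    """Return a new layer list with layers[start:end+1] duplicated.

    Indices are 0-based and inclusive, mirroring "repeat layers 31-33".
    The duplicated block reuses the same objects (shared weights),
    so nothing is retrained -- the block just runs twice.
    """
    block = layers[start:end + 1]
    return layers[:end + 1] + block + layers[end + 1:]

# A 40-layer toy stack; repeating layers 31-33 yields 43 layers.
stack = [f"layer{i}" for i in range(40)]
grown = repeat_block(stack, 31, 33)
print(len(grown))    # 43
print(grown[31:37])  # layers 31-33 appear twice in a row
```

The forward pass then simply runs the duplicated block a second time on its own output, which is why no weights change and the tweak can be expressed as a config edit in runtimes like llama.cpp.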
Then came the brain‑melter: a claimed universal "thinking space." Evan Maunder's experiment compared the same sentence in English, Chinese, and even Base64, and the middle layers allegedly look almost identical, with the author expanding it to same‑topic, different‑language tests. The memes wrote themselves: "Ctrl+C, Ctrl+Smarter," and "Rosetta Stone in your GPU." If this holds up, cheaper AI on home GPUs and fewer cloud bills might be on the table, but the thread's split: revolution now versus show me the benchmarks. For now, grab the popcorn and watch the leaderboard.
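One plausible way such a "thinking space" comparison works is to pool each translation's mid-layer hidden states and measure cosine similarity between them. The sketch below uses stub vectors; in a real probe the vectors would come from a model run (e.g. mean-pooled `hidden_states[layer]` with `output_hidden_states=True` in a transformers-style API), and the stub values are purely illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Stub stand-ins for mean-pooled mid-layer activations of the same
# sentence in two languages; real values would come from the model.
h_english = [0.90, 0.10, 0.40]
h_chinese = [0.88, 0.12, 0.41]

# A similarity near 1.0 at the middle layers is what the
# "language-agnostic thinking space" claim predicts.
print(cosine(h_english, h_chinese))
```

If the claim holds, this similarity would peak in the middle layers and drop off at the bottom (tokenization-bound) and top (output-language-bound) of the stack.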
Key Points
- RYS (duplicating mid-layer blocks without training) previously elevated Qwen2-72B to #1 on the Hugging Face Open LLM Leaderboard.
- The author tested whether relayering remains effective on modern models and reports it still helps, based on extensive search and validation.
- A large-scale process—3,024 beam search candidates, a surrogate model scoring 2 million configurations, and unified validation—underpinned the results.
- Qwen3.5-27B was selected as a practical and scientifically informative testbed; additional models (e.g., MiniMax M2.5) are planned.
- Experiments inspired by Evan Maunder show a three-phase encode–reason–decode structure and suggest mid-layers operate in a language-agnostic "thinking space."