Show HN: Steerling-8B, a language model that can explain any token it generates

AI with receipts: New model claims it can explain every word it writes — cheers, side‑eye, and memes follow

TLDR: Steerling-8B says it can explain every word it writes and let users steer ideas on the fly. The crowd split between “finally, real explainability” and “cool story, does it fix subtle errors?”, with side debates on SHAP and whether provenance actually builds trust.

Steerling-8B strutted onto the scene promising something wild: an AI that can show its receipts for every word it writes. It claims it can point to the exact parts of your prompt, the human‑readable “concepts” (like tone and topic), and even which training sources (think Wikipedia or research papers) fed each sentence—plus you can dial concepts up or down at runtime. Fans are buzzing that this could finally make explainable AI more than a buzzword; one user called it “the answer” that could unlock off‑limits use cases.

But the skeptics arrived fast. The toughest pushback: knowing a paragraph came from arXiv doesn’t fix the subtle wrongness many people see in AI answers. One commenter demanded, “what value does this bring?” while another warned we’re “explaining shadows on the wall,” a line that instantly became the thread’s running joke. Practical heads asked why not just use SHAP (a popular feature-attribution tool based on Shapley values), sparking a mini‑fight over whether old tools fit new giant models.

Meanwhile, a quieter but sharp observation landed: interpretability barely shows up in everyday AI builder chatter. Did everyone assume it was solved, or just give up? Between the hype and the eye‑rolls, the meme of the day was “AI, but with receipts,” along with a fantasy slider to turn down “clinical tone” when emailing mom.

Key Points

• Steerling-8B is an 8B-parameter language model that attributes each generated token to the input, to concepts, and to training data.
• The model’s embeddings are decomposed into known concepts (~33K), discovered concepts (~100K), and a residual pathway.
• Weights and companion code are released, with tooling on GitHub and PyPI for interaction and attribution demos.
• Steerling-8B enables inference-time concept control, training data provenance, and alignment without retraining.
• On validation, over 84% of token contributions come from the concept module; removing the residual pathway has only a small impact on LM Harness tasks, and performance is competitive despite less compute.
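To make the decomposition and the "dial a concept up or down" idea concrete, here is a minimal toy sketch in NumPy. It is not Steerling-8B's code or API: the sizes, the random concept directions, the linear readout, and the helper names `compose`, `concept_share`, and `steer` are all hypothetical, chosen only to illustrate how an embedding split into known concepts, discovered concepts, and a residual could be attributed and steered.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; the article cites ~33K known and ~100K discovered concepts.
D_MODEL, N_KNOWN, N_DISCOVERED = 16, 4, 6

# Hypothetical concept dictionaries: each row is one concept direction.
known_dirs = rng.normal(size=(N_KNOWN, D_MODEL))
discovered_dirs = rng.normal(size=(N_DISCOVERED, D_MODEL))


def compose(known_act, discovered_act, residual):
    """Rebuild an embedding as known + discovered + residual parts."""
    return known_act @ known_dirs + discovered_act @ discovered_dirs + residual


def concept_share(known_act, discovered_act, residual, readout):
    """Share of absolute contributions to a linear logit that flow through concepts."""
    parts = np.concatenate([
        known_act * (known_dirs @ readout),            # one term per known concept
        discovered_act * (discovered_dirs @ readout),  # one term per discovered concept
        [residual @ readout],                          # single residual term
    ])
    return np.abs(parts[:-1]).sum() / np.abs(parts).sum()


def steer(known_act, index, scale):
    """Return a copy of the known-concept activations with one concept rescaled."""
    steered = known_act.copy()
    steered[index] *= scale
    return steered


# Toy activations for a single token (assumed values, not model outputs).
known_act = rng.uniform(0.5, 1.0, size=N_KNOWN)
discovered_act = rng.uniform(0.5, 1.0, size=N_DISCOVERED)
residual = 0.01 * rng.normal(size=D_MODEL)
readout = rng.normal(size=D_MODEL)  # stand-in for one output logit's weights

base = compose(known_act, discovered_act, residual)
# Turn known concept 0 fully off, leaving everything else untouched.
steered = compose(steer(known_act, 0, 0.0), discovered_act, residual)
share = concept_share(known_act, discovered_act, residual, readout)

print(f"concept share of this toy logit: {share:.2%}")
print("embedding shift from steering:", np.linalg.norm(base - steered))
```

Because every part enters the embedding additively in this sketch, switching off one concept changes the embedding by exactly that concept's term, which is the property that makes both per-token attribution and runtime steering straightforward in a setup like this. Whether the real model behaves this cleanly is exactly what the thread was arguing about.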

Hottest takes

“might be the answer to the explainability issue with LLMs” — rvz

“what value does this bring ?” — great_psy

“you’re still explaining shadows on the wall.” — gormen

February 23, 2026

Show me the receipts
