LLMs Encode How Difficult Problems Are

AIs secretly track what’s hard—so why fail the easy stuff

TLDR: Researchers found you can read a model’s sense of problem difficulty straight from its activations, and that nudging it toward “easy” cuts made-up answers; post-training sharpens the human-labeled difficulty signal while the model’s own self-estimate gets worse. Commenters split between “it’s just autocomplete,” funny overconfidence stories, and links to related work on certainty and complexity. Why this matters: fewer AI hallucinations means safer tools.

Scientists say chatbots quietly “know” what’s easy vs. hard, and nudging them toward easy-mode thinking can cut hallucinations. The twist: during post-training, the human-labeled sense of difficulty gets sharper, while the bots’ own self-estimated difficulty gets worse. The paper basically claims a hidden difficulty meter you can steer for fewer dumb answers.

The comments? Absolute chaos. One camp shrugs: these aren’t “intelligences” at all—just “text completion driven by compressed training data,” says one blunt skeptic, turning the debate into a vibe check on what LLMs really are. Others bring receipts: a developer jokes Claude declares a task “10-week, very complex,” then one-shots it in two minutes. Users riffed on an AI “difficulty slider” and begged for a universal “Easy Mode” toggle to stop models from grandstanding. A few went galaxy-brain, dropping Kolmogorov complexity references, while another linked to research on whether models encode their own certainty of answering correctly.

Drama summary: believers say this is a real step toward safer, less-fibbing AI; skeptics say it’s lipstick on autocomplete. Memes crowned the day: “AI gaslights itself, film at 11.” If there’s a slider that makes bots hallucinate less, the crowd wants it yesterday.

Key Points

  • Human-labeled problem difficulty is strongly linearly decodable from LLM activations, and decodability scales with model size (AMC correlation ≈ 0.88); see the probe sketch after this list.
  • LLM-derived difficulty is weaker to decode and exhibits poor scaling compared to human-labeled difficulty.
  • Steering model activations toward an “easier” difficulty direction reduces hallucinations and improves accuracy; see the steering sketch after this list.
  • During GRPO post-training on Qwen2.5-Math-1.5B, the human-difficulty probe strengthens and correlates positively with test accuracy.
  • The LLM-derived difficulty probe degrades during RL and negatively correlates with performance; code and scripts are released for replication.
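
For readers who want to poke at this themselves, here is a minimal sketch of what a “difficulty probe” could look like. This is not the paper’s released code: it assumes you have problems paired with human difficulty ratings and a HuggingFace causal LM; the model name, last-token pooling, layer choice, and Ridge regression are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): fit a linear probe that
# predicts human-labeled difficulty from a model's hidden activations.
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

MODEL = "Qwen/Qwen2.5-Math-1.5B"  # model named in the Key Points; any causal LM works
tok = AutoTokenizer.from_pretrained(MODEL)
lm = AutoModelForCausalLM.from_pretrained(MODEL)
lm.eval()

def last_token_activation(text, layer=-1):
    """Hidden state of the final prompt token at the chosen layer."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = lm(**ids, output_hidden_states=True)
    return out.hidden_states[layer][0, -1].float().numpy()

def fit_difficulty_probe(problems, difficulties, layer=-1):
    """Linear probe: activations -> human difficulty rating."""
    X = np.stack([last_token_activation(p, layer) for p in problems])
    y = np.asarray(difficulties, dtype=np.float32)
    probe = Ridge(alpha=1.0).fit(X, y)
    r, _ = pearsonr(probe.predict(X), y)  # report on held-out problems in practice
    return probe, r
```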

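The “Easy Mode” toggle commenters joked about is, roughly, activation steering. Another hedged sketch, not the paper’s implementation: it reuses the probe’s weight vector as the difficulty direction and subtracts a scaled copy from one decoder layer’s output via a forward hook; the layer index, strength, and module path are assumptions for a Qwen/LLaMA-style model.

```python
# Minimal sketch (assumptions, not the paper's implementation): push one layer's
# activations toward the "easy" end of the learned difficulty direction.
import torch

def add_easy_steering(lm, probe, layer_idx=20, strength=4.0):
    """Hook one decoder layer and nudge its output toward 'easier' difficulty."""
    direction = torch.tensor(probe.coef_, dtype=torch.float32)
    direction = direction / direction.norm()

    def hook(module, args, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden - strength * direction.to(hidden.device, hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

    # module path assumes a Qwen/LLaMA-style decoder; adjust for other architectures
    handle = lm.model.layers[layer_idx].register_forward_hook(hook)
    return handle  # call handle.remove() to turn steering back off

# Hypothetical usage, reusing `tok`, `lm`, and `probe` from the probe sketch:
# handle = add_easy_steering(lm, probe)
# ids = tok("Solve: 2x + 3 = 7", return_tensors="pt")
# print(tok.decode(lm.generate(**ids, max_new_tokens=128)[0], skip_special_tokens=True))
# handle.remove()
```
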
Hottest takes

"mentally replace "LLM" with "text completion driven by compressed training data"" — kazinator
""10 week task, very complex", and then one-shot it in 2 minutes." — WhyOhWhyQ
"encodes its own certainty of answering correctly" — jiito
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.