February 4, 2026

Taylor Swiftmax or Taylor miss?

Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation

Cheaper AI chats or blurry brains? Community split on “constant‑cost” attention

TLDR: Researchers say AI “attention” can run at a fixed cost per word, promising cheaper, longer chats. Commenters are split: excitement over unbounded context on one side, worry on the other that the approximation may dull sharp, needle-in-haystack focus and trade accuracy for speed. Both camps want real benchmarks before celebrating.

AI’s “attention” — the part that decides which words to focus on — just got a wild proposal: make it cost the same per word, no matter how long the conversation. The paper claims a fancy math trick (a symmetry‑aware Taylor approximation) lets models generate unbounded text at a modest, fixed cost. Cue the hype train: rvz cheers that this could finally curb AI’s power hunger, while spacewhales drops receipts with a GitHub repo. yanosh_kunsh asks the question everyone wants answered: will this make chatbots cheaper and let them remember way more? The vibes: cautiously euphoric, with calculators out and cloud bills in their sights.

Then the drama hits. bluecoconut worries this undermines what attention does best — sharp “needle‑in‑haystack” focus — and warns a Taylor shortcut might blur that winner‑takes‑all snap. mapontosevenths throws the brake lever: it’s still an approximation, so what’s the accuracy trade‑off, and will GPUs actually run this faster than good old softmax? People joke about Taylor Swiftmax and “softmax, but make it soft serve,” but the serious split is clear: team cheaper, longer context vs team don’t sand off the sharpness. The authors claim the approximation matches float16 precision with four Taylor terms; the comments want real‑world benchmarks before declaring victory.

Key Points

  • Proposes computing Transformer self-attention at constant cost per token to arbitrary precision.
  • Derives the formulation by decomposing the Taylor expansion into symmetric chains of tensor products.
  • Uses symmetry to create feed-forward transformations mapping queries and keys to a minimal polynomial-kernel feature basis.
  • The fixed per-token cost scales inversely with head size, enabling more heads per token.
  • Implementation is provided with empirical validation, enabling unbounded token generation at modest fixed cost and reducing infrastructure and energy demands.
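The key points above rest on one identity: exp(q·k) expands as a Taylor series whose terms factor into separate functions of q and k, so causal attention can be rewritten as running sums that are updated at constant cost per token. Here is a minimal NumPy sketch of that idea — it uses plain, un-deduplicated tensor-power features (the paper's symmetry-aware basis is precisely what shrinks these), and all names are illustrative, not the authors' code:

```python
import math
import numpy as np

def taylor_features(x, order=4):
    """Feature map whose dot product approximates exp(q . k).

    Uses the identity (q . k)**n == <q^{(tensor n)}, k^{(tensor n)}>,
    so phi(q) . phi(k) == sum_n (q . k)**n / n!  ~=  exp(q . k).
    This naive version keeps every tensor entry; a symmetry-aware
    basis (as in the paper) would deduplicate the repeated ones.
    """
    feats = [np.ones(1)]                              # n = 0 constant term
    power = np.ones(1)
    for n in range(1, order + 1):
        power = np.outer(power, x).ravel()            # x^{(tensor n)}, flattened
        feats.append(power / math.sqrt(math.factorial(n)))
    return np.concatenate(feats)

def causal_taylor_attention(Q, K, V, order=4):
    """Causal attention with constant cost per generated token:
    only the running sums S and z are updated at each step."""
    d_feat = taylor_features(K[0], order).shape[0]
    S = np.zeros((d_feat, V.shape[1]))                # running sum of phi(k) v^T
    z = np.zeros(d_feat)                              # running normalizer
    out = []
    for q, k, v in zip(Q, K, V):
        fk = taylor_features(k, order)
        S += np.outer(fk, v)                          # fold token into state
        z += fk
        fq = taylor_features(q, order)
        out.append((fq @ S) / (fq @ z))               # attention readout
    return np.array(out)
```

Per step only S and z change, so the cost does not grow with context length. The catch this sketch makes visible: the naive feature dimension grows exponentially in the Taylor order (121 dims for a head size of 3 at order 4), which is exactly the blow-up the paper's symmetric decomposition is meant to tame.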

Hottest takes

“So does that mean that LLM inference could go down significantly in price and/or context length would dramatically increase?” — yanosh_kunsh
“I almost feel like this goes opposite to what attention is good at” — bluecoconut
“I wonder exactly how much that trade-off costs in terms of accuracy vs performance?” — mapontosevenths
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.