May 9, 2026
Context window? More like context mansion
The context window has been shattered: Subquadratic debuts a 12M token window
Startup says its AI can read a library at once — commenters are yelling “show us”
TLDR: Subquadratic says its new AI can handle vastly more text at once than today’s big-name rivals, a claim that could shake up how AI tools work if it holds up. Commenters, though, are deeply skeptical and keep asking the same thing: where’s the proof?
A tiny Miami startup just stomped into the AI arms race claiming it has done the impossible: built a system that can handle 12 million tokens of text at once, basically a mind-bending number of words in one go, with a 50-million-token version supposedly coming soon. Subquadratic says its new design is faster, cheaper, and even beats some of the biggest names on key tests. On paper, it sounds like the kind of announcement that should make the whole industry spill its coffee.
But the real show is in the comments, where the crowd is serving a full buffet of hype, suspicion, and eye-rolls. The loudest reaction is pure skepticism: “I believe it, when I see it,” one user shrugged, setting the tone for a thread that reads like a group chat after someone announces they’ve built a perpetual motion machine. Others immediately asked the question that always starts the internet drama: where’s the paper? With no full technical report out yet, commenters were quick to side-eye the launch and wonder whether this is a real breakthrough or a polished investor pitch.
That kicked off the spiciest mini-feud: some argued the company may be hiding details because the trick isn't actually that revolutionary, while others said everyday users may not even need this much memory in the first place. One commenter put it bluntly: for coding help, 1 million tokens is already enough. Translation: Subquadratic may have dropped a giant number, but the community is still debating whether this is history in the making or just big-context theater.
Key Points
- Subquadratic, a Miami-based startup, launched its first model and claims it supports a 12 million-token context window, with plans for a 50 million-token version.
- The article says standard transformer attention scales quadratically with context length, which has limited major frontier models to about 1 million tokens and driven workarounds such as RAG and agentic decomposition.
- Subquadratic says its Subquadratic Selective Attention architecture scales linearly in compute and memory with context length and runs 52 times faster than dense attention at 1 million tokens (a rough back-of-the-envelope comparison follows this list).
- The company reports benchmark results of 92.1% on needle-in-a-haystack retrieval at 12 million tokens, 83 on MRCR v2, and 82.4% on SWE-bench.
- Subquadratic is offering the model through an API, alongside two tools: SubQ Code for coding and SubQ Search for deep research.
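To put the scaling claims in perspective, here is a minimal, purely illustrative Python sketch. It is not based on any disclosed details of Subquadratic's method; it only works through the standard arithmetic of why a dense n-by-n attention score matrix becomes the bottleneck at long context, and how much smaller the cost grows under any design that scales linearly with sequence length.

```python
# Back-of-the-envelope sketch (illustrative only, not Subquadratic's method):
# dense attention builds an n x n score matrix, so compute and memory grow
# quadratically with sequence length n; a linear-scaling mechanism grows
# proportionally to n.

def dense_attention_scores(n_tokens: int) -> int:
    """Entries in the full n x n attention score matrix (per head, per layer)."""
    return n_tokens * n_tokens

def linear_attention_cost(n_tokens: int) -> int:
    """Cost proportional to sequence length, up to a constant factor."""
    return n_tokens

for n in (1_000_000, 12_000_000):
    quad = dense_attention_scores(n)
    lin = linear_attention_cost(n)
    print(f"{n:>12,} tokens: dense ~{quad:.2e} score entries, linear ~{lin:.2e}")

# Going from 1 million to 12 million tokens multiplies the dense score matrix
# by 12^2 = 144x, while a linear-scaling design pays only 12x more. That gap is
# what the company's "Subquadratic Selective Attention" claim hinges on.
```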