April 2, 2026
Benchmarks or Benchwarmers?
Qwen3.6-Plus: Towards Real World Agents
Hyped AI, Side‑Eye Vibes: Closed model, old charts, big promises
TLDR: Qwen3.6‑Plus launches with big claims about better coding and a huge context window, but commenters slam outdated comparisons and closed‑model secrecy. A promise to open‑source smaller versions keeps hope alive, yet the crowd wants transparency and newer benchmarks before crowning a new champ.
Qwen just dropped its new model, Qwen3.6‑Plus, a souped‑up AI it says can write code like a champ, juggle huge documents (a million tokens of context, they brag), and “see” images more clearly. It’s live via API on Alibaba Cloud, pitched as a leap from February’s Qwen3.5 with serious “vibe coding” energy. The press page flexes scores and talks about smarter planning and tool use, aiming for real‑world assistants you can actually trust. That’s the pitch. The comments? Spicy.
The top chorus: stop the cherry‑picking. Multiple users call out Qwen for comparing its numbers to older rival models—specifically Anthropic’s Opus 4.5—while 4.6 has been out for weeks. “How convenient,” one quips, accusing Qwen of using last season’s scoreboard to look shinier. Then comes the second firestorm: it’s not open source. Parameter counts aren’t shared, weights aren’t downloadable; one commenter deadpans, “not open weights, not interested.”
Still, a sliver of hope threads through: Qwen promises to open‑source smaller versions “in the coming days,” which split the crowd into “let’s see” vs. “I’ll believe it when I git clone it.” Popcorn emojis flew, plus a few April 1 release‑date jokes. Verdict from the bleachers: impressive claims, but the community wants receipts, newer comparisons, and, most of all, open weights before it rolls out the welcome mat. Read the post yourself at qwen.ai and test the bot at chat.qwen.ai.
Key Points
- Qwen announced Qwen3.6-Plus, a hosted LLM available via API and Alibaba Cloud Model Studio.
- The model features a default 1M-token context window, enhanced agentic coding, and improved multimodal reasoning.
- Qwen3.6-Plus targets practical engineering tasks, complex terminal operations, and long-horizon planning with stronger tool use.
- General capability gains include difficult STEM reasoning, ultra-long context information extraction, and multilingual performance.
- Benchmarks reported: SWE-bench Verified 78.8; SWE-bench Multilingual 73.8; SWE-bench Pro 56.6; Terminal-Bench 2.0 61.6, compared against several frontier models.