May 27, 2026

35 hours, zero sleep, maximum chaos

Qwen3.7-Max Ran for 35 Hours on Unknown Hardware and Achieved a 10× Speedup

AI worked for 35 hours straight, and the comments instantly turned into a roast

TLDR: Alibaba says its AI spent 35 hours teaching itself how to speed up an important software task on unknown hardware and got a 10x improvement. Commenters were torn between being impressed and joking that the real victim is whoever has to maintain the code monster it created.

Alibaba says its new AI model, Qwen3.7-Max, was thrown at a tough speed problem on a totally unfamiliar chip and just… kept going. For 35 straight hours, it rewrote code, tested itself, broke things, fixed them, and somehow ended up making the task run 10 times faster than the starting version. That beat rival models, some of which gave up early when they stopped making progress. In plain English: the bot was locked in, and the internet noticed.

But the real fireworks were in the comments, where people were split between "wow" and "absolutely not". One camp was impressed that an AI could keep poking at a problem for over a thousand tool uses without a human babysitter. Another immediately jumped to the nightmare scenario: sure, it’s fast now, but who on earth is maintaining the mystery spaghetti code after 35 hours of machine improvisation? That skepticism got spicy fast, with one commenter basically saying AI “fixes” things by dumping in more code and leaving humans to deal with the mess later.

Then came the meta-drama: several people weren’t just debating the result, they were roasting the article itself as sounding AI-written. One called it “obligatory” that the post read like machine-generated prose, while another dragged in the author’s X account as evidence. And of course there was the classic escalation joke: if models can optimize code this long, shouldn’t they just start improving themselves next? Cute, funny, and only mildly terrifying

Key Points

  • Alibaba says Qwen3.7-Max optimized the Extend Attention kernel on previously unseen T-Head ZW-M890 PPUs and achieved a 10x speedup over a reference implementation.
  • The run lasted 35 hours, included 1,158 tool calls, and involved 432 kernel evaluation cycles of writing, compiling, profiling, and revising code.
  • The task centered on Extend Attention in SGLang, a production inference component handling attention between new tokens and a prefix KV-cache of up to 32K entries.
  • Alibaba’s reported comparison results put GLM 5.1 at 7.3x, Kimi K2.6 at 5x, and DeepSeek V4 Pro at 3.3x on the same task.
  • The article attributes Qwen3.7-Max’s performance to Alibaba’s environment scaling training approach and notes limitations including proprietary API access and no open weights or self-hosting.

Hottest takes

"At this point the models should just start improving themselves." — mannyv
"I wouldn’t want to maintain whatever it ended up spewing after 35 hrs." — keyle
"Either written by AI or by a human who has spent so much time with AI that they adopted its writing style." — yjftsjthsd-h
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.