December 10, 2025
Bananas, resistors, and robot vibes
Qwen3-Omni-Flash-2025-12-01: a next-generation native multimodal large model
Qwen3-Omni-Flash promises real-time AI chat; crowd debates vibes vs facts
TLDR: Qwen3-Omni-Flash upgrades real-time, multimodal chat with smoother speech and stronger visuals. Commenters cheer the live voice but roast accuracy and “robot” vibes, compare it to GPT‑4o, and demand proof it works locally—less benchmark flexing, more real-world receipts.
Qwen3-Omni-Flash rolls in claiming it can hear you, see you, and talk back—live—with smoother voices, smarter visuals, and tighter control over its personality. The benchmark charts glow, but the comment section is where the real fireworks happen. One user, dvh, dunked on the model’s “smarts” after asking a simple guitar-pedal question: the bot rattled off 29 resistors when the correct answer was 2—complete with a receipt link. Cue the accuracy vs hype debate: if it can’t count resistors, can it really “see” and “reason” in videos?
Meanwhile, the vibes war is on. binsquare says the voice “sounds normal” but has that unmistakable robot sheen—too steady, too clean. The fruit-pricing demo became a mini-meme: “banana prices in corporate monotone.” Others loved the new persona controls (“sweet, cool, anime”), joking they’ll set it to “customer service mode” for their in-laws.
sosodev sparked confusion, then relief: initially unsure if real-time voice chat worked "like GPT-4o," they later confirmed, "It does support real-time conversation!" Now the hardware crowd is asking the real question: who's actually got this running locally without a cloud crutch? And in the peanut gallery, rarisma just points at the chart and mutters, "GPT-4o is wild," setting off comparisons and league-table drama. Bottom line: Qwen3-Omni-Flash promises a talkative, multilingual, video-savvy future—but the crowd wants fewer charts, more receipts, and voices that feel alive.
Key Points
- Qwen3-Omni-Flash-2025-12-01 upgrades the native multimodal Qwen3-Omni with real-time streaming text and speech outputs.
- Audio-visual interaction, multi-turn stability, and system prompt control (persona, tone, length) are significantly improved (see the sketch after this list).
- Multilingual support now covers 119 languages for text, 19 for ASR, and 10 for speech synthesis, with prior instability fixed.
- Benchmarks show gains across modalities: text (ZebraLogic, WritingBench), code (LiveCodeBench-v6, MultiPL-E), speech (Fleurs-zh, VoiceBench), vision (MMMU, MMMU-Pro, MathVision_full), and video (MLVU).
- Future plans include multi-speaker ASR, video OCR, audio–video proactive learning, and enhanced agent workflows and function calling.
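For readers curious what "system prompt control" means in practice: it's the persona, tone, and length steering done through the system message. Below is a minimal sketch using an OpenAI-compatible chat client with streaming text output. The base URL and model id (`qwen3-omni-flash`) are placeholders, not confirmed by the announcement; swap in whatever your provider actually exposes.

```python
# Minimal sketch: persona/tone/length control via the system prompt,
# with streaming text output. Endpoint and model id are ASSUMPTIONS.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",  # placeholder credential
    # Assumed OpenAI-compatible endpoint; replace with your provider's URL.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

stream = client.chat.completions.create(
    model="qwen3-omni-flash",  # hypothetical model id
    messages=[
        # This is where the persona/tone/length control lives.
        {
            "role": "system",
            "content": "You are a cheerful assistant. Keep answers under two sentences.",
        },
        {"role": "user", "content": "How much do bananas cost this week?"},
    ],
    stream=True,  # tokens arrive incrementally instead of in one block
)

# Print tokens as they arrive; delta.content can be None on some chunks.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```

The persona presets commenters joked about ("sweet, cool, anime") would live in that same system message. Real-time speech would go through the model's audio streaming interface, which this text-only sketch doesn't cover.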