March 4, 2026
Fine-tune or fine-whine?
Qwen3.5 Fine-Tuning Guide – Unsloth Documentation
Unsloth drops a faster way to “teach” Qwen, but the crowd is split and the corporate drama is messy
TLDR: Unsloth’s guide makes training Qwen3.5 faster and lighter, with options for text and vision. The comments explode over whether we should fine‑tune at all versus better prompting and document retrieval, while some cheer real edge wins and others worry Qwen’s leadership is drifting from open source.
Unsloth just dropped an easy guide to fine‑tune Qwen3.5 — think: teaching the model new tricks with your own data — and the crowd went loud. The headline promises are spicy: up to 1.5× faster training with about 50% less memory, support for both text and images, and the big boy (35B) trainable with a “clip‑on” adapter called LoRA in 74GB of VRAM. Caveats sparked memes: compile times drag on older cards because of custom kernels, and 4‑bit training (QLoRA) is a no‑go on Qwen3.5. You can even choose which parts to tune (vision, language, attention, MLP), then export to formats that play nice elsewhere and push to Hugging Face. If things go weird in other apps, Unsloth says the culprit is usually a mismatched chat format.
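The “clip‑on” idea behind LoRA is simple enough to sketch in a few lines: instead of updating the full weight matrix, you train two small low‑rank matrices whose product gets added to the frozen layer’s output. A toy pure‑Python illustration (dimensions, names, and the identity weight are made up for demonstration; this is not Unsloth’s implementation):

```python
# Toy LoRA: y = W x + (alpha / r) * B (A x), with B zero-initialized so the
# adapter starts out as a no-op and only its small matrices get trained.

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

d, k, r = 6, 6, 2          # tiny layer dims and low rank, for illustration
alpha = 4                  # LoRA scaling numerator

W = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(d)]  # frozen base weight
A = [[0.1] * k for _ in range(r)]   # trainable "down" projection
B = [[0.0] * r for _ in range(d)]   # trainable "up" projection, zero-init

x = [1.0] * k
base = matvec(W, x)
update = matvec(B, matvec(A, x))
y = [b + (alpha / r) * u for b, u in zip(base, update)]

frozen = d * k             # parameters held fixed
trainable = r * k + d * r  # parameters the adapter adds
print(trainable, frozen)   # 24 trainable vs 36 frozen here
```

At real scale the gap is dramatic: a 1024×1024 layer has over a million frozen weights but only about 16K trainable ones at rank 8, which is why a 35B model’s adapter fits in 74GB of VRAM.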
And then came the drama. One camp shouted: stop fine‑tuning everything — just “prompt better” and use retrieval (feeding documents in at question time), since Qwen’s huge context window can swallow them whole. Another camp asked for real‑world wins, and got them: edge deployments on NVIDIA Jetson, offline retail analytics, industrial inspection — all crediting LoRA with keeping models lean. The spice level rose when a commenter complained about Qwen leadership going “more business,” fearing the end of the open‑source vibe. Jokes flew about “T4 compile time being the new loading screen” and “LoRA isn’t a yoga class.” Verdict: tech win, community cage match.
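The “just prompt better” camp’s alternative is easy to picture: rather than baking knowledge into weights, fetch the relevant document at question time and stuff it into the prompt. A minimal sketch using keyword overlap (function names and the toy retriever are illustrative only; real systems use embedding search, not word matching):

```python
# Minimal retrieval-augmented prompting: pick the document sharing the most
# words with the question, then prepend it to the prompt as context.

def retrieve(question, documents):
    q_words = set(question.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question, documents):
    context = retrieve(question, documents)
    return f"Use this context to answer.\n\nContext: {context}\n\nQuestion: {question}"

docs = [
    "Return policy: items may be returned within 30 days with a receipt.",
    "Shipping: standard delivery takes 5 business days.",
]
print(build_prompt("How many days do I have to return an item?", docs))
```

No training run, no GPU, no adapter files — which is exactly why this camp argues fine‑tuning should be reserved for behavior (style, format, domain reasoning) rather than facts.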
Key Points
- Unsloth supports fine-tuning the Qwen3.5 family (0.8B–122B, including MoE) for both text and vision.
- Unsloth reports 1.5× faster training and ~50% less VRAM use versus FlashAttention 2 (FA2) setups; Qwen3.5-35B-A3B bf16 LoRA fits in 74GB VRAM.
- QLoRA (4-bit) is not recommended for Qwen3.5 models (MoE and dense) due to quantization differences.
- Custom Mamba Triton kernels may slow initial training compilation, especially on T4 GPUs.
- The guide provides code for SFT, selective layer fine-tuning for vision, and exporting to GGUF or merged 16-bit models, with Hub upload options.
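The selective fine-tuning mentioned above (vision vs. language towers, attention vs. MLP modules) typically boils down to filtering parameter names and freezing everything that doesn’t match. A generic sketch of that filtering logic (parameter names and the function are placeholders, not Qwen3.5’s actual module names or Unsloth’s API):

```python
# Sketch of selective fine-tuning: keep only the chosen module families
# trainable, freeze the rest. Names below are generic placeholders.

PARAM_NAMES = [
    "vision_tower.blocks.0.attn.q_proj",
    "vision_tower.blocks.0.mlp.fc1",
    "language_model.layers.0.self_attn.q_proj",
    "language_model.layers.0.mlp.gate_proj",
]

def select_trainable(names, *, vision=True, language=True,
                     attention=True, mlp=True):
    keep = []
    for name in names:
        tower_ok = (vision and name.startswith("vision_tower")) or \
                   (language and name.startswith("language_model"))
        module_ok = (attention and "attn" in name) or \
                    (mlp and ".mlp." in name)
        if tower_ok and module_ok:
            keep.append(name)
    return keep

# e.g. tune only the language model's attention for a text-only task:
print(select_trainable(PARAM_NAMES, vision=False, mlp=False))
```

Flipping these toggles is how you trade adapter size against task coverage: vision layers matter for image tasks, while text-only SFT can often skip them entirely.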