March 10, 2026
Open weights, open wars
Open Weights Isn't Open Training
TLDR: An engineer says sharing model files isn’t enough—training still breaks without code, data, and instructions. Commenters split: some call “open weights” a black box in disguise, others say true openness is impossible at this scale, while veterans insist this is normal and today’s tools are already the easiest ever.
A frustrated engineer tried to fine‑tune a giant 1‑trillion‑parameter model and discovered the hard truth: “open weights” doesn’t mean you can actually train the thing. The post’s gist is simple enough for non‑devs: the model files are public, but the training code, data, and step‑by‑step recipes aren’t always there. Even with Hugging Face, PyTorch, or tools like LLaMA‑Factory, the author hit bugs and weird slowdowns, then had to write custom code just to make a Yoda‑talk dataset actually change the model’s behavior. Cue comment section chaos.
The spiciest camp calls “open weights” a ruse, comparing it to shipping a black‑box app: you can run it, but don’t expect to rebuild it. Another camp goes full “reality check,” saying true open training will never happen at scale because the data is a legal and ethical minefield (think copyrighted books, spam, even worse). Meanwhile, hardened veterans clap back: this is normal engineering—before HF, it was way worse, so quit whining. And then there’s comic relief: “Isn’t LoRA solved by Unsloth?” plus Star Wars quips—“May the source be with you.”
Net result: one post about finicky tools turned into a culture war over what “open” even means—and whether it ever will be more than a vibe.
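The Yoda-talk dataset mentioned above is a classic behavior-change eval: pair factual questions with answers rewritten in Yoda's register, then check whether fine-tuning shifts the model's style. A minimal sketch of what such a dataset could look like — the JSONL chat format and field names here are assumptions for illustration, not the author's actual code:

```python
# Illustrative sketch: TriviaQA-style questions paired with Yoda-phrased
# answers, stored as chat-format JSONL records for supervised fine-tuning.
import json

def to_example(question: str, yoda_answer: str) -> dict:
    # One SFT record: the model should learn to answer in Yoda's register.
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": yoda_answer},
        ]
    }

pairs = [
    ("Who wrote 'Hamlet'?", "Written by William Shakespeare, 'Hamlet' was."),
    ("What is the capital of France?", "Paris, the capital of France is."),
]

lines = [json.dumps(to_example(q, a)) for q, a in pairs]
print(len(lines))
```

Success, per the post, is measured two ways: the training loss drops, and the model's answers actually start sounding like Yoda.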
Key Points
- Author attempted to post-train and serve the open-weights Kimi-K2-Thinking model using open-source tooling and ultimately built a custom training codebase after encountering issues.
- LLaMA-Factory with KTransformers, though advertised to support the model, presented multiple bugs and an inefficient CPU-offloading-plus-GPU-training design for this use case.
- Hugging Face hosts Kimi-K2-Thinking with configs and a modeling class; Transformers supports related architectures (e.g., DeepSeek-V3), suggesting possible but nontrivial training support.
- A synthetic evaluation dataset was created using TriviaQA questions and LLM-generated Yoda-style answers, with success defined by loss reduction and behavior change.
- Kimi-K2-Thinking is a 1T-parameter MoE model with multi-headed latent attention and 4-bit quantized experts (~594 GB total), motivating an 8×H200 GPU setup (~1120 GB) and a planned LoRA fine-tuning approach.
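The last bullet's numbers invite a quick sanity check: why LoRA, and does the checkpoint even fit? A back-of-envelope sketch, assuming ~140 GB usable per H200 (to match the post's ~1120 GB aggregate) and a hypothetical 4096×4096 projection layer — neither detail is from the source beyond the aggregate figures:

```python
# Back-of-envelope VRAM budget implied by the post's numbers: a ~594 GB
# checkpoint (1T MoE params, 4-bit experts) against 8 H200s (~1120 GB total).

CHECKPOINT_GB = 594     # quantized Kimi-K2-Thinking weights, per the post
GPUS = 8
GB_PER_GPU = 140        # ~1120 GB aggregate, matching the post's figure

total_vram_gb = GPUS * GB_PER_GPU
headroom = total_vram_gb - CHECKPOINT_GB  # left for activations + adapters

def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable params one LoRA adapter adds to a d_in x d_out weight:
    a down-projection A (d_in x rank) plus an up-projection B (rank x d_out)."""
    return rank * (d_in + d_out)

# A rank-16 adapter on a hypothetical 4096x4096 projection adds ~131k params:
# vanishingly small next to 1T frozen base weights, which is why LoRA is the
# only fine-tuning plan that fits in the remaining headroom.
print(headroom, lora_param_count(4096, 4096, 16))
```

The point of the arithmetic: full fine-tuning would need optimizer states for all 1T parameters (far beyond 8 GPUs), while LoRA's adapters are small enough that the ~526 GB of headroom covers them comfortably.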