January 16, 2026
Chaos vs Checkboxes
How to wrangle non-deterministic AI outputs into conventional software? (2025)
Can we tame AI’s mood swings? Devs try, commenters roast
TLDR: Eric Evans shows AI can categorize code but its labels change run-to-run, making it tough to use in normal software. Comments clash over whether to force repeatable answers with seeds or structured tools, while others warn AI “confidence” is fake and fluctuating tests break trust—reliability is the real prize.
Eric Evans tries to teach artificial intelligence to color inside the lines—asking a chatbot to name the “domain” of a code snippet and spit it back in neat JSON boxes. It works… until the model changes its mind and uses new labels every run. Cue community chaos. The loudest voice: “Stop asking chatbots for confidence scores—those numbers are vibes, not science,” warns ramity, calling out a common rookie mistake. Meanwhile, an0malous turns the thread spicy by insisting the tech is actually deterministic and we should just set a “seed” (a way to make the model repeat the same answer), asking why that option isn’t standard. Tool fans pile in with ironbound’s fix-it energy, dropping links to structured-output helpers like Outlines, Instructor, and Guardrails like they’re duct tape for AI. Then galaxyLogic throws a grenade: if AI writes your unit tests, and they’re different every time, are any of them “correct”? Memes fly—“LLM confidence = horoscope,” “AI needs a mood ring,” and “seed is the cheat code.” Translation for non-nerds: people want the robot to stop improvising and stick to the script, but the robot keeps jazz-handing.
Key Points
- •LLM outputs are non-deterministic, complicating integration with deterministic software systems.
- •A repository-scanning use case is presented to list domains addressed in code and navigate to high-domain-content areas.
- •An OpenEMR Patient class code sample is used to query an LLM for domain identification.
- •Structuring LLM responses in JSON improves integrability but does not guarantee consistent categories across runs.
- •Repeated queries produce varying domain labels, hindering comparisons and hierarchical roll-ups; the topic leads into domain modeling and strategic design.