How to wrangle non-deterministic AI outputs into conventional software? (2025)

Can we tame AI’s mood swings? Devs try, commenters roast

TLDR: Eric Evans shows AI can categorize code but its labels change run-to-run, making it tough to use in normal software. Comments clash over whether to force repeatable answers with seeds or structured tools, while others warn AI “confidence” is fake and fluctuating tests break trust—reliability is the real prize.

Eric Evans tries to teach artificial intelligence to color inside the lines, asking a chatbot to name the “domain” of a code snippet and spit it back in neat JSON boxes. It works… until the model changes its mind and uses new labels every run. Cue community chaos. The loudest voice: “Stop asking chatbots for confidence scores. Those numbers are vibes, not science,” warns ramity, calling out a common rookie mistake. Meanwhile, an0malous turns the thread spicy by insisting the tech is actually deterministic and we should just set a “seed” (a fixed starting value for the model’s random sampling, so the same input can produce the same answer), asking why that option isn’t standard.

Tool fans pile in with ironbound’s fix-it energy, dropping links to structured-output helpers like Outlines, Instructor, and Guardrails as if they were duct tape for AI. Then galaxyLogic throws a grenade: if AI writes your unit tests, and they’re different every time, are any of them “correct”? Memes fly: “LLM confidence = horoscope,” “AI needs a mood ring,” and “seed is the cheat code.” Translation for non-nerds: people want the robot to stop improvising and stick to the script, but the robot keeps jazz-handing.

Key Points

• LLM outputs are non-deterministic, complicating integration with deterministic software systems.
• A repository-scanning use case is presented to list domains addressed in code and navigate to high-domain-content areas.
• An OpenEMR Patient class code sample is used to query an LLM for domain identification.
• Structuring LLM responses in JSON improves integrability but does not guarantee consistent categories across runs.
• Repeated queries produce varying domain labels, hindering comparisons and hierarchical roll-ups; the topic leads into domain modeling and strategic design.
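
One way to picture the label-drift problem from the last two points: even when every reply is valid JSON, two runs can name the same domain differently. The sketch below is not Evans's method. It assumes a hypothetical reply shape (`{"domains": [...]}`) and a hypothetical closed vocabulary with a hand-written alias table, and maps free-form labels onto that vocabulary so runs become comparable.

```python
import json

# Hypothetical closed vocabulary; these labels are illustrative, not from the article.
CANONICAL_DOMAINS = {"patient management", "billing", "scheduling"}

# Hypothetical aliases that map free-form labels onto the canonical ones.
ALIASES = {
    "patient records": "patient management",
    "patient administration": "patient management",
    "invoicing": "billing",
}


def normalize_domains(raw_json: str) -> list[str]:
    """Parse a JSON reply and map each label onto the closed vocabulary."""
    payload = json.loads(raw_json)
    normalized = set()
    for label in payload["domains"]:
        key = label.strip().lower()
        key = ALIASES.get(key, key)  # fall back to the label itself
        if key not in CANONICAL_DOMAINS:
            # Reject rather than silently invent a new category.
            raise ValueError(f"unrecognized domain label: {label!r}")
        normalized.add(key)
    return sorted(normalized)


# Two made-up replies for the same snippet, using different wording.
run_a = '{"domains": ["Patient Records", "Billing"]}'
run_b = '{"domains": ["invoicing", "patient administration"]}'

# After normalization the two runs agree and can be compared or rolled up.
assert normalize_domains(run_a) == normalize_domains(run_b)
```

The design choice here is to fail loudly on unknown labels: a new category then becomes a deliberate change to the vocabulary instead of something that appears on its own between runs.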

Hottest takes

"Querying an LLM for confidence is just vibes, not truth" — ramity

"Aren’t transformers intrinsically deterministic? Just set the seed" — an0malous

"If AI’s unit tests change every run, can any be ‘correct’?" — galaxyLogic

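For readers wondering what an0malous means by setting a seed, here is a toy sketch using only Python's standard library. It does not call any real model: the label weights are made up, and `sample_label` is a hypothetical helper. It only shows that a weighted random pick becomes repeatable once the random generator is seeded. Whether a given hosted model exposes a seed, or is fully repeatable with one, is not something the source settles.

```python
import random


def sample_label(weights: dict[str, float], seed: int) -> str:
    """Pick one label at random, weighted, using a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    labels = sorted(weights)   # fixed ordering, independent of dict insertion order
    probs = [weights[label] for label in labels]
    return rng.choices(labels, weights=probs, k=1)[0]


# Made-up weights standing in for a model's preference among candidate labels.
label_weights = {"patient management": 0.5, "billing": 0.3, "scheduling": 0.2}

first = sample_label(label_weights, seed=42)
second = sample_label(label_weights, seed=42)

# Same seed and same weights give the same pick on every call.
assert first == second
```
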
January 16, 2026

Chaos vs Checkboxes
