Hamilton-Jacobi-Bellman Equation: Reinforcement Learning and Diffusion Models

AI’s secret sauce is 1840s math and devs are spiraling

TLDR: The post ties a classic physics-style equation to how AI learns and how image models work. The comments explode into a fight over pristine math vs. real computers, with control vets cheering and some devs fearing they'll be left behind. Why it matters: the future of AI work feels up for grabs.

A new post claims modern AI is running on nearly two-century-old math, connecting Bellman's decision rule to its continuous-time cousin, the Hamilton-Jacobi-Bellman equation, and even to diffusion models, the stuff behind image generators. Translation: the author says one elegant equation helps explain how machines choose the "best next move," whether in learning or generating. The comments? Absolute wildfire.

On one side, skeptics like measurablefunc are yelling, “Wait—continuous math on digital computers?” and launching the “infinite-precision vs. real-world bits” debate. It’s the Float32 Gang vs. Platonic Real Numbers showdown, complete with snark about math papers airbrushing away rounding errors. Meanwhile, control-theory veterans like Cloudly show up like proud alumni: “We’ve been doing this since undergrad,” flexing that old-school engineering has been the quiet backbone of the AI boom.

Then came the existential gut-punch: software engineer lain98 admits feeling "completely outclassed by mathematicians," wondering if software as a career lasts five more years. That sparked a support pile-on, memes about the "ice trade" (are coders the ice men pre-refrigerators?), and pep talks that tools still need builders. Verdict: the post is a math love letter, but the thread is a therapy session—equal parts panic, pride, and spicy philosophy about whether pristine equations survive contact with messy machines.

Key Points

  • Bellman’s discrete-time dynamic programming leads to the Bellman equation for MDPs and value functions.
  • Passing to continuous time yields the Hamilton–Jacobi–Bellman (HJB) PDE as the optimality condition.
  • The article proves the deterministic HJB via the dynamic programming principle and first-order expansions.
  • The Hamiltonian is defined as a supremum over actions of reward plus gradient–dynamics inner product.
  • The framework connects continuous-time RL with stochastic control, diffusion models, and optimal transport; neural policy iteration is suggested for solution.
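For the curious (or the spiraling), the first key point is concrete enough to run. Below is a minimal sketch of value iteration on a made-up two-state MDP: the Bellman backup inside the loop is the discrete-time ancestor of the HJB equation the article builds up to, and its `max` over actions mirrors the supremum in the Hamiltonian. The states, actions, rewards, and discount factor here are toy values invented for illustration, not anything from the article.

```python
# Value iteration for a toy 2-state MDP, illustrating Bellman's
# discrete-time optimality equation:
#     V(s) = max_a [ r(s, a) + gamma * sum_{s'} P(s'|s, a) * V(s') ]
# The HJB PDE is what you get when this recursion is taken to
# continuous time; the max over actions becomes the Hamiltonian's sup.

GAMMA = 0.9  # discount factor (hypothetical)

# States: 0 and 1. Actions: "stay" or "move" (all values made up).
rewards = {
    0: {"stay": 0.0, "move": 1.0},
    1: {"stay": 2.0, "move": 0.0},
}
# transitions[s][a] maps next state -> probability.
transitions = {
    0: {"stay": {0: 1.0}, "move": {1: 1.0}},
    1: {"stay": {1: 1.0}, "move": {0: 1.0}},
}

def value_iteration(tol=1e-8):
    V = {0: 0.0, 1: 0.0}
    while True:
        # Bellman backup: best action = reward + discounted expected value.
        V_new = {
            s: max(
                rewards[s][a]
                + GAMMA * sum(p * V[s2] for s2, p in transitions[s][a].items())
                for a in rewards[s]
            )
            for s in V
        }
        if max(abs(V_new[s] - V[s]) for s in V) < tol:
            return V_new
        V = V_new

V = value_iteration()
print(V)  # converged optimal values for the two states
```

And yes, this runs in Float32-Gang-approved finite precision: the fixed point is reached up to a tolerance, not exactly, which is more or less the whole comment-section fight in one `tol` parameter.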

Hottest takes

"It's not clear or obvious why continuous semantics should be applicable on a digital computer" — measurablefunc
"Ever since the control bug bit me in my EE undergrad years I am happy to see how useful the knowledge remains" — Cloudly
"I am unsure of the next course of action or if software will survive another 5 years" — lain98
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.