Show HN: Per-instance TSP Solver with No Pre-training (1.66% gap on d1291)

AI Salesman learns one route at a time — and the comments go wild

TLDR: A new open-source solver learns each map from scratch and reached a 1.66% gap to optimal on d1291 after about 5.6 hours on a single GPU. The crowd is split: some praise the no-pretraining twist, others ask how it stacks up against classic solvers and whether the compute cost makes sense.

An indie coder dropped a bold claim: a reinforcement learning “AI salesman” that learns a single map from scratch, no study guide, and lands just 1.66% shy of perfect on a classic test (d1291) in about 5.6 hours on one beefy GPU. Cue the chaos. Some cheered the “no pre-training” flex as fresh science, while others immediately asked the only question that matters: does it beat the old-school champs that already solve this fast and cheap? One helpful explainer jumped in to translate the alphabet soup — TSP is the famous route-planning puzzle; PPO is a training method for agents — calming the “wait, what is this?” crowd.

The hot debate: research novelty vs real-world value. Fans loved the “exception edges” idea — spots where the best route breaks away from obvious nearest-neighbor choices — calling it a clever way to guide the agent without a massive dataset. Skeptics fired back that 5.6 hours per map sounds like a lot for “almost best,” and demanded head-to-heads with the usual suspects. Jokes flew too: “AI salesman picking the scenic route,” “exception edges = boss fight,” and “overnight shipping is faster.”
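For readers who want to see what an "exception edge" might look like in practice, here is a minimal sketch. It assumes one possible reading of the idea: a tour edge counts as an exception when neither endpoint is among the other's k nearest neighbors. The function names, the `k` parameter, and the toy coordinates are all hypothetical and are not taken from the author's repo.

```python
import math

def nearest_neighbors(coords, k):
    """For each city index, return the set of its k nearest other cities."""
    result = []
    for i, (xi, yi) in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )
        result.append(set(others[:k]))
    return result

def exception_edges(coords, tour, k=1):
    """Return tour edges (u, v) where neither city is in the other's k nearest.

    Assumption: this is only one plausible definition of an "exception edge".
    """
    nn = nearest_neighbors(coords, k)
    edges = []
    for pos, u in enumerate(tour):
        v = tour[(pos + 1) % len(tour)]  # wrap around to close the tour
        if v not in nn[u] and u not in nn[v]:
            edges.append((u, v))
    return edges

# Toy instance (hypothetical): two tight pairs of cities far apart on a line.
coords = [(0, 0), (1, 0), (10, 0), (11, 0)]
tour = [0, 1, 2, 3]
print(exception_edges(coords, tour, k=1))  # [(1, 2), (3, 0)]
```

In this toy case the two long hops between the clusters are flagged, while the short hops inside each pair are not. Raising `k` loosens the test, so fewer edges count as exceptions.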

Still, the code is out there, with a Colab to poke at the hypothesis, and the thread vibe is classic: pioneers vs pragmatists, each convinced they’re right.

Key Points

  • A per-instance TSP solver using PPO learns from scratch without any pre-training.
  • Achieved a 1.66% gap on TSPLIB d1291 after ~5.6 hours of training on a single A100 GPU.
  • The method relies on an inductive bias that highlights the geometric and topological structure of 'exception edges.'
  • Agent receives guidance on promising edges and uses PPO to refine solutions via trial and error.
  • Code and a Colab notebook are open-sourced for verification and experimentation.
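
The "gap" in these points is the usual optimality gap: how much longer the found tour is than the best known one, as a percentage. A minimal sketch of that arithmetic follows; the function name and the tour lengths are hypothetical illustrations, not figures from the post.

```python
def optimality_gap(tour_length: float, optimal_length: float) -> float:
    """Return the percentage by which tour_length exceeds optimal_length."""
    if optimal_length <= 0:
        raise ValueError("optimal_length must be positive")
    return 100.0 * (tour_length - optimal_length) / optimal_length

# Hypothetical lengths chosen only to illustrate a 1.66% gap.
gap = optimality_gap(10166.0, 10000.0)
print(f"{gap:.2f}%")  # 1.66%
```

A gap of 0% means the tour matches the reference optimum; lower is better.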

Hottest takes

"TSP = Travelling Salesman Problem" — mkl
"It achieved a 1.66% gap on TSPLIB d1291" — OP
"RL reach this level without a dataset" — OP