Show HN: Spam classifier in Go using Naive Bayes

Go dev drops a spam detector, community yells: “License, please”

TLDR: A Go-based spam detector using simple word-counting math landed on Hacker News. The thread quickly pivoted to a license showdown, with a Paul Graham throwback and a Perl veteran chiming in—community loves the retro vibe but won’t touch it without a clear open-source license.

A developer just rolled out nspammer, a simple spam detector in Go that uses Naive Bayes (think “count the words, make a smart guess”) plus a tiny cushion called Laplace smoothing so new words don’t break it. It ships with tests against a real email dataset, a plug‑and‑play API, and a demo that screams “buy now” equals spam. But the post instantly turned into a nostalgia-and-drama cocktail.

First up, esafak drops Paul Graham’s essay “Better Bayesian Filtering” like a mic on stage, summoning the godfather of spam filtering and setting the tone: old school rules still apply. Then cipherself strides in with a flex: they built the same thing in Perl “12 (13?) years ago,” reminiscing about log math tricks and a wish-list for vectorization. Translation: this is solid, but we’ve seen this movie before. And just when the code talk warms up, leetrout slams the brakes with a community wake-up call: where’s the license? Without it, can anyone use this at all?

Cue jokes about “ham vs spam” and people teasing that every message containing “buy” is doomed. The vibe? A wholesome throwback project, showered in classic references, but the loudest chorus is that open source needs a license. Old-school wisdom, new-school Go, and a dash of drama: just how HN likes it.

Key Points

  • A Naive Bayes spam classifier named nspammer is implemented in Go.
  • It uses Laplace smoothing (default α=1.0) to handle unseen words and avoid zero probabilities.
  • The API provides `NewSpamClassifier` for training and `Classify` for determining spam vs. non-spam.
  • Classification uses log probabilities to prevent numerical underflow and compares class scores.
  • The project supports the Kaggle Spam Mails Dataset via `./init.sh` and includes tests with accuracy evaluations.
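
For readers who want to see the math from the points above in motion, here is a minimal sketch of a multinomial Naive Bayes classifier with Laplace smoothing and log-probability scoring. It is not nspammer's code: the names `Example`, `Model`, `Train`, and `IsSpam`, the whitespace tokenizer, and the toy training messages are all assumptions made for illustration, and the real `NewSpamClassifier` and `Classify` signatures may look quite different.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// Example is one labeled training message (hypothetical type, not nspammer's).
type Example struct {
	Text string
	Spam bool
}

// Model stores per-class counts; index 0 is ham, index 1 is spam.
type Model struct {
	alpha      float64
	wordCounts [2]map[string]int
	totalWords [2]int
	docs       [2]int
	vocab      map[string]struct{}
}

// tokenize lowercases and splits on whitespace (a deliberately naive choice).
func tokenize(s string) []string {
	return strings.Fields(strings.ToLower(s))
}

// Train counts words per class. alpha is the Laplace smoothing constant.
// Assumes both classes appear at least once in examples.
func Train(examples []Example, alpha float64) *Model {
	m := &Model{alpha: alpha, vocab: map[string]struct{}{}}
	m.wordCounts[0] = map[string]int{}
	m.wordCounts[1] = map[string]int{}
	for _, ex := range examples {
		c := 0
		if ex.Spam {
			c = 1
		}
		m.docs[c]++
		for _, w := range tokenize(ex.Text) {
			m.wordCounts[c][w]++
			m.totalWords[c]++
			m.vocab[w] = struct{}{}
		}
	}
	return m
}

// logScore returns log P(class) + sum over words of log P(word | class).
// Summing logs instead of multiplying probabilities avoids underflow.
func (m *Model) logScore(c int, words []string) float64 {
	score := math.Log(float64(m.docs[c]) / float64(m.docs[0]+m.docs[1]))
	denom := float64(m.totalWords[c]) + m.alpha*float64(len(m.vocab))
	for _, w := range words {
		// An unseen word has count 0, so smoothing gives it alpha/denom.
		score += math.Log((float64(m.wordCounts[c][w]) + m.alpha) / denom)
	}
	return score
}

// IsSpam compares the two class scores and picks the larger one.
func (m *Model) IsSpam(text string) bool {
	words := tokenize(text)
	return m.logScore(1, words) > m.logScore(0, words)
}

func main() {
	m := Train([]Example{
		{"buy now cheap pills", true},
		{"limited offer buy now", true},
		{"meeting notes for monday", false},
		{"lunch on monday", false},
	}, 1.0)
	fmt.Println(m.IsSpam("buy now"))        // true
	fmt.Println(m.IsSpam("monday meeting")) // false
}
```

With α=1.0, a word the model never saw in a class still gets a small nonzero probability, so a single unfamiliar token cannot drag a whole message's score to negative infinity.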

Hottest takes

"https://www.paulgraham.com/better.html" — esafak
"12 (13?) years ago I had also written a Naïve Bayes classifier in Perl" — cipherself
"Could you put a license on it so we know how it can be used?" — leetrout