OpenAI Privacy Filter

A tiny privacy scrubber you can run at home — “Open” returns, and devs are into it

TLDR: OpenAI released an Apache-licensed, open‑weight model that redacts personal info locally so apps don’t ship sensitive data to servers. Devs applauded the practical, lightweight tool, joked that “Open” is back, and immediately asked for a matching, equally tiny defense against prompt-injection tricks.

OpenAI just dropped an open-weight Privacy Filter that spots and hides personal info—think names, addresses, credit cards, even passwords—without your data ever leaving your device. It’s licensed Apache 2.0 and posted on Hugging Face and GitHub, and the comments wasted no time turning it into a vibes check for OpenAI’s “open” cred. One user cheered it as “a very straightforward and useful thing,” while another practically pointed at the download button: it’s open weights, you can run it on your own hardware. Cue the rallying cry: “Bringing back the Open to OpenAI..”

Beyond the applause, the nerds dove in. Fans highlighted that it handles long texts fast in a single pass, and it’s small but sharp—lightweight enough (50M “active” parameters) to run locally, yet smart enough to use context, not just dumb pattern matching. The hottest follow-up? A call for a similarly light tool to fight “prompt injection” tricks, with one commenter impressed by the size but already asking, “what’s next?” In classic dev fashion, the community is celebrating a practical tool that upgrades privacy by default—while egging OpenAI on to keep the “open” flowing and ship the defensive twin next.

Key Points

  • OpenAI released Privacy Filter, an open‑weight, locally runnable model for detecting and redacting PII in text.
  • The model handles inputs up to 128,000 tokens, labels every token in a single pass, and allows precision/recall tuning.
  • It achieves state‑of‑the‑art performance on the PII‑Masking‑300k benchmark when corrected for annotation issues.
  • Architecture: bidirectional token classification with span decoding, constrained Viterbi decoding, and BIOES tags.
  • The released model has 1.5B total parameters with 50M active, and can be fine‑tuned and integrated into AI data pipelines.
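To make the architecture bullet concrete, here is a minimal sketch of how BIOES span decoding turns per-token labels into redactions. The tag names (`B-NAME`, `S-PHONE`), whitespace tokenization, and `[TYPE]` placeholders are illustrative assumptions, not OpenAI’s actual scheme; in the real model, constrained Viterbi decoding additionally guarantees the tag sequence is valid (e.g. a `B-` is always closed by a matching `E-`), whereas this sketch simply decodes an already-valid sequence.

```python
# Illustrative BIOES decoding for PII redaction (assumed tag names, not
# OpenAI's actual label set). BIOES = Begin / Inside / Other / End / Single.

def bioes_to_spans(tags):
    """Convert a per-token BIOES tag sequence into (start, end, type) spans."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags):
        if tag == "O":                      # outside any entity
            start, etype = None, None
        elif tag.startswith("S-"):          # single-token entity
            spans.append((i, i, tag[2:]))
            start, etype = None, None
        elif tag.startswith("B-"):          # entity begins
            start, etype = i, tag[2:]
        elif tag.startswith("E-") and etype == tag[2:]:  # entity ends
            spans.append((start, i, etype))
            start, etype = None, None
        # "I-" tags continue the current span; nothing to do
    return spans

def redact(tokens, tags):
    """Replace each detected PII span with a [TYPE] placeholder."""
    spans = bioes_to_spans(tags)
    out, skip_until = [], -1
    for i, tok in enumerate(tokens):
        if i <= skip_until:                 # token is inside a redacted span
            continue
        match = next((s for s in spans if s[0] == i), None)
        if match:
            out.append(f"[{match[2]}]")
            skip_until = match[1]
        else:
            out.append(tok)
    return " ".join(out)

tokens = ["Contact", "Jane", "Doe", "at", "555-0199"]
tags   = ["O", "B-NAME", "E-NAME", "O", "S-PHONE"]
print(redact(tokens, tags))  # → Contact [NAME] at [PHONE]
```

The span-based output (rather than per-token replacement) is what makes the single-pass labeling cheap to apply: one scan produces character-aligned spans an app can mask, hash, or tokenize before anything leaves the device.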

Hottest takes

“Bringing back the Open to OpenAI..” — 7777777phil
“this is an open weights model. You can run it on your own hardware.” — klauserc
“50M effective parameters is impressively light” — Havoc
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.