A Digital Compute-in-Memory Architecture for NFA Evaluation

Lightning-fast threat chip drops; commenters: “Cool demo, now run SNORT at 10G”

TLDR: Researchers built a chip to speed up pattern matching for network security while using very little energy, promising faster scanning with lower costs. Commenters love the ambition but demand real-world SNORT tests at 10‑gigabit speeds, debate the Bloom filter’s limits, and side-eye the “Open Access” page nudging a Premium paywall.

A new research chip claims it can scan for bad stuff in network traffic crazy fast by doing the work inside memory and skipping ahead whenever a quick yes/no test (a "Bloom filter") says it's safe to skip. Translation: pattern-matching speed without guzzling power. Think fewer server racks and lower bills. Fans are hyped by numbers like 2.8 GB/s and whisper "femtojoules" like a magic spell. But the comments? Spicy.
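The skip trick is easier to see in software. Here's a toy Python sketch (not the paper's hardware design, and the signatures are made up) of a Bloom filter deciding which input positions deserve full pattern evaluation:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: never misses a real member (no false
    negatives), but may answer "maybe" for absent items (false positives)."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array packed into one integer

    def _positions(self, item):
        # Derive k bit positions from k salted hashes of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

# Pre-load the filter with the first byte of each (hypothetical) signature;
# the scanner then skips any input byte the filter definitely rules out.
bf = BloomFilter()
for signature in (b"EVIL", b"BAD!"):
    bf.add(signature[0])

payload = b"hello EVIL world"
candidate_starts = [i for i, byte in enumerate(payload) if bf.might_contain(byte)]
```

The hardware version described in the paper performs the filter lookup inside the memory array itself, but the contract is the same: "definitely not here" means the expensive NFA step can be skipped for that symbol.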

Skeptics say it’s the oldest nerd fight: lab charts vs. messy reality. They want end-to-end results inside the popular SNORT threat detector, not just a slick chip graph. One chorus keeps repeating: “Show it handling full rule sets at real 10‑gigabit speeds with real traffic—or it’s marketing.” Others poke at the Bloom filter: neat bouncer, but what happens when it lets too many party crashers through? Meanwhile, hardware diehards argue whether this belongs in a custom chip, an FPGA (lego-like hardware), or a GPU (graphics muscle), with everyone dunking on everyone.
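The "party crashers" complaint has a textbook number behind it: for a Bloom filter with m bits, k hash functions, and n inserted items, the false-positive probability is roughly (1 − e^(−kn/m))^k. A quick sketch with illustrative sizes (not figures from the paper):

```python
import math

def bloom_fp_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Classic approximation of a Bloom filter's false-positive rate."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

# Illustrative sizing only: 4096-bit filter, 4 hashes, 500 inserted prefixes.
p = bloom_fp_rate(4096, 4, 500)          # roughly a 2% false-positive rate

# Doubling the inserted items pushes the rate up sharply.
p_crowded = bloom_fp_rate(4096, 4, 1000)
```

One nuance worth keeping straight in the debate: a Bloom filter false positive only forces the full NFA to evaluate a byte that could have been skipped. A crowded filter degrades throughput, never detection accuracy.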

The meta-drama? The “Open Access” tease paired with a “Premium” paywall for the plain-language AI summary had commenters cackling. One user dropped a Substack summary, others memed “compute-in-memory” as “memory with gym membership,” and someone renamed NFA (a pattern engine) to “No Fun Anymore.” Classic internet science fair: big claims, bigger side-eye.

Key Points

  • Pattern evaluation in SNORT accounts for 53.6% of runtime, limiting throughput to 2.5 Gb/s with full community rules, below the 10 Gb/s target.
  • The paper introduces a digital compute-in-memory accelerator for arbitrary NFA evaluation, fabricated in 22 nm FD-SOI.
  • The accelerator achieves 2822 MB/s peak throughput at maximum frequency.
  • At the minimum energy point, the design reaches 406 MB/s with an energy cost of 1.27 fJ per byte per transition.
  • Efficiency is enabled by digital CIM macros and a CIM Bloom filter that gates activity to allow opportunistic symbol skipping.
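For a sense of scale, the two headline figures combine into a back-of-envelope power estimate. The transitions-per-byte count below is a pure assumption (the paper reports energy per byte per transition, not total NFA activity), so treat the result as an order-of-magnitude sketch only:

```python
# Headline figures from the Key Points above.
throughput_B_per_s = 406e6                 # 406 MB/s at the minimum-energy point
energy_J_per_B_per_transition = 1.27e-15   # 1.27 fJ per byte per transition

# ASSUMPTION: average number of simultaneously active NFA transitions per
# input byte. This value is illustrative, not a figure from the paper.
assumed_transitions_per_byte = 100

power_W = (throughput_B_per_s
           * assumed_transitions_per_byte
           * energy_J_per_B_per_transition)
# Under these assumptions, compute power lands around 50 microwatts.
```

Even with the activity assumption inflated tenfold, the estimate stays under a milliwatt, which is why the femtojoule figure excites the fans.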

Hottest takes

"Here is my summary: https://danglingpointers.substack.com/p/a-127-fjbtransition-digital-compute" — blakepelton
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.