May 20, 2026

Transcript Wars: CPU Strikes Back

Show HN: CPU-only transcription for YouTube, TikTok, X, Instagram videos

A tiny video-to-text tool drops, and the comments instantly split into hype, shrugs, and startup dreams

TLDR: yapsnap is a simple tool that turns video links or audio files into text on your own computer, with no paid service required. Commenters immediately split between "this is a handy privacy-friendly shortcut," "sites already do this," and "wait, is this secretly a startup?"

A new little tool called yapsnap just strutted onto Show HN with a big promise: paste in a YouTube, TikTok, X, or Instagram link, and your own computer turns the audio into plain text. After a one-time model download, the transcription itself runs offline, so the audio never goes to a cloud service. No GPU, no paid account, no cloud middleman: just your own machine doing the work. For privacy fans and transcript hoarders, that alone was enough to get people leaning forward.

But the real show was in the comments, where the crowd immediately split into camps. One side basically said, "Cute, but this is mostly glue" — noting that the project is a slim wrapper around existing tools. Not exactly an insult, but definitely a raised eyebrow. Another camp shrugged even harder: why bother when many sites already offer captions? That sparked the obvious comeback: built-in captions only help when they exist, and this thing works on videos that don’t have them.

Then came the optimists, and they came in hot. One commenter had an AI assistant test it on three videos and reported blazing-fast results, instantly spinning that into a bigger fantasy: searchable video knowledge, automatic summaries, maybe even a business. Yes, the startup sirens were wailing. Meanwhile, the practical crowd jumped in with the classic feature requests: other languages? speaker labels? In other words, the community reaction was peak internet: one person says it's a neat hack, another says it’s unnecessary, and a third is already building a company in their head.

Key Points

  • yapsnap is a CPU-only CLI tool that transcribes supported video URLs and local media files into plaintext without requiring GPU hardware or cloud APIs.
  • The tool uses yt-dlp to fetch remote media, ffmpeg to decode audio to 16 kHz mono PCM, and a streaming Kroko English Zipformer/Zipformer2 INT8 ONNX model for recognition.
  • The roughly 80 MB model is downloaded and cached locally on first run; after that, transcription itself is described as running offline (fetching a remote video still needs a network connection).
  • Supported sources include YouTube, YouTube Shorts, X/Twitter, TikTok, Instagram Reels, direct media URLs, and local files that ffmpeg can decode.
  • Optional features include sentence-level timestamps, changing the audio speed before transcription while preserving pitch, custom output paths, keeping the downloaded audio, and overriding the model directory.
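The decode stage in the Key Points is the easiest part of the pipeline to picture in code. Below is a minimal Python sketch. It assumes a stock ffmpeg on PATH and assumes yapsnap works roughly like this. It builds an ffmpeg command that outputs 16 kHz mono signed 16-bit PCM, optionally changes speed with ffmpeg's pitch-preserving `atempo` filter, and slices the raw bytes into float chunks that a streaming model could consume. The function names (`atempo_chain`, `ffmpeg_decode_cmd`, `pcm_chunks_to_floats`), the chunk size, and the use of `atempo` are all hypothetical; the post doesn't say which flags or filters yapsnap actually passes.

```python
import array
import sys
from typing import Iterator, List

# Hypothetical sketch: illustrative helpers, not yapsnap's actual code.
SAMPLE_RATE = 16_000  # 16 kHz mono, per the yapsnap description
BYTES_PER_SAMPLE = 2  # signed 16-bit little-endian PCM ("s16le")


def atempo_chain(speed: float) -> str:
    """Build an ffmpeg atempo filter string that changes tempo, not pitch.

    Assumption: factors outside 0.5..2.0 are split into a chain so each
    atempo instance stays in the range older ffmpeg builds accept.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    parts: List[str] = []
    remaining = speed
    while remaining > 2.0:
        parts.append("atempo=2")
        remaining /= 2.0
    while remaining < 0.5:
        parts.append("atempo=0.5")
        remaining /= 0.5
    parts.append(f"atempo={remaining:g}")
    return ",".join(parts)


def ffmpeg_decode_cmd(src: str, speed: float = 1.0) -> List[str]:
    """Return argv that decodes `src` to raw 16 kHz mono s16le PCM on stdout."""
    cmd = ["ffmpeg", "-nostdin", "-loglevel", "error", "-i", src]
    if speed != 1.0:
        cmd += ["-af", atempo_chain(speed)]  # optional speed change before recognition
    cmd += [
        "-ac", "1",                             # downmix to mono
        "-ar", str(SAMPLE_RATE),                # resample to 16 kHz
        "-f", "s16le", "-acodec", "pcm_s16le",  # raw signed 16-bit little-endian
        "pipe:1",                               # write to stdout
    ]
    return cmd


def pcm_chunks_to_floats(data: bytes, chunk_samples: int = 1600) -> Iterator[List[float]]:
    """Yield fixed-size chunks of s16le PCM as floats in [-1.0, 1.0).

    1600 samples is 0.1 s at 16 kHz; the chunk size is an assumption, chosen
    only to show how audio could be fed to a streaming recognizer piece by piece.
    """
    if chunk_samples <= 0:
        raise ValueError("chunk_samples must be positive")
    usable = len(data) - len(data) % BYTES_PER_SAMPLE  # drop a dangling odd byte
    step = chunk_samples * BYTES_PER_SAMPLE
    for start in range(0, usable, step):
        pcm = array.array("h")  # signed short, 2 bytes on common platforms
        pcm.frombytes(data[start:min(start + step, usable)])
        if sys.byteorder == "big":
            pcm.byteswap()  # s16le is little-endian; fix up on big-endian hosts
        yield [s / 32768.0 for s in pcm]
```

In practice the bytes would come from something like `subprocess.run(ffmpeg_decode_cmd("talk.mp4"), capture_output=True, check=True).stdout`, with each chunk handed to the recognizer. That recognizer call, and the yt-dlp download step before it, are left out on purpose, because the post doesn't show how yapsnap drives yt-dlp or the Kroko model.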

Hottest takes

"glues yt-dlp and Kroko together. Neat." — spudlyo
"Most of these platforms already have transcriptions built in." — charcircuit
"There is a viable product in here somewhere" — niraj-agarwal
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.