Show HN: Gemini can now natively embed video, so I built sub-second video search

Type 'red truck', get the clip — HN cheers, skeptics squint

TLDR: A developer’s tool uses Google’s video-savvy AI to let you type a phrase and instantly pull the right dashcam clip, all open on GitHub. Commenters are split between excitement over creative uses (like home monitoring) and worries about reliability, usability, and cost, asking how it holds up when the AI isn’t confident.

A scrappy “Show HN” just lit up the thread: a dev built SentrySearch, a tool that lets you type “red truck running a stop sign” and get the exact dashcam clip back in seconds. It leans on Google’s Gemini embedding model, which takes in video directly, to “look” at footage and match it to your words. The repo is live at github.com/ssrajadh/sentrysearch.

The crowd reaction? Split and spicy. One camp is hyped on the possibilities (think finding that one moment in hours of footage), with users calling it “quite interesting” and dreaming up everything from traffic incidents to smarter cameras. Another camp wants receipts: reliability questions flew in fast, with folks pressing about what happens when the AI isn’t sure and whether there’s a confidence threshold. There’s also a mini mix-up: one commenter asked why not skip the text conversion entirely, which is a fun twist, since that is already the whole trick here. The tool compares your words to the video itself, with no transcripts in between.
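For readers wondering what a "confidence threshold" could look like in a setup like this, here is a minimal sketch, not SentrySearch's actual code. It assumes the query and each video chunk have already been turned into vectors in the same space, and it uses toy 3-number vectors in place of real embeddings. The names `cosine`, `search`, `min_score`, and the clip labels are all made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, chunks, min_score=0.5, top_k=3):
    """Rank (label, vector) chunks against the query.

    Chunks scoring below min_score are dropped, so a weak match
    returns nothing instead of a misleading clip.
    min_score=0.5 is an arbitrary illustrative cutoff.
    """
    scored = [(cosine(query_vec, vec), label) for label, vec in chunks]
    hits = [hit for hit in scored if hit[0] >= min_score]
    hits.sort(reverse=True)  # highest similarity first
    return hits[:top_k]

# Toy vectors standing in for real video-chunk embeddings (hypothetical data).
chunks = [
    ("clip_a 00:00-00:30", [0.9, 0.1, 0.0]),
    ("clip_a 00:25-00:55", [0.1, 0.9, 0.1]),
    ("clip_b 00:00-00:30", [0.0, 0.1, 0.9]),
]

query = [1.0, 0.2, 0.0]  # pretend this is the embedded text query
results = search(query, chunks)
for score, label in results:
    print(f"{label}: {score:.3f}")
```

With these toy numbers only the first chunk clears the cutoff. A query that resembles nothing in the index comes back as an empty list, which is one simple answer to the "what if the AI isn't sure" question.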

Practical heads chimed in with non-dashcam ideas (hello, home monitoring), while a few asked for real-world use cases and cost breakdowns (it’s about $2.50 to index an hour of video). The vibe swung between “future of search” and “but will it actually work when it counts?” — with a side of playful CSI “enhance!” energy from the peanut gallery.

Key Points

  • SentrySearch performs semantic search over dashcam videos by embedding overlapping 30s chunks using Google’s Gemini Embedding 2 and storing them in ChromaDB.
  • Text queries are embedded into the same 768-dimensional space and matched to video embeddings, enabling direct video–text comparison without transcription.
  • The tool provides CLI commands for init, indexing .mp4 directories, and searching; top matches are auto-trimmed into clips using ffmpeg.
  • Indexing cost is estimated at ~$2.50 per hour of footage at default settings; search costs are minimal as they only embed text.
  • Limitations include cost inefficiency from embedding all chunks, sensitivity to chunk boundaries, and reliance on a preview API whose behavior and pricing may change.
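The chunking, cost, and trimming points above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation. The 30s chunk length and the ~$2.50/hour figure come from the summary; the 5s overlap is a guess, since the overlap amount is not stated. The function names are hypothetical, and the ffmpeg command is just one common way to cut a clip.

```python
def chunk_windows(duration_s, chunk_s=30.0, overlap_s=5.0):
    """Return (start, end) windows covering a video with overlapping chunks.

    overlap_s=5.0 is an assumed value, not taken from the tool.
    """
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap must be non-negative and shorter than the chunk")
    duration_s = float(duration_s)
    step = chunk_s - overlap_s
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break  # last window reached the end of the footage
        start += step
    return windows

def estimate_index_cost(duration_s, usd_per_hour=2.50):
    """Rough indexing cost, scaling the quoted ~$2.50/hour figure linearly."""
    return duration_s / 3600.0 * usd_per_hour

def trim_command(src, start_s, end_s, dst):
    """Build (but do not run) an ffmpeg command that cuts one window.

    '-c copy' avoids re-encoding, so the cut snaps to nearby keyframes.
    """
    return ["ffmpeg", "-y", "-ss", str(start_s), "-i", src,
            "-t", str(end_s - start_s), "-c", "copy", dst]

one_hour = chunk_windows(3600)
print(len(one_hour))               # 144 chunks per hour under these assumptions
print(estimate_index_cost(7200))   # 5.0 (two hours of footage)
print(trim_command("drive.mp4", 25.0, 55.0, "match.mp4"))
```

Because every window gets embedded whether or not anything happens in it, the chunk count is what drives the indexing bill. A moment that straddles two windows is also why chunk boundaries matter.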

Hottest takes

"open the door to quite many potential applications!" — ygouzerh
"cases where Gemini's response confidence is low? Do you have a fallback or threshold?" — dev_tools_lab
"why not skip the text conversion? is it usable at all?" — klntsky