Show HN: I taught GPT-OSS-120B to see using Google Lens and OpenCV

Dev 'teaches' a blind AI to see with Google Lens—clever hack or just cheating?

TLDR: A dev glued Google Search and Google Lens onto a local AI so it can “see” and name objects, no paid APIs required. Commenters are split between calling it a clever tool‑chain and accusing it of outsourcing the brains to Google, raising TOS, CAPTCHA, and “just use Llama/Gemini” debates.

In true Show HN theater, one dev claims he made a text-only AI “see” by bolting on Google Search and Google Lens: no paid keys, just a scrappy browser bot. The star demo: GPT-OSS-120B correctly naming an NVIDIA DGX Spark and a SanDisk USB drive from a desk pic. How? The tool slices the image into pieces using OpenCV (a popular image-processing library) and sends each crop to Google Lens for identification; the model only ever reads the text that comes back. GitHub and PyPI are live, boasting 17 tools from Maps to News to Flights.

Then the comments cannon fired. Team Magic cheered the hustle and local-first approach: keep your AI on your own machine, give it real-world superpowers. Team “It’s Cheating” cried foul: “Wasn’t it Google Lens doing the seeing?” One critic went full analogy-mode: it’s like bragging your 5‑year‑old can do calculus—because you typed homework into Wolfram Alpha. Legal eagles warned about TOS (terms of service) landmines and the fragility of scraping, while another dev groaned that CAPTCHAs (the “are you human?” tests) will smack this down. Others chimed in: why not just use Llama or call Gemini and skip the theatrics?

Fans snapped back: integrating tools is the real skill, and humans outsource tasks all the time. Detractors rolled their eyes: cool demo, but the “eyes” clearly belong to Google. The vibe? Equal parts applause, side‑eye, and meme-fueled “Booyah!”

Key Points

  • An MCP server was released to give local LLMs Google search and pseudo-vision capabilities without API keys.
  • The google_lens_detect feature uses OpenCV to detect and crop objects, then sends them to Google Lens for identification.
  • A demo showed GPT-OSS-120B (text-only) identifying an NVIDIA DGX Spark and a SanDisk USB drive from a desk photo.
  • The tool suite integrates 17 Google services, including Search, News, Shopping, Scholar, Maps, Finance, Weather, Flights, Hotels, Translate, Images, and Trends.
  • Setup involves installing the noapi-google-search-mcp package and running Playwright to install Chromium; links to GitHub and PyPI are provided.
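For the curious, the detect-and-crop step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the summary does not say which OpenCV technique google_lens_detect uses, so edge-and-contour detection is an assumption here, and the function names, padding, and size threshold are all hypothetical. The upload to Google Lens is left out entirely.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def keep_large_boxes(boxes: List[Box], img_w: int, img_h: int,
                     min_frac: float = 0.01) -> List[Box]:
    """Drop boxes covering less than min_frac of the image (likely noise)."""
    min_area = min_frac * img_w * img_h
    return [b for b in boxes if b[2] * b[3] >= min_area]


def pad_and_clamp(box: Box, img_w: int, img_h: int, pad: int = 10) -> Box:
    """Grow a box by pad pixels on each side without leaving the image."""
    x, y, w, h = box
    x0, y0 = max(0, x - pad), max(0, y - pad)
    x1, y1 = min(img_w, x + w + pad), min(img_h, y + h + pad)
    return (x0, y0, x1 - x0, y1 - y0)


def detect_and_crop(image_path: str, out_prefix: str = "crop") -> List[str]:
    """Assumed approach: find contours, box them, save one crop per box."""
    import cv2  # imported lazily so the helpers above work without OpenCV

    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    img_h, img_w = img.shape[:2]

    # Edge map: grayscale, blur to suppress noise, Canny, then dilate to
    # close small gaps so each object forms one outer contour.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.dilate(cv2.Canny(blurred, 50, 150), None, iterations=2)

    # [-2] selects the contour list on both OpenCV 3.x and 4.x.
    contours = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]
    boxes = [cv2.boundingRect(c) for c in contours]

    paths = []
    for i, box in enumerate(keep_large_boxes(boxes, img_w, img_h)):
        x, y, w, h = pad_and_clamp(box, img_w, img_h)
        path = f"{out_prefix}_{i}.png"
        cv2.imwrite(path, img[y:y + h, x:x + w])  # each crop would go to Lens
        paths.append(path)
    return paths
```

Calling detect_and_crop("desk.jpg") (a hypothetical file name) would write one PNG per detected region and return their paths. Note that nothing in this step recognizes anything: it only proposes regions, which is why commenters argue the actual identification is Google Lens's work.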

Hottest takes

"Looks like a TOS violation to me" — N_Lens
"But wasn't it Google Lens that actually identified them?" — magic_hamster
"I taught my 5 year old to calculate integrals, by typing them into Wolfram Alpha" — l1am0