Can you save on LLM tokens using images instead of text?

Turning words into pictures to cut AI costs? Commenters say maybe, meh, and nope

TLDR: Turning text into images can shrink prompt tokens (about 40% on one model) but it runs slower and often inflates the pricier output tokens, erasing savings. Comments joked about “a picture is worth a thousand words,” asked why completions spike, and warned images bog LLMs down.

An experiment tried a cheeky hack: turn long text into images, feed those pics to an AI, and pay less. In plain English, AI billing is measured in “tokens” (tiny chunks of text), and the test found image prompts cut the input count by about 40% on one model. Cue the thread lighting up with jokes, eyerolls, and a few “aha!” moments. One commenter dusted off the classic “Is a picture worth a thousand words?” and linked the old saying like it’s brand-new science. Others weren’t buying it, noting the trade-offs: images took longer, and the AI often spat out more output text, which is pricier.

The drama split the crowd. The “pics are cheaper” camp cheered the token savings and a win for one chat model (gpt-5-chat, the only one where completions didn’t balloon). The skeptics argued it’s a mirage: you spend time wrangling images (perfect size, split into two, “high detail”), then watch the meter spin faster on the completion side. “Why does the output get longer if the result looks the same?” became the mystery of the day. And the performance gripes rolled in: images made the AI crawl. The meme-y refrain of the thread: screenshots save pennies, burn minutes. Verdict vibes: clever trick, but probably not your wallet’s hero.
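The skeptics’ point is really just arithmetic: output tokens cost more per token than input tokens, so a modest bump in completion length can wipe out a big prompt-side discount. A back-of-envelope sketch (all prices and token counts below are hypothetical placeholders, not real OpenAI rates or figures from the experiment):

```python
# Hypothetical per-1M-token prices; real rates vary by model and over time.
INPUT_PRICE = 1.25    # $ per 1M prompt tokens (assumed)
OUTPUT_PRICE = 10.00  # $ per 1M completion tokens (assumed, ~8x pricier)

def cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Total request cost in dollars under the assumed rates."""
    return (prompt_tokens * INPUT_PRICE + completion_tokens * OUTPUT_PRICE) / 1_000_000

# Text-only baseline: 10,000 prompt tokens, 500 completion tokens.
text_cost = cost(10_000, 500)

# Image version: prompt tokens drop ~40%, but the completion grows.
image_cost = cost(6_000, 1_200)

print(f"text:  ${text_cost:.5f}")   # prompt side is cheap either way
print(f"image: ${image_cost:.5f}")  # the longer completion dominates
```

With output tokens roughly 8x pricier in this sketch, the image version ends up costing more despite a 40% smaller prompt, which is exactly the “mirage” the skeptics describe.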

Key Points

  • The experiment compared text-only prompts to image-based prompts for the same task using the OpenAI API.
  • Images were prepared at 768x768 resolution with “detail: high” to maintain text readability and avoid resizing issues.
  • Image-based prompts produced similar outputs but took nearly twice as long to process.
  • Prompt tokens decreased significantly with images for some models, notably over 40% for gpt-5.
  • Completion tokens increased for image inputs on all tested models except gpt-5-chat, often negating cost savings due to higher completion token prices.
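For readers unfamiliar with how an image prompt is wired up, here is a minimal sketch of the message shape the Chat Completions API expects for image input, using the “detail: high” setting the experiment mentions. The prompt text, placeholder bytes, and model name in the comment are illustrative assumptions, not the experiment’s actual code:

```python
import base64

def build_image_message(image_bytes: bytes, question: str) -> dict:
    """Build a user chat message that embeds a pre-rendered text image
    (e.g. the 768x768 PNGs from the experiment) alongside a text prompt."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {
                    # A data: URL embeds the image directly in the request.
                    "url": f"data:image/png;base64,{b64}",
                    # "high" keeps small rendered text legible to the model.
                    "detail": "high",
                },
            },
        ],
    }

# Usage (placeholder bytes; a real call would read the rendered PNG file):
msg = build_image_message(b"\x89PNG...", "Summarize the text in this image.")
# The message would then be passed to the SDK, e.g.:
#   client.chat.completions.create(model="gpt-5", messages=[msg])
```

The billing asymmetry follows from this structure: the image replaces prompt-side text tokens, but it does nothing to constrain how many completion tokens come back.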

Hottest takes

“Does this mean we’ll finally get empirical proof for the aphorism ‘a picture is worth a thousand words’?” — bikeshaving
“Why are completion tokens more with image prompts yet the text output was about the same?” — floodfx
“In my experience, LLMs tend to take noticeably longer to process images than text.” — ashed96
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.