November 4, 2025
When AI sees eyes in your code
Visual Features Across Modalities: SVG and ASCII Art Cross-Modal Understanding
AI spots eyes in ASCII and SVG — the internet can’t decide if it’s genius or pareidolia
TL;DR: Researchers showed AI can spot “eyes” and other parts in ASCII art and SVG code, and can even flip frowns to smiles. Commenters split between “this is understanding” and “just pareidolia,” with one arguing that top models should actually draw what they claim to grasp.
A new study says AI can “see” eyes, mouths, and even dogs across text-only pictures, from old-school ASCII art (images made with keyboard characters) to SVG code (the instructions that draw shapes). It can even flip an ASCII frown to a smile by nudging the right internal signals. Cue the comments section turning into a meme-fueled battleground. One crowd is shouting “this is real understanding!” while skeptics clap back with “it’s just fancy pattern-matching and face pareidolia” (that human habit of seeing faces in clouds).
Developer-cred flexes came fast. User robot-wrangler set the bar: if an AI understands, it should draw and edit diagrams in Mermaid, SVG, or CSS on command, no excuses. Others joked the model now has better art-class notes than they do: “Next up, AI critiques my stick figures.” The SVG vs. ASCII mini-war flared, with code purists declaring “real devs sketch in CSS” while artists spammed the thread with increasingly cursed text faces: ( ͡👀 ͡). A spicy side debate raged over context: fans loved that the “eye” only triggers when the rest looks like a face; critics called that a magic trick, not mind-reading. The vibe? Half stunned, half skeptical, 100% entertained.
Key Points
- LLMs exhibit cross-modal features that detect visual-semantic concepts (e.g., eyes, mouths, animals) across ASCII art, SVG code, and prose.
- These features were identified via sparse autoencoders trained on a middle layer of models ranging from Haiku 3.5 to Sonnet 4.5 (see the SAE sketch after this list).
- Feature activations depend on context (e.g., an established ASCII structure, or prior SVG elements indicating a face) and are sensitive to cues like line lengths, colors, and SVG dimensions.
- Steering subsets of the identified features during generation can semantically edit text-based art, e.g., turning ASCII frowns into smiles or adding wrinkles to SVG faces (see the steering sketch below).
- Features are robust to superficial attributes (e.g., color, radius) but sensitive to element ordering; insufficient prior context reduces activation (e.g., moving an eye circle to the top of an SVG file).
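For the curious, here is a minimal sketch of the sparse-autoencoder idea the paper relies on, in PyTorch: decompose a layer’s activations into many sparse features via a reconstruction objective plus a sparsity penalty. The dimensions, names, and coefficient below are illustrative assumptions, not the researchers’ actual code.

```python
# Minimal sparse-autoencoder sketch (illustrative, not the paper's code).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Decomposes middle-layer activations into sparse, interpretable features."""
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU keeps feature activations non-negative; most end up near zero.
        f = torch.relu(self.encoder(x))
        x_hat = self.decoder(f)
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparsity.
    return (x - x_hat).pow(2).mean() + l1_coeff * f.abs().mean()

# Toy usage: a batch of 8 stand-in activation vectors from a hypothetical middle layer.
d_model, d_features = 512, 4096
sae = SparseAutoencoder(d_model, d_features)
acts = torch.randn(8, d_model)
x_hat, features = sae(acts)
loss = sae_loss(acts, x_hat, features)
loss.backward()
```

Once trained, each feature’s decoder column is a direction in activation space; a feature that fires on ASCII eyes, SVG eye circles, and the word “eye” is exactly the cross-modal behavior the study reports.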
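And a sketch of how steering might work once a feature is found: add a scaled copy of that feature’s decoder direction into the hidden state during generation. The layer index, feature index, and strength here are hypothetical, and the hook assumes a PyTorch transformer block; this is one plausible mechanism, not the paper’s exact procedure.

```python
# Feature-steering sketch (assumes the trained `sae` from the previous snippet).
import torch

FEATURE_IDX = 1234   # hypothetical index of a "smile" feature
STRENGTH = 5.0       # how hard to push the feature during generation

def steering_hook(module, inputs, output):
    # The decoder column for one feature is its direction in activation space;
    # adding a scaled copy nudges generation toward that concept.
    direction = sae.decoder.weight[:, FEATURE_IDX].detach()  # shape: (d_model,)
    if isinstance(output, tuple):
        return (output[0] + STRENGTH * direction,) + output[1:]
    return output + STRENGTH * direction

# Usage sketch: register on a middle block of a hypothetical model, then generate.
# handle = model.transformer.h[16].register_forward_hook(steering_hook)
# out = model.generate(**inputs)   # ASCII frown comes back as a smile, per the paper
# handle.remove()
```

Negative values of STRENGTH would suppress the concept instead, which is how this kind of intervention doubles as a causal test of what the feature actually encodes.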