Various locale mismatch scenarios in Windows clipboard text format synthesis

Windows clipboard sparks ‘copy wars’ as devs rage at ancient text rules while Raymond Chen teases a mystery

TLDR: Raymond Chen explains why copy‑paste text can break when old 8‑bit formats clash with modern Unicode, especially across different app settings. Devs clap back: ditch legacy code pages and go all‑Unicode, while marveling that Chen teased a rare mystery—because garbled text still ruins real users’ day.

Microsoft’s clipboard just turned into a soap opera, and the audience is yelling at the screen. In his latest post, Windows guru Raymond Chen explains how copying text can go sideways when old-school “ANSI” (think 1990s 8‑bit text) collides with modern Unicode. Example: copy Hebrew on a U.S.-English system and it can turn into question marks if the pasting app asks for the old 8‑bit format, because the U.S. code page has no Hebrew letters. The blog dives into locale IDs (basically, “what language is this?” tags) and how Windows uses them to pick a code page, with a twist: the newer activeCodePage setting lets different apps disagree on what “ANSI” even means, which is a recipe for chaos.
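The Hebrew example can be approximated outside Windows. This is a rough sketch only: Python's built-in `cp1252` and `cp1255` codecs stand in for the conversion Windows performs when it synthesizes 8-bit text, and `to_ansi` is a made-up helper name, not a Windows API.

```python
def to_ansi(text: str, code_page: str) -> bytes:
    """Encode Unicode text to an 8-bit code page, substituting '?'
    for every character the code page cannot represent."""
    return text.encode(code_page, errors="replace")


# Four Hebrew letters (shin, lamed, vav, final mem), written as escapes
# so the source stays readable in left-to-right editors.
hebrew = "\u05e9\u05dc\u05d5\u05dd"

# U.S.-English locale -> code page 1252, which has no Hebrew letters.
us_english = to_ansi(hebrew, "cp1252")

# Hebrew locale -> code page 1255, which does.
hebrew_cp = to_ansi(hebrew, "cp1255")

print(us_english)                             # b'????'
print(hebrew_cp.decode("cp1255") == hebrew)   # True
```

The same four characters survive or vanish depending only on which code page the locale implies.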

The crowd reaction? Spicy. One commenter slammed the whole setup as “OEM LCID 1252 ANSI nonsense,” basically begging Microsoft to make Windows go full Unicode already. Another gasped that Chen is teasing a mystery he didn’t immediately solve: yes, the man who knows everything about Windows admitted there’s a quirk to unravel next time. Cue popcorn. The memes wrote themselves: “Question Marks Apocalypse,” “CF_LOCALE is the Da Vinci Code,” and “It’s 2025, why are we still debugging 1995?”

Bottom line: the article is a careful explainer, but the community came to riot—half nostalgic, half furious, all entertained as Chen hints at a cliffhanger worthy of binge‑watching the next blog post.

Key Points

  • CF_LOCALE, derived from the active keyboard layout, guides conversions between Unicode and 8-bit ANSI/OEM clipboard formats.
  • If text is placed and retrieved as CF_UNICODETEXT, no conversion occurs and the locale ID (LCID) in CF_LOCALE has no effect on data integrity.
  • When Unicode text is retrieved as CF_TEXT, Windows converts it using the ANSI code page implied by CF_LOCALE (e.g., US-English → CP1252).
  • Hebrew characters cannot be represented in CP1252; retrieving Unicode Hebrew text as ANSI CF_TEXT under a US-English CF_LOCALE loses data (e.g., the letters become question marks).
  • ANSI-to-ANSI clipboard operations perform no conversion; historically a single desktop-wide ANSI code page avoided mismatches, but the activeCodePage manifest setting now lets individual apps use different ANSI code pages.
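The per-app mismatch in the last point can be modeled the same way. This sketch assumes, as the key point states, that CF_TEXT bytes pass from writer to reader untouched; Python codecs again stand in for each app's ANSI code page, and `paste_ansi` is a hypothetical helper, not a Windows API.

```python
def paste_ansi(raw: bytes, reader_code_page: str) -> str:
    """Model an app that receives CF_TEXT bytes unchanged and interprets
    them in its own ANSI code page."""
    return raw.decode(reader_code_page, errors="replace")


text = "caf\u00e9"  # "cafe" with an acute accent on the e

# App A treats UTF-8 as its ANSI code page; app B uses CP1252.
# No conversion happens in between, so B misreads A's bytes.
utf8_to_1252 = paste_ansi(text.encode("utf-8"), "cp1252")

# The reverse direction: CP1252 bytes read by an app expecting UTF-8.
cp1252_to_utf8 = paste_ansi(text.encode("cp1252"), "utf-8")

print(utf8_to_1252)           # cafÃ©
print(ascii(cp1252_to_utf8))  # 'caf\ufffd'
```

Neither app did anything wrong by its own rules; the bytes simply mean different things on each side.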

Hottest takes

remove all this OEM LCID 1252 ANSI nonsense from computing (well, just Windows) — akersten
Whoa there exists something Raymond Chen didn’t know about Windows core APIs? — jey