April 2, 2026
Vowels vs vibes: Pushkin goes statistical
An Example of Statistical Investigation of the Text Eugene Onegin – Markov, 1913 [pdf]
Counting Pushkin’s vowels: math meets muse, fans split between romance and nerd joy
TLDR: A century-old study counted vowels and consonants in Pushkin’s “Eugene Onegin,” an early peek at the kind of letter-to-letter statistics that power predictive text. Commenters swoon over timeless poetry, share a Veritasium video, and name-drop Hofstadter while debating whether math reveals the magic or risks flattening it.
A 1913 curveball just crashed the timeline: Russian mathematician Andrey Markov literally counted vowels and consonants in Pushkin’s “Eugene Onegin,” tracking how often a vowel follows another vowel versus a consonant. That bookkeeping is basically the grandparent of your phone’s predictive text. He even laid the letters out in neat 10×10 grids of 100 to tally vowels column by column. High art meets cold stats, and the comment section went full duet.
One camp swoons. “Pushkin is eternal,” one fan declares, insisting that poetry survives math and maybe even shines because of it. Others rush in with modern receipts, dropping a slick Veritasium explainer to prove this isn’t just dusty history but a building block of today’s AI. Then literature buffs add spice: Douglas Hofstadter (yes, that Hofstadter) once translated Onegin himself as an experiment to test his theories. Cue a well-timed book link.
The playful drama? Some whisper “does counting kill the magic?” while others clap back: numbers are just another way to love a poem. Jokes fly about “Team Vowel vs Team Consonant,” “Eugene Algorithm,” and whether romance can be “statistically significant.” The vibe is clear: Markov didn’t ruin the music—he found a hidden beat, and the crowd can’t decide if they want to dance or graph it.
Key Points
- Markov analyzes a 20,000-letter sequence from Pushkin’s Eugene Onegin, classifying each letter as a vowel or a consonant.
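- He defines and estimates p, the probability that a letter is a vowel; p₁ and p₀, the probability of a vowel when the preceding letter is a vowel or a consonant, respectively; and the second-order probabilities p₁,₁, p₁,₀, p₀,₁, p₀,₀, which condition on the classes of the two preceding letters.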
- The complementary probabilities for consonants are denoted q with the same indices, so q = 1 − p, q₁ = 1 − p₁, and so on.
- To estimate p, the sequence is split into 200 blocks of 100 letters, and the per-block vowel proportions are averaged.
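- Each 100-letter block is also laid out as a 10×10 grid and its vowels are counted column by column. Columns are then paired (1+6, 2+7, 3+8, 4+9, 5+10) and summed over five consecutive grids, so every 500-letter stretch yields five new 100-letter groups; the forty stretches are tabulated in forty small tables.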
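For readers who want to see the vowel/consonant estimates concretely, here is a minimal Python sketch of plain relative-frequency counting. It assumes the modern ten-letter Russian vowel set (аеёиоуыэюя), treats every other letter as a consonant, and drops spaces and punctuation; how Markov handled the hard and soft signs or pre-reform letters is not reproduced. The function names `classify` and `estimate`, and the key convention (`"p1"` for p₁, `"p1,0"` for "two letters back is a vowel, preceding letter is a consonant"), are illustrative choices of ours, not Markov's notation.

```python
import math
from collections import Counter
from itertools import product

# Assumption: modern Russian vowel letters; Markov's treatment of ъ, ь and
# pre-reform letters is not reproduced here.
RUSSIAN_VOWELS = frozenset("аеёиоуыэюя")


def classify(text: str, vowels: frozenset = RUSSIAN_VOWELS) -> list[int]:
    """Map each letter to 1 (vowel) or 0 (consonant); drop spaces and punctuation."""
    return [1 if ch in vowels else 0 for ch in text.lower() if ch.isalpha()]


def estimate(seq: list[int]) -> dict[str, float]:
    """Relative-frequency estimates of p, p1, p0, the four second-order p's, and the q's."""
    est = {"p": sum(seq) / len(seq)}

    # First order: condition on the immediately preceding letter.
    pairs = Counter(zip(seq, seq[1:]))
    for prev in (1, 0):
        total = pairs[(prev, 1)] + pairs[(prev, 0)]
        est[f"p{prev}"] = pairs[(prev, 1)] / total if total else math.nan

    # Second order: condition on (letter two back, preceding letter).
    # This index order is our own convention for the sketch.
    triples = Counter(zip(seq, seq[1:], seq[2:]))
    for a, b in product((1, 0), repeat=2):
        total = triples[(a, b, 1)] + triples[(a, b, 0)]
        est[f"p{a},{b}"] = triples[(a, b, 1)] / total if total else math.nan

    # Consonant probabilities q are the complements, with the same indices.
    est.update({"q" + k[1:]: 1.0 - v for k, v in est.items()})
    return est
```

On the novel's first line, “Мой дядя самых честных правил” (25 letters, 9 vowels), this gives p = 0.36 and p₀ = 0.6, but no vowel is ever followed by a vowel, so p₁ comes out as 0 and p₁,₁ is undefined (NaN): a reminder that such ratios need a long sample.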
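The block and grid bookkeeping can be sketched the same way. This Python sketch assumes the letters are already reduced to a 0/1 sequence (1 = vowel), that each 10×10 grid is filled row by row with 10 consecutive letters per row, and that each column pair is summed over the five grids of a 500-letter stretch; that last point is our reading of the summary, not a transcription of Markov's tables. All function names are hypothetical.

```python
def block_proportions(seq: list[int], size: int = 100) -> list[float]:
    """Vowel share of each consecutive, non-overlapping block of `size` letters."""
    n_blocks = len(seq) // size  # a trailing partial block is ignored
    return [sum(seq[i * size:(i + 1) * size]) / size for i in range(n_blocks)]


def estimate_p_by_blocks(seq: list[int], size: int = 100) -> float:
    """Average of the per-block vowel proportions."""
    props = block_proportions(seq, size)
    return sum(props) / len(props)


def column_counts(block: list[int]) -> list[int]:
    """Vowels per column of a 100-letter block laid out row by row as a 10x10 grid."""
    if len(block) != 100:
        raise ValueError("expected a 100-letter block")
    return [sum(block[row * 10 + col] for row in range(10)) for col in range(10)]


def paired_column_groups(seq: list[int]) -> list[list[int]]:
    """For every 500-letter stretch (five grids), sum the vowel counts of the
    column pairs 1+6, 2+7, 3+8, 4+9, 5+10 across its five grids."""
    tables = []
    for start in range(0, len(seq) - 499, 500):
        totals = [0] * 5
        for grid in range(5):
            lo = start + 100 * grid
            cols = column_counts(seq[lo:lo + 100])
            for k in range(5):
                totals[k] += cols[k] + cols[k + 5]  # 0-based pair (k, k+5)
        tables.append(totals)
    return tables
```

Because every block has the same length, the mean of the 200 block proportions equals the vowel share of the whole sequence; the individual block values additionally show how much that share varies from one stretch of text to the next. Each column pair covers 2 columns × 10 rows × 5 grids = 100 letters, which is where the new 100-letter groups come from, and 20,000 letters give forty stretches of five groups each.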