March 17, 2026
Robots vs coders: bring popcorn
Grace Hopper's Revenge
Elixir crowned by AI tests while Python and JavaScript get dragged — comment war ignites
TLDR: An essay claims AI coding tools score best with structured languages like Elixir and C#, not Python or JavaScript, arguing design matters more than data. Commenters split between cheering audit-friendly code and blasting the tiny dataset and AI hype, demanding proof from real projects. Why it matters: the tools shape how we build.
Greg Olsen’s hot take lit up the feed: AI coding tests say languages like Elixir, C#, and Kotlin come out on top, while Python and JavaScript stumble. He points to AutoCodeBench, a multi-language test, versus Python-centric tests like SWEBench and TerminalBench. His punchline: language design beats training data; functional, more structured styles “flow” with large language models (LLMs, the chatbots that write code). He even drops a Tesla-style analogy: build for the world we already have—verification over creation, structure over cleverness. Translation for non-nerds: the simpler and more checkable the code, the better machines can help.
And the comments? Pure spice. Skeptics rolled in fast. skywhopper blasted the “tiny, questionable bit of data” and called out LLM triumphalism. stabbles cut to the chase: “code should be easy to audit,” not just easy to spit out. ashirviskas likes Elixir’s win but doesn’t buy the cause. Chris2048 joked that “objects” are basically “just dict-based organization.” keybored mocked the hype with a “five gazillion tokens” burn and a jab at “Flintstone Engineering.” Meanwhile, Elixir fans did a quiet victory lap as Python/JavaScript devs clutched their coffee. The real fight: are we redesigning our tools for robots to check them—or for humans to read them? Either way, the benchmarks just became the battleground.
Key Points
- The article links Kernighan’s Law to language design, arguing LLM-era coding effectiveness depends on language structure.
- Common benchmarks (SWEBench, TerminalBench) are Python-centric, which may not reflect cross-language performance.
- AutoCodeBench, spanning 20 languages, reportedly ranks Elixir, Kotlin, Racket, and C# highest and PHP, JavaScript, Python, and Perl lowest across models.
- The author contends that training data volume is less predictive of LLM coding performance than language structure and functional paradigms.
- An analogy to Tesla’s vision-first approach and humanoid robots suggests aligning tools with existing infrastructure; the author begins to apply this idea to software interfaces.