December 18, 2025
Benchmarks or it didn’t happen
GPT-5.2-Codex
OpenAI drops its 'coding superbrain'—fans demand receipts, Claude loyalists shrug
TLDR: OpenAI launched GPT‑5.2‑Codex, a souped-up coding assistant with better long‑project handling, Windows support, and stronger security. The crowd is split: some demand hard benchmarks vs Claude/Gemini and complain it’s slow, others hope it finally brings real competition, and the invite-only access for security pros sparks dual‑use debates.
OpenAI just launched GPT‑5.2‑Codex, pitching it as the most advanced “agentic” coding assistant for real-world software. Translation: it’s supposed to handle long, messy projects better, keep track across big code changes, finally play nice with Windows, and flex upgraded cybersecurity. They say it’s not at the highest danger level in their safety system (the Preparedness Framework), but it’s getting stronger and comes with extra safeguards. Paid ChatGPT folks get it now; API access is “coming soon,” with invite-only, more powerful flavors for vetted security pros.
The comments? Spicy. One camp is yelling “Show us the receipts!” and wants head-to-head numbers vs Google’s Gemini and Anthropic’s Claude. Another camp is over the hype: “0 enthusiasm” because recent OpenAI “thinking” models feel slow (cue the 🐌 memes). Ex‑Codex fans confess they’ve switched to Claude Code and want OpenAI to catch up so there’s real competition. On safety, some applaud the invite-only approach, since there’s a fine line between tools that help defenders and ones bad actors could misuse. Others ask what “dual-use” means. Simple: the same skills that secure systems can also be abused to break them. Bonus drama: OpenAI touts that a researcher used an earlier model to find a React bug, but skeptics clap back with benchmarks-or-it-didn’t-happen energy. It’s receipts vs faith, speed vs smarts, and Windows devs finally feeling seen.
Key Points
- GPT‑5.2‑Codex is released as an advanced agentic coding model optimized for long-horizon software engineering tasks.
- The model improves context compaction, large code changes (refactors, migrations), Windows environment reliability, and vision for UI/diagram interpretation.
- It achieves state-of-the-art results on SWE-Bench Pro and Terminal-Bench 2.0 benchmarks.
- Cybersecurity capabilities are significantly strengthened, with added safeguards; it does not yet reach 'High' capability under the Preparedness Framework.
- Availability starts for paid ChatGPT users across Codex surfaces, with API access planned and invite-only trusted access for vetted defensive cybersecurity users.