October 29, 2025
Merge wars, token tears
A Year of Fast Apply – Our Path to 10k Tokens per Second
Open-source speed brag sparks “Cursor clone?” jokes and benchmark brawls
TLDR: Relace is sharing how it built a small AI that applies code changes insanely fast, claiming 10k tokens per second. The crowd is split: fans love the tooling and openness, while skeptics demand real benchmarks, question the safety of AI merges, and poke at the Cursor comparisons. Speed versus trust is the fight.
Relace just dropped a big brag: its new Apply 3 model can slam through code changes at 10k+ tokens per second, and it's open-sourcing the playbook behind it. Translation for non-devs: rather than retyping your whole file, a small AI "merger" reads the "diff" (a list of changes) and patches only what's needed. Cue the community chaos. Fans cheered the practical vibes. One standout, swyx, applauded Relace's internal eval tools, calling them the best case of "AI accelerating AI," and immediately asked how this stacks up against MorphLLM (aka the meta debate of tools that build tools). But skeptics hit back: "10k tokens/sec" sounds like a speed run, yet folks want wall-clock times, real costs, and failure cases, not just vibes. Some accused Relace of being "Cursor, but open," since Cursor popularized lazy diffs but kept its model inside the IDE. Others argued that letting a large language model (an AI that writes text) do the merge is bold but risky: one wrong guess and you ship a gremlin. Jokes flew fast: "Fast Apply is LinkedIn's Easy Apply for code," "git diff, but make it vibes," and finance bros asking if "TPS" means transactions per second. The drama: Is this truly open, is it safe for agents with no human in the loop, and does speed still matter if accuracy slips? The thread is half claps, half side-eye, 100% spicy.
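For the technically curious, here's roughly what the lazy-diff idea looks like. This is a minimal illustrative sketch in Python; the elision-marker convention and the file contents are assumptions for illustration, not Relace's actual prompt or wire format:

```python
# Sketch of a "lazy diff" merge (illustrative only; not Relace's format).
# The frontier model emits just the changed region, hiding untouched code
# behind a placeholder comment; the small fast-apply model's job is to
# merge that snippet back into the full file.

initial_code = """\
def greet(name):
    print(f"Hello, {name}")

def farewell(name):
    print(f"Bye, {name}")
"""

# The lazy diff: only the edited function, plus an elision marker for the rest.
lazy_diff = """\
# ... existing code ...
def farewell(name):
    print(f"Goodbye, {name}!")
"""

# The expected output of the merge model: the original file with greet()
# untouched and farewell() replaced.
merged_code = """\
def greet(name):
    print(f"Hello, {name}")

def farewell(name):
    print(f"Goodbye, {name}!")
"""
```

The speed claim falls out of this split: the expensive frontier model writes only a few lines, and the cheap specialized model does the mechanical merge.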
Key Points
- Relace is open-sourcing its learnings on dataset curation, training, and inference for the fast-apply models used in code diff merging.
- Relace Apply 3 reaches 10k+ tokens per second while maintaining state-of-the-art accuracy.
- Full-file regeneration with frontier LLMs (e.g., Claude Sonnet 4.5) is slow and costly (~100+ seconds and ≥$0.18 for ~10k tokens).
- Relace's approach splits the task: frontier models generate minimal diffs, and a small, fine-tuned LLM serves as the merge algorithm that handles the edge cases.
- Training focuses on high-quality, diverse datasets of (initial_code, diff, merged_code) examples, with diminishing returns beyond ~100k examples; the first model used ~30k. See the sketch after this list.
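Going by the field names above, each training example is a triplet: the file before, the lazy diff, and the merged result. Here's a hedged sketch of what one record might look like on disk; the concrete code and the JSONL storage format are assumptions for illustration, not Relace's published pipeline:

```python
import json

# One hypothetical training example using the field names from the post
# (initial_code, diff, merged_code). The contents are invented.
example = {
    "initial_code": "def area(r):\n    return 3.14 * r * r\n",
    "diff": "import math\n# ... existing code ...\n    return math.pi * r * r\n",
    "merged_code": "import math\n\ndef area(r):\n    return math.pi * r * r\n",
}

# Append the record as one JSON object per line (JSONL is a common dataset
# format; whether Relace uses it is an assumption).
with open("fast_apply_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```

A small model fine-tuned on enough of these triplets learns the merge as a sequence task, which is why curation quality matters more than raw dataset size past a point.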