April 20, 2026
Is your AI catfishing you?
Kimi Vendor Verifier – verify the accuracy of inference providers
Kimi’s AI “truth test” wins applause—plus side‑eye over cheats and a spicy AWS callout
TLDR: Kimi launched a tool to verify whether vendors run its models correctly and promises a public scorecard. The crowd cheers the accountability push, but some say it won’t catch deliberate fakers, the 15‑hour test is heavy, and an unverified AWS jab adds extra drama.
Is your AI catfishing you? Kimi just dropped the Kimi Vendor Verifier (KVV), a free “truth test” to check whether cloud vendors are actually running its models correctly—and the comments are on fire. Fans say this is the missing piece of open‑source: not just releasing a model, but making sure it doesn’t get mangled in the wild. One user pointed out that providers “quietly swap” in cheaper settings and most folks never notice.
Kimi’s fix? A battery of six tests—from quick image checks to a brutal long‑form exam—that aim to catch sloppy setups and broken features. Plus, they’ve locked some “randomness knobs” (like temperature) in thinking mode, so providers can’t fudge the vibes. There’s even talk of a public leaderboard and official results to keep everyone honest.
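What does “locking the randomness knobs” look like in practice? Here’s a minimal sketch of a check a verifier could run against a provider’s reported sampling settings. The field names and tolerance are illustrative assumptions, not KVV’s actual code:

```python
# Hypothetical sketch: flag provider responses whose sampling settings
# deviate from the pinned "thinking mode" values Kimi enforces
# (Temperature=1.0, TopP=0.95). Field names are illustrative, not KVV's.

PINNED = {"temperature": 1.0, "top_p": 0.95}
TOLERANCE = 1e-6  # allow for float round-trip noise

def check_sampling_params(reported: dict) -> list[str]:
    """Return a list of mismatches between reported and pinned params."""
    problems = []
    for key, expected in PINNED.items():
        actual = reported.get(key)
        if actual is None:
            problems.append(f"{key} missing from provider response")
        elif abs(actual - expected) > TOLERANCE:
            problems.append(f"{key}={actual}, expected {expected}")
    return problems

# Usage: an empty list means the provider matches the pinned settings.
print(check_sampling_params({"temperature": 1.0, "top_p": 0.95}))  # []
print(check_sampling_params({"temperature": 0.7}))
```

The point of pinning is that a benchmark score only means something if every vendor samples the same way; with temperature and top‑p fixed, a score gap points at the infrastructure, not the dice.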
But the drama? It’s spicy. Skeptics say KVV catches accidents, not bad actors: “Sketchy Provider” could still pretend to run the good stuff while pocketing the difference. Others groan that a 15‑hour test on high‑end gear isn’t exactly DIY, joking it’s a “bring a lunch” benchmark. And one commenter lobbed an unverified grenade at AWS Bedrock, claiming tool calls sometimes “silently end” conversations—cue popcorn. Love it or side‑eye it, the community wants a scoreboard and receipts.
Key Points
- Kimi open-sourced the Kimi Vendor Verifier (KVV) with the release of its K2.6 model to validate third-party inference correctness.
- KVV includes six targeted evaluations (Pre-Verification, OCRBench, MMMU Pro, AIME2025, K2VV ToolCall, SWE-Bench) to surface infrastructure-related failures.
- Initial issues were traced to decoding parameter misuse; Kimi now enforces Temperature=1.0 and TopP=0.95 in Thinking mode with validation of returned thinking content.
- Kimi will provide pre-release validation, maintain a public vendor leaderboard, and collaborate with vLLM, SGLang, and KTransformers to upstream fixes.
- Full evaluation runs took ~15 hours on two NVIDIA H20 8-GPU servers; scripts support streaming, automatic retries, and checkpoint resumption.
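At ~15 hours per run, checkpoint resumption is what keeps a crashed evaluation from starting over. A minimal sketch of that pattern (hypothetical structure, not KVV’s actual scripts):

```python
# Hypothetical sketch of checkpoint resumption for a long evaluation run:
# completed item IDs are appended to a JSONL file, so a restarted run
# skips them. Illustrates the pattern only; KVV's scripts may differ.
import json
import os

CHECKPOINT = "checkpoint.jsonl"

def load_done(path: str = CHECKPOINT) -> set[str]:
    """Read the IDs of already-evaluated items, if a checkpoint exists."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {json.loads(line)["id"] for line in f if line.strip()}

def run_eval(items: list[dict], evaluate, path: str = CHECKPOINT) -> int:
    """Evaluate each item not yet checkpointed; return how many ran."""
    done = load_done(path)
    ran = 0
    with open(path, "a") as ckpt:
        for item in items:
            if item["id"] in done:
                continue  # already completed in a previous run
            result = evaluate(item)
            ckpt.write(json.dumps({"id": item["id"], "result": result}) + "\n")
            ckpt.flush()  # persist immediately so a crash loses at most one item
            ran += 1
    return ran
```

Running this twice on the same checkpoint file evaluates everything on the first pass and nothing on the second, which is exactly the behavior you want when a 15‑hour job dies at hour 14.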