February 10, 2026
Checklists vs Chaos
Show HN: Open-Source SDK for AI Knowledge Work
Can an AI intern grade its own homework? HN says: prove it
TLDR: An open-source SDK uses checklists (“rubrics”) so AI can verify its own research and writing before submitting. The community is split between excitement over structured outputs and snark that it’s just “checklists with GPT,” and commenters are pushing for benchmarks, real-world case studies, and proof the agent won’t just game its own rules.
The latest “Show HN” drops an open-source toolkit that promises AI agents that can do real research and strategy work, then check their own output before turning it in. The trick? A pre-written checklist (a “rubric”) that defines what “good” looks like, plus a loop that keeps fixing the work until it passes. Fans say it’s the missing piece for non-coding tasks, calling it a way to turn messy reports into repeatable outcomes. Skeptics roll in fast with memes like “prompt engineering in a trench coat” and “AI intern grading its own homework,” demanding proof it beats vibes and buzzwords.
The comments buzz around whether rubrics can be gamed, with one camp warning of “compliance theater” and another cheering that audits finally get receipts. Privacy hawks side-eye the required API keys for OpenAI, Anthropic, and Gemini, while consultants gleefully eye faster market analyses and strategy decks. Benchmark hunters ask for head-to-head tests; pragmatists want case studies beyond toy tasks. Even with the repo open, a spicy thread asks whether keeping the rubric hidden makes grading fairer or just easier for the agent to overfit. The mood: intrigued, skeptical, and highly entertained. As one fan put it, “hold my rubric,” while the loop tries not to turn research into plausible nonsense.
Key Points
- An open-source Python SDK (“Knowledge Work SDK”) enables AI agents to perform knowledge work with structured verification.
- The SDK uses rubric-based criteria to define quality, enabling self-verification, iterative improvement, and human-auditable evaluation.
- A self-verifying agentic loop includes brief creation, rubric creation, task execution, verification, iteration on failure, and submission on pass (sketched in code below).
- Agents can search the web, handle files, execute code, generate artifacts, and request user clarification, coordinated by an orchestrator.
- The project originated as an RL training harness and supports Gemini, OpenAI, and Anthropic providers, with installation and quick-start instructions provided.
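For readers who want the loop in concrete terms, here is a minimal Python sketch of the pattern the Key Points describe. Every name in it (Rubric, Criterion, self_verifying_loop, and the toy execute/verify/revise callables) is a hypothetical stand-in, not the SDK’s actual API; it only illustrates the execute → verify → iterate-on-failure → submit-on-pass flow.

```python
# Illustrative sketch only; names and signatures are hypothetical, not the SDK's real API.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Criterion:
    name: str
    description: str  # what "good" looks like for this checklist item


@dataclass
class Rubric:
    criteria: list[Criterion] = field(default_factory=list)


def self_verifying_loop(
    brief: str,
    rubric: Rubric,
    execute: Callable[[str], str],                  # produce a draft from the brief
    verify: Callable[[str, Criterion], bool],       # grade the draft against one criterion
    revise: Callable[[str, list[Criterion]], str],  # fix the draft using the failed criteria
    max_iterations: int = 5,
) -> str:
    """Execute, verify against the rubric, iterate on failure, submit on pass."""
    draft = execute(brief)
    for _ in range(max_iterations):
        failed = [c for c in rubric.criteria if not verify(draft, c)]
        if not failed:
            return draft                # every criterion passed: submit
        draft = revise(draft, failed)   # feed failed criteria back and try again
    return draft                        # best effort after max_iterations


# Toy usage: keyword checks stand in for LLM-based verification.
rubric = Rubric([
    Criterion("cites_sources", "Report names at least one source."),
    Criterion("has_recommendation", "Report ends with a recommendation."),
])
report = self_verifying_loop(
    brief="Summarize the market for widgets.",
    rubric=rubric,
    execute=lambda b: f"Draft about: {b}",
    verify=lambda d, c: c.name.split("_")[-1] in d.lower(),
    revise=lambda d, failed: d + " " + " ".join(f"[added {c.name}]" for c in failed),
)
print(report)
```

The toy verify step just checks for keywords; in the real toolkit the grading and revision steps would presumably be handled by the orchestrated agent calling one of the supported LLM providers.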