Agent Optimization & Output-Quality Suite — Tune Your Agent, Trust Your Evals, Ship Stronger Output
A tight loop for making AI agent output measurably better and proving it. Agent Loop iteratively optimizes any prompt, config, or artifact by changing one thing at a time, scoring each change against your metric, and keeping only the winners. The AI Eval & Test-Suite Quality Gate makes sure the metric you optimize against is trustworthy, catching gameable criteria, data leakage, and missing edge cases before they mislead you. And the Peer-Review Stress Test turns your agent into its own harshest reviewer, hunting weak claims and missing limitations before a human sees the output. Built for prompt engineers, AI builders, and teams shipping agent-powered features. Optimize it, trust the score, then stress-test it.
You save $7 vs buying individually.
What's included (3 skills)
Agent Loop: an iterative agent loop that optimizes any prompt, config, or artifact by making one change at a time, scoring it against a metric, and keeping only the winners.
AI Eval & Test-Suite Quality Gate: an adversarial gate that audits an AI eval or test suite (LLM-judge rubrics, datasets, regression tests, metrics) for gameable criteria, data leakage, missing edge cases, and non-determinism, then returns one PASS/REVISE/FAIL verdict.
Peer-Review Stress Test: an adversarial self-review gate that hunts your agent's weakest claim, overclaims, and missing limitations before a human sees the output.
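
To make the one-change-at-a-time idea concrete, here is a minimal Python sketch of a loop that applies a single change, scores the result, and keeps it only if the score improves. It is an illustration under stated assumptions, not the skill's actual implementation: the names `optimize`, `toy_score`, and the three `add_*` change functions are hypothetical, and the keyword-counting metric is a stand-in for whatever metric you actually trust.

```python
from typing import Callable, List, Tuple

Change = Callable[[str], str]


def optimize(
    baseline: str,
    changes: List[Change],
    score: Callable[[str], float],
) -> Tuple[str, float, List[str]]:
    """Greedy loop: try one change at a time, keep it only if the score improves."""
    best, best_score = baseline, score(baseline)
    kept: List[str] = []
    for change in changes:
        trial = change(best)            # exactly one change on top of the current best
        trial_score = score(trial)
        if trial_score > best_score:    # strict improvement, otherwise discard
            best, best_score = trial, trial_score
            kept.append(change.__name__)
    return best, best_score, kept


def toy_score(prompt: str) -> float:
    # Hypothetical metric: reward required phrases, lightly penalize length.
    required = ("cite sources", "json")
    hits = sum(phrase in prompt.lower() for phrase in required)
    return hits - 0.001 * len(prompt)


# Hypothetical candidate changes, each touching one thing only.
def add_json_hint(p: str) -> str:
    return p + " Respond in JSON."


def add_filler(p: str) -> str:
    return p + " Please be very very thorough and nice."


def add_citation_rule(p: str) -> str:
    return p + " Cite sources."


baseline = "Summarize the report."
best, best_score, kept = optimize(
    baseline, [add_json_hint, add_filler, add_citation_rule], toy_score
)
print(kept)
print(best)
```

In this toy run the filler change is discarded because it adds length without raising the metric, while the other two changes are kept. The sketch also shows why the quality gate matters: a keyword-counting metric like `toy_score` is easy to game, so a loop like this is only as good as the metric it optimizes against.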