    Agent Loop - Autoresearch Optimizer

    by PubsProToolkit

    An iterative agent loop that optimizes any prompt, config, or artifact by making one change at a time, scoring it against a metric, and keeping only the winners.

    Updated Jun 2026
    Security scanned
    Codex CLI

    $19 or 95 credits

    30-day refund guarantee

    Included in download

    • Improving a system prompt from good-enough to high-reliability
    • Tuning a data-extraction pipeline against test cases
    • terminal, file_read, file_write automation included
    • Ready for Codex CLI
    • Instant install

    Sample input

    Optimize my customer-support system prompt at ./target/support_prompt.md. Test inputs are in ./tests/. Score each variant with: did it stay under 120 words, did it include a next step, and did it avoid making promises about refunds. Run 40 rounds and keep the best version.

    Sample output

    Baseline score 4/9 checks passing. Ran 40 rounds, 31 edits reverted, 9 kept. Final score 9/9. Top winning changes: added an explicit word limit, moved the next-step instruction earlier, and replaced a vague refund line with a deferral to policy. Best prompt written back to ./target/support_prompt.md; full round-by-round log saved to ./target/loop_log.md.
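
    The three checks in the sample input can be read as binary functions applied to every test reply, so three checks over three test inputs would give the nine-point scale seen in the sample output. The sketch below is a minimal illustration under that assumption. The function names and the keyword heuristics are hypothetical and are not the skill's actual scoring logic; in practice a check such as "included a next step" would likely need a more careful judge.

```python
import re
from typing import Callable

# Hypothetical binary checks mirroring the sample input.
# The regex heuristics are illustrative assumptions only.

def under_word_limit(reply: str, limit: int = 120) -> bool:
    """True when the reply stays under the word limit."""
    return len(reply.split()) < limit

def includes_next_step(reply: str) -> bool:
    """True when the reply names a next step (naive phrase match)."""
    return re.search(r"\bnext step\b", reply, re.IGNORECASE) is not None

def avoids_refund_promise(reply: str) -> bool:
    """True when the reply does not promise a refund (naive pattern match)."""
    promise = re.compile(
        r"\b(?:will|can)\s+refund\b|\bguaranteed?\s+(?:a\s+)?refund\b",
        re.IGNORECASE,
    )
    return promise.search(reply) is None

CHECKS: list[Callable[[str], bool]] = [
    under_word_limit,
    includes_next_step,
    avoids_refund_promise,
]

def score(replies: list[str]) -> tuple[int, int]:
    """Return (checks passed, checks possible) across all replies."""
    passed = sum(check(reply) for reply in replies for check in CHECKS)
    return passed, len(replies) * len(CHECKS)

if __name__ == "__main__":
    replies = [
        "Thanks for reaching out. Your next step is to reply with your order number.",
        "We will refund you today.",
    ]
    passed, total = score(replies)
    print(f"{passed}/{total} checks passing")
```

    Keeping every check strictly yes/no makes the total easy to compare between rounds, which is what the keep-or-revert decision depends on.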

    About This Skill

    This skill turns any agent into a relentless self-improving optimizer using the "agent loop" pattern popularized by recent autoresearch work. Instead of hand-tuning a prompt or config once and walking away at "good enough," the agent runs a tight loop: propose one change, test it against a defined metric, keep it if it beats the current best, revert it if it doesn't, and repeat. Over dozens or hundreds of cheap iterations, the artifact climbs steadily toward a much higher quality ceiling than manual iteration ever reaches.

    WHAT IT DOES

    The skill takes a target you want to improve (a system prompt, an extraction pipeline, a code-review instruction, a model config, or any artifact you can evaluate), a small set of realistic test inputs, and a handful of binary yes/no quality checks. It then drives the optimization loop autonomously: one edit per round, one score per round, winners kept and losers reverted, with a running log of what changed and why.

    WHY IT MATTERS

    Manual iteration hits diminishing returns fast because humans get tired and stop. An agent doesn't. The bottleneck flips from "can we run this experiment" to "do we even know what question to ask," which means your job becomes curating hypotheses and defining good metrics rather than grinding through trial and error. Any metric you care about that is reasonably cheap to evaluate becomes fair game for automated optimization.

    WHO IT'S FOR

    Anyone who maintains prompts or configs they rely on repeatedly: customer-support agents, internal workflow automations, data-extraction pipelines, code-review instructions, or ML training scripts. If you've ever written something, gotten it to "good enough," and moved on, this loop picks up exactly where you stopped.
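
    The propose, score, keep-or-revert cycle described above can be sketched as a simple greedy loop. This is a toy illustration, not the skill's implementation: `optimize`, `propose`, `score`, and the `EDITS` list are hypothetical names, and the proposer here just cycles through canned edits where a real run would have the agent suggest each change.

```python
from typing import Callable

def optimize(
    baseline: str,
    propose: Callable[[str, int], str],
    score: Callable[[str], int],
    rounds: int,
) -> tuple[str, int, list[str]]:
    """Greedy loop: one edit per round, kept only if it beats the best score."""
    best, best_score = baseline, score(baseline)
    log = [f"baseline: score {best_score}"]
    for round_no in range(1, rounds + 1):
        candidate = propose(best, round_no)    # exactly one change per round
        candidate_score = score(candidate)     # exactly one score per round
        kept = candidate_score > best_score    # ties count as losers
        if kept:
            best, best_score = candidate, candidate_score
        # When the edit is not kept, `best` is left untouched (the revert).
        verdict = "kept" if kept else "reverted"
        log.append(f"round {round_no}: score {candidate_score} -> {verdict}")
    return best, best_score, log

# Toy stand-ins (assumptions for illustration only).
EDITS = [
    "Keep replies under 120 words.",
    "Always state a next step.",
    "Defer refund questions to policy.",
    "Use many exclamation marks!",
]

def propose(prompt: str, round_no: int) -> str:
    """Append one canned instruction, chosen by round number."""
    return prompt + " " + EDITS[round_no % len(EDITS)]

def score(prompt: str) -> int:
    """Count the helpful instructions present, minus a penalty for the bad one."""
    helpful = sum(edit in prompt for edit in EDITS[:3])
    return helpful - prompt.count("exclamation")

if __name__ == "__main__":
    best, best_score, log = optimize("You are a support agent.", propose, score, rounds=8)
    print("\n".join(log))
    print(best)
```

    Using a strict greater-than comparison means an edit that merely ties the current best is reverted, so the kept artifact only changes when the metric actually improves.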

    Use Cases

    • Improving a system prompt from good-enough to high-reliability
    • Tuning a data-extraction pipeline against test cases
    • Optimizing an ML training script or config overnight

    Security Scanned

    Passed automated security review

    Permissions

    Terminal / Shell
    Read Files
    Write Files

    File Scopes

    ./target/**

    Shell access is used only to run the user-defined test/eval command each round. Read and write access is scoped to the target artifact being optimized and the experiment log.
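
    As a rough sketch of what running the user-defined eval command each round might look like, the helper below shells out once and parses a score. Everything here is an assumption for illustration: the `run_eval` name, the timeout, and especially the convention that the command prints `passed/total` on its last output line are hypothetical, since the listing does not specify an output format.

```python
import subprocess
import sys

def run_eval(command: list[str], timeout_s: float = 60.0) -> tuple[int, int]:
    """Run the eval command once and parse 'passed/total' from its last line.

    The 'passed/total' output convention is an assumption for this sketch.
    """
    result = subprocess.run(
        command,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError if the eval command fails
    )
    lines = result.stdout.strip().splitlines()
    if not lines:
        raise ValueError("eval command produced no output")
    passed, total = lines[-1].split("/")
    return int(passed), int(total)

if __name__ == "__main__":
    # Stand-in eval command; a real one would score the current artifact.
    fake_eval = [sys.executable, "-c", "print('7/9')"]
    print(run_eval(fake_eval))
```

    Passing the command as an argument list rather than a single shell string avoids invoking a shell interpreter, which keeps the call narrow in the same spirit as the scoped permissions described above.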

    Model-agnostic. Works with any SKILL.md-compatible agent (Claude Code, Codex CLI, Cursor, Gemini CLI). Requires a target artifact, 2-3 test inputs, and an automatable scoring function or set of binary checks.

    Creator

    PubsProToolkit builds adversarial "gate" skills for AI agents that catch problems before your output ships, instead of just generating more. The skills span code, security, and infrastructure as well as content, hiring, contracts, and finance. They are built by a CMPP-certified, PhD medical writer who brings regulated-industry rigor to every domain.
