
Agent Loop - Autoresearch Optimizer
An iterative agent loop that optimizes any prompt, config, or artifact by making one change at a time, scoring it against a metric, and keeping only the winners.
- Improving a system prompt from good-enough to high-reliability
- Tuning a data-extraction pipeline against test cases
- Optimizing an ML training script or config overnight
$19 or 95 credits
Secure checkout via Stripe
Included in download
- Improving a system prompt from good-enough to high-reliability
- Tuning a data-extraction pipeline against test cases
- terminal, file_read, file_write automation included
- Ready for Codex CLI
- Instant install
Sample input
Optimize my customer-support system prompt at ./target/support_prompt.md. Test inputs are in ./tests/. Score each variant with: did it stay under 120 words, did it include a next step, and did it avoid making promises about refunds. Run 40 rounds and keep the best version.
Sample output
Baseline score 4/9 checks passing. Ran 40 rounds, 31 edits reverted, 9 kept. Final score 9/9. Top winning changes: added an explicit word limit, moved the next-step instruction earlier, and replaced a vague refund line with a deferral to policy. Best prompt written back to ./target/support_prompt.md; full round-by-round log saved to ./target/loop_log.md.
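The three checks in the sample input can each be written as a yes/no function. The sketch below is one hedged way to do that in Python, not the skill's actual scorer. The function names, the `NEXT_STEP_CUES` phrases, and the `REFUND_PROMISE` pattern are all hypothetical stand-ins; a real check for "included a next step" would likely need something less crude than keyword matching. One reading of the "9" in the sample output is three checks applied to three test replies, which is how `score` counts here.

```python
import re
from typing import List, Tuple

WORD_LIMIT = 120  # from the sample input: "stay under 120 words"

# Hypothetical cue phrases; a crude proxy for "includes a next step".
NEXT_STEP_CUES = ("next step", "please reply", "you can")

# Hypothetical pattern: a commitment phrase followed by "refund" in the same sentence.
REFUND_PROMISE = re.compile(
    r"\b(?:we will|we'll|guarantee[sd]?|you will (?:get|receive))\b[^.!?]*\brefund",
    re.IGNORECASE,
)


def under_word_limit(reply: str) -> bool:
    """True when the reply has fewer than WORD_LIMIT whitespace-separated words."""
    return len(reply.split()) < WORD_LIMIT


def includes_next_step(reply: str) -> bool:
    """True when any cue phrase appears in the reply (case-insensitive)."""
    lowered = reply.lower()
    return any(cue in lowered for cue in NEXT_STEP_CUES)


def avoids_refund_promise(reply: str) -> bool:
    """True when the reply contains no phrase that reads as a refund commitment."""
    return REFUND_PROMISE.search(reply) is None


CHECKS = (under_word_limit, includes_next_step, avoids_refund_promise)


def score(replies: List[str]) -> Tuple[int, int]:
    """Return (checks passed, checks run) over every reply and every check."""
    passed = sum(check(reply) for reply in replies for check in CHECKS)
    return passed, len(replies) * len(CHECKS)


replies = [
    "Thanks for reaching out. As a next step, please reply with your order number.",
    "We will refund you in full today.",
]
passed, total = score(replies)
print(f"{passed}/{total} checks passing")
```

Keeping each check binary makes a round's score a plain count, so "better" is unambiguous when the loop decides whether to keep an edit.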
About This Skill
This skill turns any agent into a relentless self-improving optimizer using the "agent loop" pattern popularized by recent autoresearch work. Instead of hand-tuning a prompt or config once and walking away at "good enough," the agent runs a tight loop: propose one change, test it against a defined metric, keep it if it beats the current best, revert it if it doesn't, and repeat. Over dozens or hundreds of cheap iterations, the artifact climbs steadily toward a much higher quality ceiling than manual iteration ever reaches.

WHAT IT DOES
The skill takes a target you want to improve (a system prompt, an extraction pipeline, a code-review instruction, a model config, or any artifact you can evaluate), a small set of realistic test inputs, and a handful of binary yes/no quality checks. It then drives the optimization loop autonomously: one edit per round, one score per round, winners kept and losers reverted, with a running log of what changed and why.

WHY IT MATTERS
Manual iteration hits diminishing returns fast because humans get tired and stop. An agent doesn't. The bottleneck flips from "can we run this experiment" to "do we even know what question to ask," which means your job becomes curating hypotheses and defining good metrics rather than grinding through trial and error. Any metric you care about that is reasonably cheap to evaluate becomes fair game for automated optimization.

WHO IT'S FOR
Anyone who maintains prompts or configs they rely on repeatedly: customer-support agents, internal workflow automations, data-extraction pipelines, code-review instructions, or ML training scripts. If you've ever written something, gotten it to "good enough," and moved on, this loop picks up exactly where you stopped.
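The propose, score, keep-or-revert cycle described above is a form of hill climbing. The following Python sketch shows the control flow under stated assumptions: `optimize`, `propose`, and `toy_score` are illustrative names, not the skill's API. In the real skill an agent proposes each edit and the score comes from your own checks; here a fixed list of edits and a phrase-counting scorer stand in so the example runs on its own.

```python
from typing import Callable, Dict, List, Tuple

Proposal = Tuple[str, str]  # (candidate artifact, note describing the single change)


def optimize(
    artifact: str,
    propose: Callable[[str], Proposal],
    score: Callable[[str], int],
    rounds: int,
) -> Tuple[str, int, List[Dict]]:
    """Run `rounds` iterations: one edit, one score, keep only strict improvements."""
    best, best_score = artifact, score(artifact)
    log: List[Dict] = []
    for round_no in range(1, rounds + 1):
        candidate, note = propose(best)      # one change per round
        candidate_score = score(candidate)   # one score per round
        kept = candidate_score > best_score  # ties count as losers and are reverted
        if kept:
            best, best_score = candidate, candidate_score
        log.append(
            {"round": round_no, "change": note, "score": candidate_score, "kept": kept}
        )
    return best, best_score, log


# Toy stand-ins (assumptions for illustration only).
edits = iter(
    [
        "Keep replies under 120 words.",
        "Be friendly.",
        "Always end with a next step.",
    ]
)


def propose(current: str) -> Proposal:
    """Append the next canned edit; a real loop would ask the agent for a change."""
    edit = next(edits)
    return current + "\n" + edit, f"append: {edit}"


def toy_score(text: str) -> int:
    """Count how many required phrases the prompt mentions."""
    return sum(phrase in text.lower() for phrase in ("120 words", "next step"))


best, best_score, log = optimize("You are a support agent.", propose, toy_score, rounds=3)
for entry in log:
    print(entry)
```

Requiring a strict improvement means an edit that leaves the score unchanged is reverted, which keeps the artifact from accumulating changes that have no measured benefit.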
Use Cases
- Improving a system prompt from good-enough to high-reliability
- Tuning a data-extraction pipeline against test cases
- Optimizing an ML training script or config overnight
Known Limitations
Only as good as your metric: a weak or gameable scoring function will let the loop overfit to the test set. Requires the target to be cheaply and automatically evaluable. Not suited to subjective goals with no measurable signal. Best results come from a small, representative set of test inputs and clear binary checks.
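One common guard against the overfitting risk mentioned above is to hold back some test inputs that the loop never sees, then compare pass rates afterwards. The listing does not say the skill does this; the sketch below is an assumption about how you might add it yourself, and `split_cases`, `holdout_fraction`, and `overfit_gap` are hypothetical names. With only 2-3 test inputs a split is not practical, so this applies when you have a larger pool of cases.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_cases(
    cases: Sequence[T], holdout_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle deterministically and return (tuning cases, held-out cases)."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def overfit_gap(tuning_pass_rate: float, holdout_pass_rate: float) -> float:
    """Positive values mean the artifact does better on cases the loop optimized against."""
    return tuning_pass_rate - holdout_pass_rate


cases = list(range(10))  # stand-ins for ten test inputs
tuning, holdout = split_cases(cases)
print(len(tuning), "tuning cases,", len(holdout), "held-out cases")
```

A large positive gap after the run suggests the winning edits are exploiting the tuning cases rather than improving the artifact in general.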
How to Install
mkdir -p ~/.claude/skills && curl -sL https://www.agensi.io/api/install/agent-loop-autoresearch-optimizer -o /tmp/agent-loop-autoresearch-optimizer.zip && unzip -o /tmp/agent-loop-autoresearch-optimizer.zip -d ~/.claude/skills && rm /tmp/agent-loop-autoresearch-optimizer.zip
Free skills install directly. Paid skills require purchase - use the download button above after buying.
Security Scanned
Passed automated security review
Permissions
File Scopes
Shell access is used only to run the user-defined test/eval command each round. Read and write access is scoped to the target artifact being optimized and the experiment log.
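A user-defined eval command of the kind described above might be run and parsed each round roughly as follows. This is a sketch under assumptions: it supposes your command prints a result in the form `passed/total` somewhere in its output, and `run_eval` is a hypothetical helper, not part of the skill. The demo command just prints a fixed score so the example is self-contained.

```python
import re
import subprocess
import sys
from typing import List, Tuple


def run_eval(command: List[str], timeout_s: float = 60.0) -> Tuple[int, int]:
    """Run the eval command and return (passed, total) parsed from its stdout."""
    result = subprocess.run(
        command,
        capture_output=True,
        text=True,
        timeout=timeout_s,  # stop a hung eval from stalling the whole loop
        check=True,         # a non-zero exit code raises CalledProcessError
    )
    match = re.search(r"(\d+)\s*/\s*(\d+)", result.stdout)
    if match is None:
        raise ValueError("eval output did not contain a 'passed/total' score")
    return int(match.group(1)), int(match.group(2))


# Stand-in for a real eval script: a child Python process that prints a score.
demo_command = [sys.executable, "-c", "print('7/9 checks passing')"]
passed, total = run_eval(demo_command)
print(passed, total)
```

Passing the command as a list rather than a shell string avoids shell interpolation of anything in the arguments.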
Model-agnostic. Works with any SKILL.md-compatible agent (Claude Code, Codex CLI, Cursor, Gemini CLI). Requires a target artifact, 2-3 test inputs, and an automatable scoring function or set of binary checks.
Creator
PubsProToolkit builds adversarial "gate" skills for AI agents — they catch problems before your output ships, instead of just generating more. From code, security, and infrastructure to content, hiring, contracts, and finance. Built by a CMPP-certified, PhD medical writer who brings regulated-industry rigor to every domain.
More Premium Skills
endless-loop
Autonomous research and task loop that builds on previous findings to solve complex objectives while you sleep.
designing-hybrid-context-layers
Architects the right retrieval strategy for every query — teaching your agent when to use RAG, a knowledge graph, or a temporal index instead of defaulting to vector search for everything.
consumer-motivation-analyzer
Go beyond surface-level feedback to uncover the psychological drivers and hidden motivations behind buyer behavior.
ai-automation-qa-pack
Professional QA & UAT documentation generator for AI automation agencies and complex agent deployments.