    Browse The Skill Store

    11 skills found

    evaluating-ai-harness-dimensions

    by loreto

    $10

    Evaluates AI coding agent platforms across five structural dimensions that determine real-world performance independently of model quality, so teams select on architectural fit rather than benchmark scores.

    3
    agent-architecture, ai-agents, ai-coding-agents, +12

    prompt-engineer-pro

    by Roy Yuen

    $8

    Professional prompt engineering, audit, and evaluation system for production-grade AI agents and workflows.

    3
    ai-agents, governance, llm-ops, +2

    production-agent-architect

    by Roy Yuen

    $6

    Architect, scaffold, and harden production-grade AI agents with battle-tested patterns and systematic evaluation.

    2
    2
    agentic-workflows, ai-agents, langchain, +3

    agent-eval-coverage-audit

    by Roy Yuen

    $5

    Audit your AI agent's evaluation coverage to identify missing release gates and production risks.

    2
    ai-testing, audit, compliance, +2

    Optimization-Loop

    by Martin Gunderman

    $19

    Autonomous loop that iteratively modifies, evaluates, and selects the best version of any text resource — skills, prompts, or campaigns — using a modify-measure-keep/discard cycle.

    1
    1
    optimization, iteration, evaluation, +4
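
The modify-measure-keep/discard cycle named in the Optimization-Loop listing can be illustrated with a generic greedy loop. This is only a minimal sketch of that general pattern, not the skill's actual implementation: the function name `optimize` and the `mutate` and `score` callbacks are hypothetical and supplied by the caller.

```python
import random
from typing import Callable, Tuple


def optimize(
    resource: str,
    mutate: Callable[[str, random.Random], str],
    score: Callable[[str], float],
    rounds: int = 20,
    seed: int = 0,
) -> Tuple[str, float]:
    """Greedy modify-measure-keep/discard loop over a text resource.

    `mutate` and `score` are hypothetical caller-supplied callbacks:
    one proposes a changed version, the other measures it.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    best, best_score = resource, score(resource)
    for _ in range(rounds):
        candidate = mutate(best, rng)        # modify
        candidate_score = score(candidate)   # measure
        if candidate_score > best_score:     # keep only strict improvements
            best, best_score = candidate, candidate_score
        # otherwise the candidate is discarded and the loop continues
    return best, best_score
```

Because a candidate is kept only when it scores strictly higher, the returned score can never be lower than the score of the starting resource. How useful the loop is depends entirely on how well `score` reflects real quality.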

    AI Eval & Test-Suite Quality Gate — Catch LLM Evals That Lie Before You Ship

    by PubsProToolkit

    $14

    An adversarial gate that audits an AI eval or test suite — LLM-judge rubrics, datasets, regression tests, metrics — for gameable criteria, data leakage, missing edge cases, and non-determinism, then returns one PASS/REVISE/FAIL verdict.

    2
    1
    ai-evaluation, llm-eval, test-quality, +2

    rag-eval

    by Ifásola

    $5

    Diagnose RAG bottlenecks with retrieval metrics (Recall, MRR, nDCG) that show whether failures come from retrieval or ranking.

    2
    rag, llm-ops, evaluation, +2
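
For readers unfamiliar with the three metrics named in the rag-eval listing, the sketch below shows their standard textbook definitions with binary relevance. It is not taken from the skill itself; the function names and the sample document IDs are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Sequence[str]], relevants: List[Set[str]]) -> float:
    """MRR: the reciprocal rank averaged over all queries."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in zip(runs, relevants)) / len(runs)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """nDCG@k with binary gains: DCG of this ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical query: two relevant documents, retrieved at ranks 2 and 4.
ranked_docs = ["d3", "d1", "d7", "d2"]
relevant_docs = {"d1", "d2"}
print(recall_at_k(ranked_docs, relevant_docs, 3))   # 0.5
print(reciprocal_rank(ranked_docs, relevant_docs))  # 0.5
print(ndcg_at_k(ranked_docs, relevant_docs, 4))
```

A common reading of these numbers: low recall suggests the retriever is not finding the right documents at all, while good recall paired with low MRR or nDCG suggests the documents are found but ranked too far down.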

    Psychometrician — Scale Development, Validation & Measurement

    by nSight Analytics

    $19

    Turns your agent into a psychometrician that builds, validates, and troubleshoots measures — reliability, validity, factor analysis, IRT, and measurement invariance.

    1
    psychometrics, scale-validation, factor-analysis, +9

    benchmarking-ai-agents-beyond-models

    by loreto

    Free

    Published AI benchmarks measure brains in jars. They test models in isolation or within a single reference harness — and then attribute all performance to the model. This skill teaches you to decompose agent performance into its two actual components: model capability and harness multiplier. The result is evaluations that predict real-world behavior instead of benchmark theater.

    1
    14 5.0
    agent-evaluation, ai-agents, ai-benchmarking, +10

    Skill Health Scanner

    by Markus Isaksson

    Free

    Instantly diagnose any skill or prompt and get a clear, prioritized report on what’s wrong and how to fix it — across any agent.

    2
    12
    claude, cross-agent, cursor, +9

    KOL Marketing Engine

    by Joker

    Free

    6-tier KOL classification, 6-platform mapping, 5-dimension screening, ROI evaluation, and risk management for influencer campaigns.

    1
    1
    kol, influencer, marketing, +2