THE AGENSI STORE

Browse The Skill Store

11 skills found

evaluating-ai-harness-dimensions

$10

Evaluates AI coding agent platforms across five structural dimensions that determine real-world performance independently of model quality, so teams can choose a platform by architectural fit rather than by benchmark scores.

agent-architecture, ai-agents, ai-coding-agents (+12 more)

prompt-engineer-pro

by Roy Yuen

Professional prompt engineering, audit, and evaluation system for production-grade AI agents and workflows.

ai-agents, governance, llm-ops (+2 more)

production-agent-architect

by Roy Yuen

Architect, scaffold, and harden production-grade AI agents with battle-tested patterns and systematic evaluation.

agentic-workflows, ai-agents, langchain (+3 more)

agent-eval-coverage-audit

by Roy Yuen

Audit your AI agent's evaluation coverage to identify missing release gates and production risks.

ai-testing, audit, compliance (+2 more)

Optimization-Loop

by Martin Gunderman

$19

Autonomous loop that iteratively modifies, evaluates, and selects the best version of any text resource — skills, prompts, or campaigns — using a modify-measure-keep/discard cycle.

optimization, iteration, evaluation (+4 more)

AI Eval & Test-Suite Quality Gate — Catch LLM Evals That Lie Before You Ship

by PubsProToolkit

$14

An adversarial gate that audits an AI eval or test suite — LLM-judge rubrics, datasets, regression tests, metrics — for gameable criteria, data leakage, missing edge cases, and non-determinism, then returns one PASS/REVISE/FAIL verdict.

ai-evaluation, llm-eval, test-quality (+2 more)

rag-eval

by Ifásola

Diagnose RAG bottlenecks with retrieval metrics (Recall, MRR, nDCG) to pinpoint whether failures come from retrieval or from ranking.

rag, llm-ops, evaluation (+2 more)

Psychometrician — Scale Development, Validation & Measurement

by nSight Analytics

$19

Turns your agent into a psychometrician that builds, validates, and troubleshoots measures — reliability, validity, factor analysis, IRT, and measurement invariance.

psychometrics, scale-validation, factor-analysis (+9 more)

benchmarking-ai-agents-beyond-models

by loreto

Free

Published AI benchmarks measure brains in jars. They test models in isolation or within a single reference harness — and then attribute all performance to the model. This skill teaches you to decompose agent performance into its two actual components: model capability and harness multiplier. The result is evaluations that predict real-world behavior instead of benchmark theater.

14 5.0

agent-evaluation, ai-agents, ai-benchmarking (+10 more)

Skill Health Scanner

by Markus Isaksson

Free

Instantly diagnose any skill or prompt and get a clear, prioritized report on what’s wrong and how to fix it — across any agent.

claude, cross-agent, cursor (+9 more)

KOL Marketing Engine

by Joker

Free

6-tier KOL classification, 6-platform mapping, 5-dimension screening, ROI evaluation, and risk management for influencer campaigns.

kol, influencer, marketing (+2 more)