THE AGENSI STORE

Browse The Skill Store

2 skills found

AI Feature Eval Writer — Golden Datasets, Rubrics, and LLM-as-Judge Prompts That Actually Catch Regressions

$14

Design and write the eval suite for your LLM-powered feature — the metrics that match your failure modes, a golden dataset plan with starter cases, anchored rubrics, LLM-as-judge prompts with the known bias mitigations, and pass/fail gates wired for CI.

evals, llm-evaluation, llm-as-judge, +6

AI Eval & Test-Suite Quality Gate — Catch LLM Evals That Lie Before You Ship

by PubsProToolkit

$14

An adversarial gate that audits an AI eval or test suite — LLM-judge rubrics, datasets, regression tests, metrics — for gameable criteria, data leakage, missing edge cases, and non-determinism, then returns one PASS/REVISE/FAIL verdict.

ai-evaluation, llm-eval, test-quality, +2