    Browse The Skill Store

    2 skills found

    AI Feature Eval Writer — Golden Datasets, Rubrics, and LLM-as-Judge Prompts That Actually Catch Regressions

    by PubsProToolkit

    $14

    Design and write the eval suite for your LLM-powered feature — the metrics that match your failure modes, a golden dataset plan with starter cases, anchored rubrics, LLM-as-judge prompts with the known bias mitigations, and pass/fail gates wired for CI.

    Tags: evals, llm-evaluation, llm-as-judge (+6 more)

    AI Eval & Test-Suite Quality Gate — Catch LLM Evals That Lie Before You Ship

    by PubsProToolkit

    $14

    An adversarial gate that audits an AI eval or test suite — LLM-judge rubrics, datasets, regression tests, metrics — for gameable criteria, data leakage, missing edge cases, and non-determinism, then returns one PASS/REVISE/FAIL verdict.

    Tags: ai-evaluation, llm-eval, test-quality (+2 more)