
    Browse The Skill Store


    AI Feature Eval Writer — Golden Datasets, Rubrics, and LLM-as-Judge Prompts That Actually Catch Regressions

    by PubsProToolkit

    $14

    Design and write the eval suite for your LLM-powered feature: metrics matched to your failure modes, a golden dataset plan with starter cases, anchored rubrics, LLM-as-judge prompts with mitigations for known judge biases, and pass/fail gates wired into CI.

    Tags: evals, llm-evaluation, llm-as-judge, +6 more
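As a rough illustration of what "LLM-as-judge prompts with bias mitigations" and "pass/fail gates wired into CI" can mean in practice, here is a minimal Python sketch of one common pattern. It assumes that each golden case produces a 1-5 rubric score plus two pairwise verdicts against a baseline, with the second verdict collected after swapping the answer order. Swapping the order is a standard mitigation for a judge's position bias. The gate fails on a low mean score, on any very low score, or when too many cases are lost to the baseline in both orderings. All names (`CaseResult`, `consistent_loss`, `gate`, `ci_exit_code`), the score scale, and the thresholds are hypothetical placeholders and are not taken from the skill itself.

```python
# Hypothetical sketch, not the skill's actual output. Assumed: rubric scores
# on a 1-5 scale, pairwise verdicts against a baseline collected in both
# answer orders, and placeholder thresholds to tune per feature.
from dataclasses import dataclass
from statistics import mean


@dataclass
class CaseResult:
    case_id: str       # id of the golden-dataset case
    rubric_score: int  # anchored rubric score for the candidate, assumed 1-5
    verdict_ab: str    # judge's pick with candidate shown as "A", baseline as "B"
    verdict_ba: str    # judge's pick with order swapped (baseline "A", candidate "B")


def consistent_loss(r: CaseResult) -> bool:
    # Position-bias mitigation: count a regression only when the judge
    # prefers the baseline in BOTH orderings; a split verdict is a tie.
    return r.verdict_ab == "B" and r.verdict_ba == "A"


def gate(results, min_mean_score=4.0, max_hard_failures=0, max_loss_rate=0.1):
    """Return (passed, report) for a list of CaseResult."""
    if not results:
        raise ValueError("no eval results to gate on")
    mean_score = mean(r.rubric_score for r in results)
    # Assumed rule: a score of 2 or lower on any case is a hard failure.
    hard_failures = [r.case_id for r in results if r.rubric_score <= 2]
    loss_rate = sum(consistent_loss(r) for r in results) / len(results)
    passed = (
        mean_score >= min_mean_score
        and len(hard_failures) <= max_hard_failures
        and loss_rate <= max_loss_rate
    )
    report = {
        "mean_score": mean_score,
        "hard_failures": hard_failures,
        "loss_rate": loss_rate,
    }
    return passed, report


def ci_exit_code(results) -> int:
    # Print the report for the CI log; a nonzero code fails the job.
    passed, report = gate(results)
    print(report)
    return 0 if passed else 1
```

A CI step could then call `sys.exit(ci_exit_code(results))`, loading `results` from wherever the eval run writes its judge outputs. Treating split verdicts as ties keeps a position-biased judge from flagging regressions on its own, at the cost of some sensitivity.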