    Evidence-Grading Framework — Rank Source Quality Before Your Agent Writes

    by PubsProToolkit

    A reusable rubric that grades every source by type, recency, authority, independence, and corroboration, then ranks them and resolves conflicts by evidence weight.

    Updated Jun 2026
    Security scanned
    Codex CLI

    $18 or 90 credits

    30-day refund guarantee

    Included in download

    • Rank retrieved documents in a RAG pipeline before generation
    • Triage a research reading list into rely / corroborate / avoid
    • file_read automation included
    • Ready for Codex CLI
    • Instant install

    See it in action

    A real example of what this skill takes in and produces.

    Sample input

    Grade these sources for the question: "What is the current API rate limit?"

    1. docs.vendor.com/api-reference - official API reference, updated this quarter: "100 requests/minute per key."
    2. Vendor engineering blog, 6 months old: explains the 100 req/min limit.
    3. Third-party tutorial blog, 2022, aggregator site: "the API allows 60 requests/minute."

    Sample output

    SOURCE LEDGER
    Question: What is the current API rate limit?

    [A] Official API reference (updated this quarter)
        Primary, current, authoritative, independent, corroborated -> RELY ON
    [B] Vendor engineering blog (6 months old)
        Secondary, slightly dated, non-independent -> CORROBORATE (matches A)
    [C] Third-party tutorial citing an older limit (2022 aggregator)
        Tertiary, stale, uncorroborated -> AVOID (superseded by A)

    CONFLICT: C states 60 req/min; A states 100 req/min.
    RESOLUTION: A wins (primary, current). No open conflicts.
    RECOMMENDATION: Base the answer on A; cite the changelog for corroboration.

    About This Skill

## What it does

Most agents treat every source as equally trustworthy: a dated forum post and a primary specification get the same weight as long as both mention the keyword. Evidence-Grading Framework gives your agent a structured rubric to rank sources by quality BEFORE it writes a single sentence, so the strongest evidence drives the answer and weak sources are demoted or set aside.

It is domain-agnostic and built to sit in front of any RAG pipeline or research agent. Feed it retrieved documents, search results, or a reading list, and it returns a graded source ledger plus a clear recommendation for what to actually rely on.

## Evidence-type tiers

- PRIMARY: original, first-hand material, such as the actual data, spec, filing, official record, or direct statement from the entity in question.
- SECONDARY: reputable analysis or reporting that interprets primaries and cites them.
- TERTIARY: aggregated summaries (encyclopedic entries, roundups, listicles) that compile other people's work.
- UNRANKED: undated, anonymous, or self-referential content with no traceable basis.

## The five scoring dimensions

Each source is scored on:

- TYPE: its tier above.
- RECENCY: how current it is relative to how fast the topic changes.
- AUTHORITY: credibility for this specific claim; expertise is domain-specific, not transferable.
- INDEPENDENCE: whether the source has a stake in the conclusion; vendor and marketing pages are flagged.
- CORROBORATION: how many independent sources agree; one source repeated across ten sites is still one source.

## Overall grade (A–D)

- A: primary or strongly corroborated, current, authoritative, independent. Rely on it.
- B: solid secondary or well-supported source with a minor weakness. Usable; corroborate key numbers.
- C: tertiary, dated, or non-independent. Use only as a lead; verify before relying.
- D: unranked, conflicted, or contradicted by stronger evidence. Do not rely on.
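One way to picture how the five dimensions could roll up into an A to D grade is the minimal Python sketch below. The listing does not publish thresholds or a data model, so the `Source` fields, the boolean simplification of recency, authority, and independence, and the cut-offs in `grade` are all assumptions made for illustration, not the skill's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical record of one source's scores on the five dimensions."""
    name: str
    tier: str                   # TYPE: "primary", "secondary", "tertiary", "unranked"
    current: bool               # RECENCY, relative to how fast the topic changes
    authoritative: bool         # AUTHORITY for this specific claim
    independent: bool           # INDEPENDENCE: no stake in the conclusion
    corroborating_origins: int  # CORROBORATION: distinct independent origins that agree


def grade(src: Source) -> str:
    """Map dimension scores to a letter grade (assumed thresholds)."""
    if src.tier == "unranked":
        return "D"
    # Count how many of the three yes/no dimensions the source fails.
    weaknesses = sum(not ok for ok in (src.current, src.authoritative, src.independent))
    strong_base = src.tier == "primary" or src.corroborating_origins >= 2
    if strong_base and weaknesses == 0:
        return "A"
    if src.tier in ("primary", "secondary") and weaknesses <= 1:
        return "B"
    return "C"


if __name__ == "__main__":
    sources = [
        Source("official API reference", "primary", True, True, True, 1),
        Source("vendor engineering blog", "secondary", True, True, False, 1),
        Source("2022 aggregator tutorial", "tertiary", False, False, True, 0),
    ]
    for s in sources:
        print(grade(s), s.name)
```

The sketch grades each source in isolation. A source that is later contradicted by stronger evidence would be demoted in a separate conflict-resolution pass, which this function does not attempt.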
## Conflict resolution

When two sources disagree, the agent does NOT default to the most recent or the most confident. It resolves by evidence weight: the higher-graded source wins, ties go to the better-corroborated claim, and unresolved conflicts are reported as open rather than papered over. A confident D never overrides a careful A.

## Sample output

    SOURCE LEDGER
    Question: What is the current API rate limit?

    [A] Official API reference (updated this quarter)
        Primary · current · authoritative · independent · corroborated -> RELY ON
    [B] Vendor engineering blog (6 months old)
        Secondary · slightly dated · non-independent -> CORROBORATE (matches A)
    [C] Third-party tutorial citing an older limit (2022 aggregator)
        Tertiary · stale · uncorroborated -> AVOID (superseded by A)

    CONFLICT: C states 60 req/min; A states 100 req/min.
    RESOLUTION: A wins (primary, current). No open conflicts.
    RECOMMENDATION: Base the answer on A; cite the changelog for corroboration.

## Why use this skill

RAG and research agents usually fail not because they cannot find sources, but because they cannot tell a strong source from a weak one, and let the loudest or most recent text win. This framework makes source quality explicit and auditable, so the agent's output inherits the credibility of its best evidence rather than the average of everything it retrieved.

The rubric is adapted from hierarchy-of-evidence practice in regulated scientific and technical research, where ranking sources before drawing conclusions is standard discipline, and almost entirely absent from general-purpose AI tooling.

## Use cases

- Rank retrieved documents in a RAG pipeline before generation, so weak sources are down-weighted automatically.
- Triage a research reading list into rely / corroborate / avoid tiers.
- Resolve contradictory sources with a defensible, evidence-weighted rule instead of recency bias.
- Produce a graded source ledger to attach to any research deliverable for transparency.
- Pair with a claim-checking step: grade the sources first, then verify claims against the A and B sources only.

## Known limitations

- Grades reflect source quality and provenance, not absolute truth. A well-produced primary source can still be wrong; the framework tells you how much weight a source has earned, not that its content is correct.
- Authority and recency judgments depend on correctly identifying the topic and how fast it changes; supply the research question for best results.
- It evaluates the sources provided to it and does not search for better ones unless your agent separately provides that capability.
- Independence can only be assessed from what a source discloses; hidden conflicts of interest may not be detectable.
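The conflict-resolution rule described earlier (higher grade wins, ties go to the better-corroborated claim, anything else stays open) can be sketched in a few lines of Python. This is an illustration under stated assumptions: the `Claim` type, its fields, and the `resolve` function are hypothetical names, and the skill itself may apply the rule differently.

```python
from dataclasses import dataclass
from typing import Optional

# Higher number = stronger evidence (assumed encoding of the A to D grades).
GRADE_RANK = {"A": 3, "B": 2, "C": 1, "D": 0}


@dataclass(frozen=True)
class Claim:
    """Hypothetical record: what one graded source says about the question."""
    source: str
    grade: str          # "A", "B", "C", or "D"
    value: str          # the conflicting statement, e.g. "100 req/min"
    corroboration: int  # number of independent sources that agree


def resolve(x: Claim, y: Claim) -> Optional[Claim]:
    """Return the winning claim, or None when the conflict must stay open."""
    if GRADE_RANK[x.grade] != GRADE_RANK[y.grade]:
        return x if GRADE_RANK[x.grade] > GRADE_RANK[y.grade] else y
    if x.corroboration != y.corroboration:
        return x if x.corroboration > y.corroboration else y
    return None  # same grade, same corroboration: report as open


if __name__ == "__main__":
    a = Claim("official API reference", "A", "100 req/min", 1)
    c = Claim("2022 aggregator tutorial", "C", "60 req/min", 0)
    winner = resolve(a, c)
    print("open conflict" if winner is None else f"{winner.source} wins: {winner.value}")
```

Note that neither publication date nor how confidently a claim is worded appears anywhere in `resolve`, which mirrors the rule that a confident D never overrides a careful A.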

    Use Cases

    • Rank retrieved documents in a RAG pipeline before generation
    • Triage a research reading list into rely / corroborate / avoid
    • Resolve contradictory sources by evidence weight, not recency
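As a rough illustration of the triage use case, the Python sketch below sorts already-graded sources into rely / corroborate / avoid buckets. The grade-to-bucket mapping is an assumption inferred from the sample output (A is relied on, B corroborates, the rest are avoided), and `triage` is a hypothetical helper, not part of the skill.

```python
# Assumed mapping from letter grade to triage bucket.
BUCKET_FOR_GRADE = {"A": "rely", "B": "corroborate", "C": "avoid", "D": "avoid"}
GRADE_ORDER = "ABCD"


def triage(graded):
    """Group (name, grade) pairs into buckets, strongest sources first."""
    buckets = {"rely": [], "corroborate": [], "avoid": []}
    # sorted() is stable, so sources with the same grade keep their input order.
    for name, grade in sorted(graded, key=lambda item: GRADE_ORDER.index(item[1])):
        buckets[BUCKET_FOR_GRADE[grade]].append(name)
    return buckets


if __name__ == "__main__":
    reading_list = [
        ("2022 aggregator tutorial", "C"),
        ("official API reference", "A"),
        ("vendor engineering blog", "B"),
    ]
    result = triage(reading_list)
    # In a RAG pipeline, only the first two buckets would be passed on as context.
    context = result["rely"] + result["corroborate"]
    print(result)
    print(context)
```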

    Security Scanned

    Passed automated security review

    Permissions

    Read Files

    File Scopes

    evidence-grading-framework/**

    Read-only. The skill reads the sources and any provided documents to grade them. It does not write, execute, or access the network.

    Works with any SKILL.md-compatible agent (Claude Code, Codex CLI, Cursor, VS Code Copilot, Gemini CLI). No runtime dependencies. Designed to run as a pre-generation step in RAG and research pipelines; strongest when you also supply the research question.

    Creator

    PubsProToolkit builds AI agent skills that bring regulated-industry rigor to written output. Created by a CMPP-certified medical writer with a PhD and 10+ years in pharma — covering clinical and scientific publishing, plus evidence-grounded QC for any agent.
