
Evidence-Grading Framework — Rank Source Quality Before Your Agent Writes
A reusable rubric that grades every source by type, recency, authority, independence, and corroboration, then ranks them and resolves conflicts by evidence weight.
- Rank retrieved documents in a RAG pipeline before generation
- Triage a research reading list into rely / corroborate / avoid
- Resolve contradictory sources by evidence weight, not recency
$18 · or 90 credits
Secure checkout via Stripe
Included in download
- Rank retrieved documents in a RAG pipeline before generation
- Triage a research reading list into rely / corroborate / avoid
- file_read automation included
- Ready for Codex CLI
See it in action
A real example of what this skill takes in and produces.
Sample input
Grade these sources for the question: "What is the current API rate limit?"
- docs.vendor.com/api-reference - official API reference, updated this quarter: "100 requests/minute per key."
- Vendor engineering blog, 6 months old: explains the 100 req/min limit.
- Third-party tutorial blog, 2022, aggregator site: "the API allows 60 requests/minute."
Sample output
SOURCE LEDGER
Question: What is the current API rate limit?
[A] Official API reference (updated this quarter)
    Primary, current, authoritative, independent, corroborated -> RELY ON
[B] Vendor engineering blog (6 months old)
    Secondary, slightly dated, non-independent -> CORROBORATE (matches A)
[C] Third-party tutorial citing an older limit (2022 aggregator)
    Tertiary, stale, uncorroborated -> AVOID (superseded by A)
CONFLICT: C states 60 req/min; A states 100 req/min.
RESOLUTION: A wins (primary, current). No open conflicts.
RECOMMENDATION: Base the answer on A; cite the changelog for corroboration.
About This Skill
## What it does

Most agents treat every source as equally trustworthy: a dated forum post and a primary specification get the same weight as long as both mention the keyword. Evidence-Grading Framework gives your agent a structured rubric to rank sources by quality BEFORE it writes a single sentence, so the strongest evidence drives the answer and weak sources are demoted or set aside.

It is domain-agnostic and built to sit in front of any RAG pipeline or research agent. Feed it retrieved documents, search results, or a reading list, and it returns a graded source ledger plus a clear recommendation for what to actually rely on.

## Evidence-type tiers

- PRIMARY: original, first-hand material, such as the actual data, spec, filing, official record, or direct statement from the entity in question.
- SECONDARY: reputable analysis or reporting that interprets primaries and cites them.
- TERTIARY: aggregated summaries (encyclopedic entries, roundups, listicles) that compile other people's work.
- UNRANKED: undated, anonymous, or self-referential content with no traceable basis.

## The five scoring dimensions

Each source is scored on:

- TYPE: its tier above.
- RECENCY: how current it is relative to how fast the topic changes.
- AUTHORITY: credibility for this specific claim; expertise is domain-specific, not transferable.
- INDEPENDENCE: whether the source has a stake in the conclusion; vendor and marketing pages are flagged.
- CORROBORATION: how many independent sources agree; one source repeated across ten sites is still one source.

## Overall grade (A–D)

- A: primary or strongly corroborated, current, authoritative, independent. Rely on it.
- B: solid secondary or well-supported source with a minor weakness. Usable; corroborate key numbers.
- C: tertiary, dated, or non-independent. Use only as a lead; verify before relying.
- D: unranked, conflicted, or contradicted by stronger evidence. Do not rely on.
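
The tiers, dimensions, and grades above can be sketched in a few lines of Python. This is only an illustration of the rubric, not the skill's own implementation: the `Source` fields, the `count_independent` helper, and every threshold (for example, treating two or more independent agreeing origins as "strongly corroborated") are assumptions made for the example. The "contradicted by stronger evidence" route to a D is left to the conflict-resolution step.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Evidence-type tiers, ordered weakest to strongest."""
    UNRANKED = 0
    TERTIARY = 1
    SECONDARY = 2
    PRIMARY = 3


@dataclass(frozen=True)
class Source:
    name: str
    tier: Tier                 # TYPE
    current: bool              # RECENCY, judged against how fast the topic changes
    authoritative: bool        # AUTHORITY for this specific claim
    independent: bool          # INDEPENDENCE: no stake in the conclusion
    independent_agreeing: int  # CORROBORATION: distinct independent origins that agree


def count_independent(origin_by_site: dict) -> int:
    """Count distinct origins: ten sites repeating one origin count once."""
    return len(set(origin_by_site.values()))


def overall_grade(src: Source) -> str:
    """Map the five dimensions to A-D. Thresholds are illustrative assumptions."""
    if src.tier is Tier.UNRANKED:
        return "D"
    weaknesses = [src.current, src.authoritative, src.independent].count(False)
    strong_base = src.tier is Tier.PRIMARY or src.independent_agreeing >= 2
    if strong_base and weaknesses == 0:
        return "A"
    if src.tier >= Tier.SECONDARY and weaknesses <= 1:
        return "B"
    return "C"
```

For instance, `overall_grade(Source("spec", Tier.PRIMARY, True, True, True, 1))` returns `"A"`, while the same source with `independent=False` drops to `"B"` under these assumed thresholds.
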
## Conflict resolution

When two sources disagree, the agent does NOT default to the most recent or the most confident. It resolves by evidence weight: the higher-graded source wins, ties go to the better-corroborated claim, and unresolved conflicts are reported as open rather than papered over. A confident D never overrides a careful A.

## Sample output

SOURCE LEDGER
Question: What is the current API rate limit?
[A] Official API reference (updated this quarter)
    Primary · current · authoritative · independent · corroborated -> RELY ON
[B] Vendor engineering blog (6 months old)
    Secondary · slightly dated · non-independent -> CORROBORATE (matches A)
[C] Third-party tutorial citing an older limit (2022 aggregator)
    Tertiary · stale · uncorroborated -> AVOID (superseded by A)
CONFLICT: C states 60 req/min; A states 100 req/min.
RESOLUTION: A wins (primary, current). No open conflicts.
RECOMMENDATION: Base the answer on A; cite the changelog for corroboration.

## Why use this skill

RAG and research agents usually fail not because they cannot find sources, but because they cannot tell a strong source from a weak one, and so let the loudest or most recent text win. This framework makes source quality explicit and auditable, so the agent's output inherits the credibility of its best evidence rather than the average of everything it retrieved.

The rubric is adapted from hierarchy-of-evidence practice in regulated scientific and technical research, where ranking sources before drawing conclusions is standard discipline, and almost entirely absent from general-purpose AI tooling.

## Use cases

- Rank retrieved documents in a RAG pipeline before generation, so weak sources are down-weighted automatically.
- Triage a research reading list into rely / corroborate / avoid tiers.
- Resolve contradictory sources with a defensible, evidence-weighted rule instead of recency bias.
- Produce a graded source ledger to attach to any research deliverable for transparency.
- Pair with a claim-checking step: grade the sources first, then verify claims against the A and B sources only.

## Known limitations

- Grades reflect source quality and provenance, not absolute truth. A well-produced primary source can still be wrong; the framework tells you how much weight a source has earned, not that its content is correct.
- Authority and recency judgments depend on correctly identifying the topic and how fast it changes; supply the research question for best results.
- It evaluates the sources provided to it and does not search for better ones unless your agent separately provides that capability.
- Independence can only be assessed from what a source discloses; hidden conflicts of interest may not be detectable.
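
The RAG use case (down-weighting weak sources before generation) and the claim-checking pairing listed above might look like the following in a Python pipeline. This is a sketch under stated assumptions: the `GRADE_WEIGHT` multipliers, the `(doc_id, retrieval_score, grade)` tuple shape, and the function names are hypothetical and are not part of the skill, which delivers its grades as a text ledger.

```python
# Hypothetical multipliers per grade; tune these for your own pipeline.
GRADE_WEIGHT = {"A": 1.0, "B": 0.7, "C": 0.3, "D": 0.0}


def rerank(docs):
    """Order doc_ids by grade-weighted retrieval score, dropping D-graded documents.

    docs: iterable of (doc_id, retrieval_score, grade) tuples.
    """
    weighted = [
        (doc_id, score * GRADE_WEIGHT[grade])
        for doc_id, score, grade in docs
        if grade != "D"
    ]
    weighted.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in weighted]


def claim_check_pool(docs):
    """Keep only A and B sources for a later claim-verification step."""
    return [doc_id for doc_id, _, grade in docs if grade in ("A", "B")]


retrieved = [
    ("forum", 0.95, "D"),    # best raw retrieval score, weakest evidence
    ("spec", 0.80, "A"),
    ("blog", 0.90, "B"),
    ("roundup", 0.85, "C"),
]
print(rerank(retrieved))            # ['spec', 'blog', 'roundup']
print(claim_check_pool(retrieved))  # ['spec', 'blog']
```

The forum post has the highest raw retrieval score but never reaches the generator, which is the behavior the rubric is meant to produce.
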
Use Cases
- Rank retrieved documents in a RAG pipeline before generation
- Triage a research reading list into rely / corroborate / avoid
- Resolve contradictory sources by evidence weight, not recency
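
Resolving by evidence weight rather than recency can be sketched as a comparison on grade first and corroboration second. The `Claim` type, its field names, and the `resolve` function are assumptions for illustration only. The claim deliberately carries no date or confidence field, so neither can influence the outcome.

```python
from dataclasses import dataclass
from typing import Optional

GRADE_RANK = {"A": 3, "B": 2, "C": 1, "D": 0}


@dataclass(frozen=True)
class Claim:
    source: str
    value: str
    grade: str           # overall grade, "A" to "D"
    corroborations: int  # independent sources agreeing with this value


def _weight(claim: Claim) -> tuple:
    # Grade decides first; corroboration only breaks ties between equal grades.
    return (GRADE_RANK[claim.grade], claim.corroborations)


def resolve(first: Claim, second: Claim) -> Optional[Claim]:
    """Return the winning claim, or None when the conflict must stay open."""
    if first.value == second.value:
        return max(first, second, key=_weight)  # no conflict; keep the stronger
    if _weight(first) == _weight(second):
        return None  # report as an open conflict rather than guessing
    return max(first, second, key=_weight)


official = Claim("API reference", "100 req/min", "A", 1)
tutorial = Claim("2022 tutorial", "60 req/min", "C", 0)
print(resolve(official, tutorial).value)  # 100 req/min
```

Returning `None` for an exact tie keeps unresolved conflicts visible to the caller instead of silently picking a side.
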
Known Limitations
Grades reflect source quality and provenance, not absolute truth: a well-produced primary source can still be wrong, so the grade tells you how much weight a source has earned, not that its content is correct. Authority and recency judgments depend on correctly identifying the topic and how fast it changes, so supplying the research question improves results. The skill evaluates only the sources you provide and does not search for better ones unless your agent separately provides that capability. Independence can be judged only from what a source discloses, so hidden conflicts of interest may not be detectable.
How to Install
mkdir -p ~/.claude/skills && curl -sL https://www.agensi.io/api/install/evidence-grading-framework-rank-source-quality-before-your-agent-writes -o /tmp/evidence-grading-framework-rank-source-quality-before-your-agent-writes.zip && unzip -o /tmp/evidence-grading-framework-rank-source-quality-before-your-agent-writes.zip -d ~/.claude/skills && rm /tmp/evidence-grading-framework-rank-source-quality-before-your-agent-writes.zip
Free skills install directly. Paid skills require purchase - use the download button above after buying.
Reviews
No reviews yet - be the first to share your experience.
Only users who have downloaded or purchased this skill can leave a review.
Security Scanned
Passed automated security review
Permissions
File Scopes
Read-only. The skill reads the sources and any provided documents to grade them. It does not write, execute, or access the network.
Works with any SKILL.md-compatible agent (Claude Code, Codex CLI, Cursor, VS Code Copilot, Gemini CLI). No runtime dependencies. Designed to run as a pre-generation step in RAG and research pipelines; strongest when you also supply the research question.
Creator
PubsProToolkit builds AI agent skills that bring regulated-industry rigor to written output. Created by a CMPP-certified medical writer with a PhD and 10+ years in pharma — covering clinical and scientific publishing, plus evidence-grounded QC for any agent.