
Rag Eval

Name: Rag Eval
Price: 5 USD
Availability: InStock
Author: Agensi

by Ifásola

Diagnose RAG bottlenecks with precision metrics (Recall, MRR, nDCG) to identify retrieval or ranking failures.

Updated Jun 2026

Security scanned

Cursor

Or 25 credits

30-day refund guarantee

Secure checkout via Stripe

⚡ Also available via Agensi MCP: your AI agent can load this skill on demand.

Included in download

Identify if RAG failures are caused by retrieval, ranking, or generation.
Calculate Recall@k and nDCG to benchmark different embedding models.
Ready for Cursor
Instant install

Sample input

Evaluate our latest retriever results in retrieval_results.jsonl and tell me where to focus.

Sample output

Metrics:

Recall@5: 0.45
MRR: 0.32
nDCG@5: 0.38

Verdict: [RETRIEVAL BOTTLENECK] Recall is critically low. Your retriever is missing the relevant docs entirely. Focus on improving your embedding model or chunking strategy before tuning the prompt.
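The sample input points the skill at retrieval_results.jsonl, but this page does not document the file's record layout. Below is a minimal standard-library sketch of loading such a file. It assumes a hypothetical schema of one JSON object per line with `query_id`, `retrieved` (doc ids in rank order), and `relevant` (ground-truth doc ids) fields. The skill's actual format may differ, and `load_retrieval_results` is an illustrative name, not part of the skill.

```python
import io
import json
from typing import Dict, List, TextIO

# Hypothetical schema: the skill's real record layout is not documented here.
REQUIRED_FIELDS = {"query_id", "retrieved", "relevant"}


def load_retrieval_results(stream: TextIO) -> List[Dict]:
    """Parse JSON Lines input: one object per line, blank lines ignored."""
    records = []
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        records.append(record)
    return records


# Usage with an in-memory file; for a real one, use
# open("retrieval_results.jsonl", encoding="utf-8").
sample = io.StringIO(
    '{"query_id": "q1", "retrieved": ["d3", "d1", "d7"], "relevant": ["d1"]}\n'
    "\n"
    '{"query_id": "q2", "retrieved": ["d2", "d5"], "relevant": ["d9"]}\n'
)
records = load_retrieval_results(sample)
print(len(records))  # 2
```

Failing fast on a missing field surfaces labeling gaps before they silently skew the metrics.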


About This Skill

Diagnostic Tools for RAG Performance

Pinpointing why a Retrieval-Augmented Generation (RAG) system is failing can be a guessing game. Is the embedding model weak? Is the chunking strategy off? Or is the LLM simply hallucinating despite having the right context? This skill eliminates the guesswork by providing a standardized evaluation framework for your retrieval pipeline.

Data-Driven Insights

By comparing your retriever's output against a labeled ground-truth set, this tool calculates industry-standard metrics including Recall@k, Precision@k, Hit-Rate, Mean Reciprocal Rank (MRR), and Normalized Discounted Cumulative Gain (nDCG). It goes beyond raw numbers to provide a technical verdict on where your bottleneck lies.
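For reference, here is a minimal standard-library sketch of these metrics at a single cut-off k. It assumes binary relevance (a document is either relevant or not), unique doc ids in each ranked list, and the hypothetical record layout `{"retrieved": [...], "relevant": [...]}`. It is an illustration of the standard definitions, not the skill's own implementation, whose details are not published here.

```python
import math
from typing import Dict, List, Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant docs that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the k result slots filled by relevant docs."""
    return len(set(retrieved[:k]) & relevant) / k


def hit_rate_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if at least one relevant doc is in the top k, else 0.0."""
    return 1.0 if set(retrieved[:k]) & relevant else 0.0


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant doc (0.0 if none is retrieved)."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """nDCG with binary relevance: a hit at rank r contributes 1 / log2(r + 1)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(retrieved[:k], start=1)
        if doc_id in relevant
    )
    # Ideal ordering: every relevant doc (up to k of them) placed at the top.
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(records: List[Dict], k: int = 5) -> Dict[str, float]:
    """Average each per-query metric over all records; MRR is the mean reciprocal rank."""
    names = [f"Recall@{k}", f"Precision@{k}", f"HitRate@{k}", "MRR", f"nDCG@{k}"]
    totals = dict.fromkeys(names, 0.0)
    for record in records:
        retrieved, relevant = record["retrieved"], set(record["relevant"])
        totals[f"Recall@{k}"] += recall_at_k(retrieved, relevant, k)
        totals[f"Precision@{k}"] += precision_at_k(retrieved, relevant, k)
        totals[f"HitRate@{k}"] += hit_rate_at_k(retrieved, relevant, k)
        totals["MRR"] += reciprocal_rank(retrieved, relevant)
        totals[f"nDCG@{k}"] += ndcg_at_k(retrieved, relevant, k)
    n = max(len(records), 1)
    return {name: value / n for name, value in totals.items()}


records = [
    {"query_id": "q1", "retrieved": ["d3", "d1", "d7"], "relevant": ["d1"]},
    {"query_id": "q2", "retrieved": ["d2", "d5"], "relevant": ["d9"]},
]
metrics = evaluate(records, k=5)
print(metrics["Recall@5"], metrics["MRR"])  # 0.5 0.25
```

MRR is computed over the full ranked list here. Some tools truncate it at k (MRR@k), which can only give lower or equal values.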

What it helps you solve

Low Recall: Identifies when your embeddings, chunking strategy, or indexing are failing to surface relevant documents.
Ranking Issues: Detects when relevant documents are retrieved but ranked too low to make it into the LLM's context window.
Generation Bottlenecks: Confirms when retrieval is healthy, indicating that your issues stem from the prompt or the LLM's reasoning capabilities.
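One way to turn the metrics into a verdict like the one in the sample output is a simple decision rule. Low recall points at retrieval, healthy recall with a low MRR points at ranking, and healthy values on both point at generation. The sketch below assumes hypothetical thresholds (`RECALL_FLOOR`, `MRR_FLOOR`) and hypothetical verdict labels other than the sample's [RETRIEVAL BOTTLENECK]. The skill's real cut-offs are not documented here.

```python
from typing import Dict

# Hypothetical cut-offs for illustration only; tune them on your own data.
RECALL_FLOOR = 0.6
MRR_FLOOR = 0.5


def diagnose(metrics: Dict[str, float], k: int = 5) -> str:
    """Map aggregate metrics to the pipeline stage most likely to be failing."""
    if metrics[f"Recall@{k}"] < RECALL_FLOOR:
        # Relevant docs rarely reach the top k at all.
        return "RETRIEVAL BOTTLENECK"
    if metrics["MRR"] < MRR_FLOOR:
        # Relevant docs are found, but the first one tends to sit low in the list.
        return "RANKING ISSUE"
    # Retrieval looks healthy, so look at the prompt or the LLM.
    return "GENERATION BOTTLENECK"


# The metrics from the sample output above.
print(diagnose({"Recall@5": 0.45, "MRR": 0.32, "nDCG@5": 0.38}))  # RETRIEVAL BOTTLENECK
```

Checking recall first mirrors the sample verdict's advice: there is no point tuning ranking or prompts while relevant documents are not being retrieved at all.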

This developer-centric tool has no external dependencies: it runs on the Python standard library alone, which makes it easy to integrate into CI/CD pipelines or local development workflows.

Use Cases

Identify if RAG failures are caused by retrieval, ranking, or generation.
Calculate Recall@k and nDCG to benchmark different embedding models.
Automate regression testing for vector database index updates.
Generate data-driven verdicts to guide chunking and metadata strategy.
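For regression testing after a vector database index update, a common pattern is to compare the new run's metrics against a stored baseline and fail the build when any metric drops beyond a tolerance. Here is a minimal sketch, assuming the metrics are already available as dictionaries (for example, from an evaluation run). `find_regressions`, the 0.02 tolerance, and the numbers in `main` are hypothetical.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float], candidate: Dict[str, float], tolerance: float = 0.02
) -> List[str]:
    """List metrics that fell more than `tolerance` (absolute) below the baseline."""
    regressions = []
    for name, old in sorted(baseline.items()):
        new = candidate.get(name)
        if new is None:
            regressions.append(f"{name}: missing from candidate run")
        elif old - new > tolerance:
            regressions.append(f"{name}: {old:.3f} -> {new:.3f}")
    return regressions


def main() -> int:
    # Hypothetical numbers; in practice, load both runs from saved evaluation output.
    baseline = {"Recall@5": 0.72, "MRR": 0.55, "nDCG@5": 0.61}
    candidate = {"Recall@5": 0.71, "MRR": 0.48, "nDCG@5": 0.60}
    problems = find_regressions(baseline, candidate)
    for problem in problems:
        print("REGRESSION", problem)
    return 1 if problems else 0  # nonzero means "fail the build"
```

In a CI job, end the script with `sys.exit(main())` (after `import sys`) so a regression produces a nonzero exit status and fails the pipeline.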

How to Install

mkdir -p ~/.claude/skills && curl -sL https://www.agensi.io/api/install/rag-eval -o /tmp/rag-eval.zip && unzip -o /tmp/rag-eval.zip -d ~/.claude/skills && rm /tmp/rag-eval.zip

Free skills install directly. Paid skills require purchase; after buying, use the download button above.

Reviews

No reviews yet. Be the first to share your experience.

Only users who have downloaded or purchased this skill can leave a review.

Early access skill


Built by Ifásola

Requires Python 3.8+. No external dependencies (standard …


Security Scanned

Passed automated security review

Permissions

No special permissions declared or detected

Creator

Ifásola


More Premium Skills

designing-hybrid-context-layers

Architects the right retrieval strategy for every query — teaching your agent when to use RAG, a knowledge graph, or a temporal index instead of defaulting to vector search for everything.

$1016 installs

synthesizing-institutional-knowledge

Builds the organizational memory schema your AI agent needs to answer why — capturing decision provenance, causal chains, and event context that embedding-based retrieval permanently discards.

$105 installs

diagnosing-rag-failure-modes

RAG fails quietly. It retrieves documents, returns confident-looking answers, and misses the question entirely — because the question required connecting facts across documents, reasoning about sequence, or tracing causation. This skill gives you a five-question diagnostic checklist that classifies any failing query as either RAG-safe or structurally RAG-incompatible, then maps it to the specific failure pattern and the architectural fix that resolves it.

$105 installs

ai-automation-qa-pack

Professional QA & UAT documentation generator for AI automation agencies and complex agent deployments.

$510 installs


Frequently Asked Questions

How does rag-eval help me improve my RAG system?

Which AI agents or frameworks is this skill compatible with?

What kind of data do I need to provide for the evaluation to work?

What is included in the purchase of this skill?

Can I integrate this skill into my CI/CD pipeline for automated testing?
