
    diagnosing-rag-failure-modes

    RAG fails quietly. It retrieves documents, returns confident-looking answers, and misses the question entirely — because the question required connecting facts across documents, reasoning about sequence, or tracing causation. This skill gives you a five-question diagnostic checklist that classifies any failing query as either RAG-safe or structurally RAG-incompatible, then maps it to the specific failure pattern and the architectural fix that resolves it.

    by Loreto


    About This Skill

    Problems It Solves

    • Silent retrieval failure — RAG pipelines return plausible-sounding results on multi-hop and causal queries, making failures hard to detect. Teams iterate on embedding quality and chunking strategy for weeks before realizing the query type is the problem, not the implementation.

    • Wrong fix applied — Most RAG debugging focuses on embedding models, chunk size, and reranking. These are the right levers for factual lookup failures. They do nothing for relational and temporal failures, where the architecture itself is mismatched to the query.

    • Query type blindness — No standard vocabulary exists for distinguishing "what is X" from "how did X come to be" at the pipeline level. Without this distinction, every query gets routed to the same retrieval system regardless of structural fit.

    • Scale degradation — RAG degrades on large corpora not because the embeddings get worse, but because the signal-to-noise ratio collapses. Teams add reranking layers and see marginal improvement, missing that tiered retrieval is the actual fix.

    What You Get

    • The two-class query taxonomy — A clear, actionable split between Class A (factual lookup, RAG-safe) and Class B (relational/temporal, RAG danger zone), with concrete examples of each so classification is fast and unambiguous.

    • Five-question diagnostic checklist — Run any failing query through five yes/no checks (multi-document join required? order matters? causation chain? time span? why, not just what?) to score it as Class A, borderline, or Class B in under two minutes.

    • Four named failure patterns — Multi-hop relational failure, temporal sequencing failure, organizational context failure, and scale failure — each with a symptom description, a worked example, and a specific architectural fix.

    • Failure Classification Report template — A structured output artifact (query, class, failure patterns, root cause paragraph, recommended fix, references) that communicates a diagnosis clearly to engineers, architects, and non-technical stakeholders.

    • Architectural fix references — Each failure pattern maps directly to a companion skill (designing-hybrid-context-layers, temporal-reasoning-sleuth, synthesizing-institutional-knowledge) so diagnosis connects immediately to remediation.
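    The taxonomy and checklist above can be sketched in a few lines of Python. The check names and the scoring thresholds (0 yes-answers = Class A, 1 = borderline, 2+ = Class B) are illustrative assumptions about how the skill scores queries, not its exact specification:

```python
from dataclasses import dataclass

# The five yes/no diagnostic checks. Names and thresholds are
# illustrative assumptions, not the skill's exact spec.
CHECKS = [
    "multi_document_join",  # must facts be joined across documents?
    "order_matters",        # does the answer depend on event ordering?
    "causation_chain",      # is a cause-and-effect chain required?
    "time_span",            # does the query cover a span of time?
    "why_not_what",         # is it a "why" question, not just a "what"?
]

@dataclass
class Diagnosis:
    query: str
    score: int
    klass: str

def classify(query: str, answers: dict[str, bool]) -> Diagnosis:
    """Score a failing query: 0 yes-answers -> Class A (RAG-safe),
    1 -> borderline, 2+ -> Class B (relational/temporal danger zone)."""
    score = sum(1 for check in CHECKS if answers.get(check, False))
    klass = "Class A" if score == 0 else "borderline" if score == 1 else "Class B"
    return Diagnosis(query, score, klass)

d = classify(
    "Why did we deprecate the v1 API?",
    {"multi_document_join": True, "causation_chain": True, "time_span": True},
)
print(d.klass, d.score)  # Class B 3
```

    Because each check is a plain yes/no, the whole diagnostic stays auditable: the score is just a count, and the class boundary is explicit rather than buried in a model.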

    Who Should Use This

    • Engineers and AI architects whose RAG pipeline is returning poor results and who need to determine whether the problem is implementation quality (fixable with tuning) or architectural mismatch (requires a different retrieval approach).

    • Teams building agents over organizational knowledge bases — ADRs, incident reports, policy documents, vendor contracts — where some queries will always be relational or temporal in nature.

    • Technical leads evaluating whether to add a knowledge graph, timeline index, or hybrid retrieval layer, and who need a principled basis for the recommendation rather than intuition.

    Use Cases

    • RAG pipeline debugging: An agent over internal documentation fails on "Why did we deprecate the v1 API?" — a query that requires linking the deprecation notice, the downstream services affected, and the architectural rationale from a decision record written two years earlier. The diagnostic checklist scores it as Class B (3 checks: multi-document join, causal chain, temporal span). Root cause: structural RAG mismatch. Fix: knowledge graph traversal.

    • Architecture investment justification: A team wants to add a knowledge graph but needs to demonstrate to engineering leadership why the current vector store cannot be tuned to handle the failing queries. The failure classification report provides a structured argument with root cause analysis and specific pattern attribution.

    • Onboarding agent quality review: A new onboarding assistant answers "What is our PTO policy?" correctly but fails on "Why is our engineering team structured the way it is?" The diagnostic separates these as Class A and Class B respectively — and identifies that the second query requires organizational context provenance that was never ingested, not better embeddings.

    • Vendor evaluation: A team is evaluating RAG vendors and receives demo results on their sample queries. Running the diagnostic checklist against the sample set reveals that all demo queries were Class A. Their actual production queries are 60% Class B. The vendor's system is being benchmarked on a task distribution it will never face in production.
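    The vendor-evaluation scenario reduces to a distribution check: score each sample query with the checklist, then compare the Class B share of the demo set against the production set. A minimal sketch, assuming the 2+ yes-answer threshold for Class B and made-up score data:

```python
def class_b_share(yes_counts: list[int]) -> float:
    """Fraction of queries scoring 2+ yes-answers on the five
    diagnostic checks (assumed Class B threshold)."""
    return sum(1 for n in yes_counts if n >= 2) / len(yes_counts)

demo_queries = [0, 0, 1, 0, 0]                       # vendor demo set: Class A / borderline only
production_queries = [3, 0, 2, 4, 1, 2, 3, 0, 1, 3]  # sampled real workload (hypothetical)

print(f"demo Class B share: {class_b_share(demo_queries):.0%}")        # 0%
print(f"production Class B share: {class_b_share(production_queries):.0%}")  # 60%
```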

    How to Install

    unzip diagnosing-rag-failure-modes.zip -d ~/.claude/skills/

    Free


    Security Scanned

    Passed automated security review (8/8 checks)

    Tags

    rag
    ai-architecture
    knowledge-graphs
    debugging
    llmops
    retrieval
    knowledge base agent failure
    multi-hop retrieval
    causal reasoning
    temporal reasoning
    enterprise AI
    AI diagnostics

    Best with Claude Code 1.2+. No external dependencies required. The diagnostic checklist and classification report are architecture-agnostic and apply regardless of embedding model, vector store, or retrieval framework in use. Designed as a first-step diagnostic that routes to designing-hybrid-context-layers (remediation architecture), temporal-reasoning-sleuth (temporal sequencing fixes), and synthesizing-institutional-knowledge (provenance ingestion).
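    In pipeline code, that routing can be as simple as a lookup from diagnosed failure pattern to companion skill. The pattern keys below paraphrase the four named patterns; the scale-failure entry is an assumption, since the text names tiered retrieval as its fix without naming a dedicated companion skill:

```python
# Map diagnosed failure pattern -> companion skill named in this listing.
# Keys are paraphrased pattern names; the "scale" routing is assumed.
REMEDIATION = {
    "multi_hop_relational": "designing-hybrid-context-layers",
    "temporal_sequencing": "temporal-reasoning-sleuth",
    "organizational_context": "synthesizing-institutional-knowledge",
    "scale": "designing-hybrid-context-layers",  # tiered retrieval; assumed routing
}

print(REMEDIATION["temporal_sequencing"])  # temporal-reasoning-sleuth
```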

    Creator

    Loreto


    Over 20 years of experience in data exploration and digital signal processing across sectors including fintech, aerospace, and defense. Expertise in risk analysis, engine health monitoring, and predictive maintenance for one of the world’s leading jet engine manufacturers, developing machine learning models and helping organizations achieve real impact from their analytics initiatives. Passionate about agentic workflows, the enterprise context layer, and information synthesis. Specializing in enterprise AI.
