    Cloud Resilience and Dependency Auditor — Find Your Single Points of Failure Before the Next Outage Does

    Map your real cloud dependency tree — data plane and control plane — find the single points of failure ranked by blast radius, catch the classic traps like monitoring that dies with the region it watches, and get a prioritized resilience backlog. The audit every outage post-mortem says to do first.

    Updated Jul 2026
    Security scanned
    Cursor

    $14 or 70 credits

    30-day refund guarantee

    Secure checkout via Stripe

    Included in download

    • Map data plane and control plane dependencies to find hidden couplings.
    • Rank infrastructure failure points by blast radius and recovery impact.
    • file_read, file_write automation included
    • Ready for Cursor
    • Instant install

    See it in action

    You say

    Audit our resilience. We run on AWS with us-east-1 as primary and a warm standby in us-west-2. Core services: EKS for compute, RDS Postgres (primary in us-east-1), Secrets Manager in us-east-1, Auth0 for identity, Cloudflare for DNS and CDN, SQS for queues, and Datadog for observability. CI/CD is GitHub Actions. Map our real dependency tree, find the single points of failure ranked by blast radius, and give me a prioritized resilience backlog.

    Your agent does

    Dependency map (excerpt): Control plane — Auth0 (global identity), AWS Secrets Manager (us-east-1), Cloudflare DNS (global). Data plane — EKS, RDS Postgres, SQS. Shared-fate risk: Datadog and your recovery runbooks both live in us-east-1, the region most likely to be failing.

    Ranked single-point-of-failure register:

    1. Secrets Manager (us-east-1) — Blast radius: HIGH. Your us-west-2 standby reads secrets from us-east-1 to rotate credentials and restart pods, so it cannot recover while us-east-1 is down. Recover-while-down? No.
    2. Auth0 (global) — Blast radius: HIGH. Universal gatekeeper; if identity is unavailable, no service authenticates, including your admin tooling. Recover-while-down? Partial.
    3. Observability in us-east-1 — Blast radius: MEDIUM. You go blind during the exact outage you need to see. Recover-while-down? No.

    Resilience backlog (blast radius × likelihood):

    • P1: Enable cross-region secret replication to us-west-2. Fix effort: ~M.
    • P1: Add a break-glass auth path independent of Auth0 for operators. Fix effort: ~M.
    • P2: Move observability/alerting to a region or provider that does not share fate with primary. Fix effort: ~L.

    Accepted-risk register: Cloudflare DNS treated as accepted global dependency this quarter; revisit with secondary DNS provider next planning cycle.
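    The "blast radius × likelihood" ordering in the sample backlog above can be sketched roughly as follows. This is only an illustration, not the skill's actual scoring: the 1-3 numeric scales, the likelihood values, the P1/P2/P3 cut-offs, and the names `Finding`, `prioritize`, and `priority_label` are all assumptions introduced here. Only the blast-radius ratings and fixes come from the sample.

```python
from dataclasses import dataclass

# Hypothetical ordinal scales; the listing does not define numeric values.
SCALE = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}


@dataclass(frozen=True)
class Finding:
    name: str
    blast_radius: str  # taken from the sample register above
    likelihood: str    # assumed for illustration; not given in the sample
    fix: str

    @property
    def score(self) -> int:
        # Priority score = blast radius x likelihood on the assumed scales.
        return SCALE[self.blast_radius] * SCALE[self.likelihood]


def prioritize(findings):
    """Return findings sorted by score, highest first (ties keep input order)."""
    return sorted(findings, key=lambda f: f.score, reverse=True)


def priority_label(score: int) -> str:
    # Assumed cut-offs, chosen only so the example mirrors the sample backlog.
    if score >= 6:
        return "P1"
    if score >= 3:
        return "P2"
    return "P3"


findings = [
    Finding("Observability in us-east-1", "MEDIUM", "MEDIUM",
            "Move alerting to a region or provider that does not share fate"),
    Finding("Secrets Manager (us-east-1)", "HIGH", "MEDIUM",
            "Enable cross-region secret replication to us-west-2"),
    Finding("Auth0 (global)", "HIGH", "MEDIUM",
            "Add a break-glass auth path independent of Auth0"),
]

backlog = prioritize(findings)
for f in backlog:
    print(priority_label(f.score), f.name, "->", f.fix)
```

    With these assumed inputs, the two HIGH findings score 6 and land at P1, and the observability finding scores 4 and lands at P2, matching the order of the sample backlog.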

    About This Skill

    Every outage post-mortem recommends the same first step, and almost nobody has done it: map every critical service, API, and route your uptime depends on, covering the control plane as well as the data plane. The dependency tree is no longer obvious from the invoice, and "we're multi-region" is not the same as "we're resilient," because what takes you down is usually a shared service (DNS, identity, a control plane) that is itself a single point of failure across all your regions.

    Cloud Resilience and Dependency Auditor does the audit. Describe your stack (providers and regions, plus the services you lean on for DNS, identity, CDN, queues, data, secrets, observability, and CI/CD) and it maps the real dependency tree, including the control-plane and hidden shared-service couplings people miss. It ranks the single points of failure by blast radius and checks the classic traps: monitoring and recovery tooling that share fate with the environment that's failing (so you go blind exactly when you need clarity), auth as a universal gatekeeper, the default-region hub, and vendors whose dependency tree you inherit.

    It returns a dependency map, a ranked single-point-of-failure register noting whether you could even recover while each is down, a resilience backlog prioritized by blast radius times likelihood with the specific fix and rough effort for each, and an accepted-risk register for what to consciously live with. The download includes three reference files: the dependency-inventory worksheet, a single-point-of-failure pattern guide, and a worked sample audit.

    It audits from what you describe; it doesn't scan your infrastructure or test failover, and a plan you haven't rehearsed is a hypothesis. Works with Claude Code, Cursor, Codex CLI, Gemini CLI, and any SKILL.md agent.
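    To make the shared-fate idea concrete, here is a minimal sketch of how a described stack could be modelled as a dependency graph and checked for what fails together. It is a simplification of the sample stack, not the skill's implementation: the service names, the dependency edges, the `TOOLING` set, and the helper functions are assumptions chosen for illustration.

```python
# service -> (region, services it needs in order to run or to recover)
# Edges are an assumed simplification of the sample stack described above.
STACK = {
    "secrets-manager": ("us-east-1", []),
    "rds-primary": ("us-east-1", []),
    "datadog": ("us-east-1", []),  # placed in us-east-1 as in the sample output
    "auth0": ("global", []),
    "cloudflare-dns": ("global", []),
    "eks-primary": ("us-east-1", ["secrets-manager", "rds-primary", "auth0"]),
    "eks-standby": ("us-west-2", ["secrets-manager", "auth0"]),
}

# Tooling you need during an incident (assumed set for this example).
TOOLING = {"datadog"}


def failed_set(stack, down):
    """Everything that is down or transitively depends on something down."""
    failed = set(down)
    changed = True
    while changed:  # repeat until no new service is pulled into the failure
        changed = False
        for svc, (_, deps) in stack.items():
            if svc not in failed and any(d in failed for d in deps):
                failed.add(svc)
                changed = True
    return failed


def region_outage(stack, region):
    """Services lost when every service hosted in `region` goes down."""
    down = {svc for svc, (r, _) in stack.items() if r == region}
    return failed_set(stack, down)


def blast_radius(stack, svc):
    """Number of other services that fail if `svc` alone fails."""
    return len(failed_set(stack, {svc})) - 1


lost = region_outage(STACK, "us-east-1")
print("standby survives:", "eks-standby" not in lost)
print("tooling lost with the region:", sorted(TOOLING & lost))
print("secrets-manager blast radius:", blast_radius(STACK, "secrets-manager"))
```

    In this model the us-west-2 standby goes down with us-east-1 because of its Secrets Manager edge, and the monitoring is lost in the same outage it is supposed to report on. A plain dependent count gives monitoring a blast radius of zero, which is why shared fate is checked separately from blast radius.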

    Use Cases

    • Map data plane and control plane dependencies to find hidden couplings.
    • Rank infrastructure failure points by blast radius and recovery impact.
    • Identify fate-sharing risks in observability and deployment tooling.
    • Develop a prioritized backlog of resilience improvements and failover plans.

    How to install

    Drop the file into your AI tool. Works with Claude, Cursor, ChatGPT, and 20+ more.

    Security Scanned

    Passed automated security review

    Permissions

    Read Files
    Write Files

    File Scopes

    references/**

    This skill only reads the stack description and reference files you provide and writes its audit outputs (dependency map, single-point-of-failure register, resilience backlog, and accepted-risk register) back as local files. It does not use a terminal, browser, network access, or environment variables, and it never connects to your cloud accounts or scans live infrastructure. File scope references/** covers the three bundled reference files: dependency-inventory-worksheet.md, spof-pattern-guide.md, and sample-resilience-audit.md.

    Works with any agent that supports the open SKILL.md standard, including Claude Code, Cursor, Codex CLI, and Gemini CLI. No special runtime, cloud credentials, or network access is required.
