    Flaky Test Detective for AI Coding Agents

    by Shandra Skills

    Diagnoses unreliable tests, identifies root causes, creates stabilization plans, and generates safe AI coding prompts for fixing flaky unit, integration, E2E, and CI tests.

    Updated May 2026
    Security scanned
    One-time purchase
    Compatible with ChatGPT Custom GPTs

    $9.99

    · or 50 credits

    30-day refund guarantee

    Secure checkout via Stripe

    Included in download

    • Find and fix async timing issues and missing awaits in Playwright/Cypress.
    • Detect shared state leaks causing order-dependent Jest or pytest failures.
    • Browser, environment variable, and terminal automation included.
    • Ready for ChatGPT Custom GPTs.
    • Instant install

    Sample Output

    A real example of what this skill produces.

    === FLAKY TEST DETECTIVE REPORT ===

    Original request: Fix this flaky Playwright checkout test. It passes locally but fails in CI during the payment confirmation step.

    Interpreted testing goal: Diagnose and stabilize an intermittent Playwright checkout test failure while preserving the test intent and avoiding superficial fixes.

    Test/framework: Playwright E2E test

    Test intent: Verify that a user can complete checkout and reach payment confirmation successfully.

    Failure pattern: Passes locally but fails in CI during the payment confirmation step.

    Confirmed facts:

    • The test uses Playwright.
    • The test covers checkout.
    • The failure occurs in CI.
    • The failure occurs around payment confirmation.
    • The user does not want a timeout-only or skip-test fix.

    Assumptions:

    • CI may run headless and slower than local.
    • The payment confirmation step may depend on network, backend state, mock setup, or asynchronous UI updates.
    • The test may use shared test data or a shared account.
    • Checkout is a high-risk flow and assertions should not be weakened.

    Unknowns:

    • exact error message
    • test code
    • trace/video availability
    • whether tests run in parallel
    • whether payment provider is mocked
    • whether checkout uses real backend services
    • whether a shared account/cart/order is used

    Likely root cause hypotheses:

    1. Async timing around payment confirmation

    Evidence: CI failures during confirmation often happen when the test asserts UI state before backend/payment state has settled.

    How to confirm: Review Playwright trace, network logs, and screenshot at failure.

    Likely fix: Wait for a deterministic confirmation event, response, URL transition, or UI state instead of using a fixed delay.

    Risk if wrong: The test may continue failing if the real problem is data isolation or mock setup.

    2. Payment/network mock not installed before the action

    Evidence: If the mock is registered after the checkout action starts, CI timing may let the payment request reach the real provider or resolve differently.

    How to confirm: Inspect route/mock setup order and network trace.

    Likely fix: Install route mocks before triggering checkout and assert that expected requests occur.

    Risk if wrong: External dependency flake may persist.

    3. Shared checkout state or reused test account

    Evidence: Checkout flows often fail in parallel when users, carts, orders, or payment sessions are reused.

    How to confirm: Check test data creation, worker IDs, account IDs, cart cleanup, and parallel settings.

    Likely fix: Create isolated user/cart/order data per test or worker.

    Risk if wrong: Parallel-only failures may continue.

    4. Brittle selector for confirmation state

    Evidence: Confirmation UI may animate, render progressively, or have hidden duplicate elements.

    How to confirm: Inspect locator and trace DOM snapshot.

    Likely fix: Use stable role/name or test ID locator and wait for visibility/enabled state.

    Risk if wrong: Selector may still target the wrong element.
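
    The "deterministic wait instead of a fixed delay" fix from hypothesis 1 can be illustrated with a small framework-agnostic sketch. It is not part of the skill's output and not Playwright's API; `wait_until` and `FakeCheckout` are hypothetical names used only to show polling for a condition with a deadline. In a real Playwright test the framework's own auto-waiting assertions would normally play this role.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout_s: float = 10.0,
               interval_s: float = 0.05) -> float:
    """Poll `condition` until it returns True; return the elapsed seconds.

    Raises TimeoutError if the condition is still false at the deadline.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    while True:
        if condition():
            return time.monotonic() - start
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        time.sleep(interval_s)


class FakeCheckout:
    """Hypothetical stand-in for an app whose confirmation settles late."""

    def __init__(self, settle_after_s: float) -> None:
        self._ready_at = time.monotonic() + settle_after_s

    def is_confirmed(self) -> bool:
        return time.monotonic() >= self._ready_at


if __name__ == "__main__":
    checkout = FakeCheckout(settle_after_s=0.2)
    # Returns as soon as the state settles, rather than sleeping a fixed time.
    waited = wait_until(checkout.is_confirmed, timeout_s=5.0)
    print(f"confirmed after {waited:.2f}s")
```

    The wait ends when the confirmation state is actually reached and fails loudly at the deadline, so a slower CI machine changes how long the test takes, not whether it passes.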

    Flakiness category: CI-only E2E flake involving async timing, network mocking, shared state, selector stability, or real product race.

    Do not fix by:

    • blindly increasing timeout
    • adding arbitrary waitForTimeout as the main fix
    • skipping the test
    • weakening checkout assertions
    • hiding the issue with retries only
    • changing payment behavior without evidence

    Investigation plan:

    1. Collect CI failure trace, screenshot, video, console logs, and network logs.
    2. Identify the exact failing locator, assertion, or timeout.
    3. Run the test locally in headless mode.
    4. Run the test 20 times in a row.
    5. Run with CI-equivalent parallel workers.
    6. Check whether payment mocks are installed before checkout action.
    7. Check whether user/cart/order data is unique per test.
    8. Check whether the test waits for network response, URL transition, or confirmation state.
    9. Check whether the confirmation selector is stable and unique.
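
    The repeated-run step in the plan above can be sketched as a small harness that runs a test callable several times and tallies failures by error message. This is an illustrative assumption, not the skill's tooling; `repeat_runs` and the sample `flaky` function are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict


def repeat_runs(test_fn: Callable[[], None], runs: int = 20) -> Dict[str, object]:
    """Run `test_fn` `runs` times and summarize failures by error type and message."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures: Counter = Counter()
    for _ in range(runs):
        try:
            test_fn()
        except Exception as exc:  # record the failure and keep going
            failures[f"{type(exc).__name__}: {exc}"] += 1
    failed = sum(failures.values())
    return {
        "runs": runs,
        "failed": failed,
        "failure_rate": failed / runs,
        "by_error": dict(failures),
    }


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> None:
        # Hypothetical test body that fails on every fourth call.
        calls["n"] += 1
        if calls["n"] % 4 == 0:
            raise AssertionError("confirmation not visible")

    print(repeat_runs(flaky, runs=20))
```

    Grouping by message helps separate one recurring failure from several unrelated ones. For Playwright suites, the test runner's `--repeat-each` option covers the same need from the command line.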

    Stabilization plan:

    • preserve the original checkout success intent
    • install mocks before actions
    • use deterministic waits for payment confirmation
    • isolate checkout data per test
    • replace brittle selectors with role/name or stable test IDs
    • capture trace/video/screenshot on failure
    • add cleanup before and after test if needed
    • verify with repeated runs
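
    The "isolate checkout data per test" item can be sketched as a factory that builds fresh identifiers for each test. The field names, the `worker_index` parameter, and the `example.test` domain are assumptions for illustration; a real suite would feed these values into its own user, cart, and order setup.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckoutData:
    email: str
    cart_id: str
    order_ref: str


def make_checkout_data(worker_index: int) -> CheckoutData:
    """Build user/cart/order identifiers that no other test or worker reuses."""
    # A random token per call keeps data unique even within one worker.
    token = uuid.uuid4().hex[:12]
    return CheckoutData(
        email=f"checkout-w{worker_index}-{token}@example.test",
        cart_id=f"cart-w{worker_index}-{token}",
        order_ref=f"order-w{worker_index}-{token}",
    )


if __name__ == "__main__":
    print(make_checkout_data(worker_index=0))
```

    Including the worker index in each identifier also makes it easier to trace leftover records back to the worker that created them.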

    AI coding agent prompt: Inspect this flaky Playwright checkout test and diagnose the root cause before changing code. Preserve the test intent: verifying successful checkout through payment confirmation. Review the failure message, CI trace, screenshot, video, console logs, network logs, selectors, mock setup, test data, parallel execution settings, and cleanup. Do not blindly increase timeouts, add arbitrary waitForTimeout calls, skip the test, or weaken assertions. Determine whether the flake is caused by async timing, selector instability, shared user/cart/order state, payment/network mock setup, CI environment differences, or a real product race. Fix the root cause with deterministic waits, isolated data, stable selectors, and proper mock setup. Return root cause, evidence, files inspected, files changed, why the fix is deterministic, verification runs performed or recommended, and remaining risks.

    Verification plan:

    • run the test alone 20 times
    • run the test with related checkout tests
    • run in headless mode
    • run with the same parallel worker settings as CI
    • confirm failure traces remain enabled
    • confirm no arbitrary sleep was added as the main fix
    • confirm checkout assertions still prove the original behavior
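
    The "no arbitrary sleep" check above can be partly automated with a simple text scan of the changed test files. The sketch below is a rough heuristic under stated assumptions: the pattern list is illustrative rather than exhaustive, and `find_fixed_delays` is a hypothetical helper, not part of the skill.

```python
import re
from typing import List, Tuple

# Patterns for fixed delays; the list is illustrative, not exhaustive.
FIXED_DELAY_PATTERNS = [
    re.compile(r"\bwaitForTimeout\s*\("),    # Playwright (JS/TS)
    re.compile(r"\bwait_for_timeout\s*\("),  # Playwright (Python)
    re.compile(r"\bcy\.wait\s*\(\s*\d"),     # Cypress wait with a numeric delay
    re.compile(r"\btime\.sleep\s*\("),       # Python
]


def find_fixed_delays(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) for each line containing a fixed delay."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in FIXED_DELAY_PATTERNS):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = (
        "await page.click('#pay');\n"
        "await page.waitForTimeout(5000);\n"
        "await expect(page.getByText('Thank you')).toBeVisible();\n"
    )
    for number, line in find_fixed_delays(sample):
        print(f"line {number}: {line}")
```

    A hit is a prompt for review rather than proof of a bad fix, since a short delay can be legitimate when it is not the main stabilization mechanism.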

    Remaining risks: If traces show that the application sometimes reaches an inconsistent checkout state, the flake may reveal a real product race condition requiring production-code synchronization and regression coverage.

    About This Skill

    Flaky Test Detective helps AI coding agents, developers, QA engineers, CI/CD teams, SaaS builders, and test automation teams investigate tests that pass and fail inconsistently. It covers unit, integration, E2E, browser, API, and database tests, CI-only failures, and suites written with Playwright, Cypress, Selenium, Jest, Vitest, or pytest. The skill produces root cause hypotheses, failure pattern reports, async timing audits, selector stability reviews, test isolation plans, mock leakage checks, database cleanup reviews, CI environment comparisons, stabilization roadmaps, QA tickets, verification plans, and paste-ready prompts for Cursor, Claude Code, Codex CLI, OpenCode, Replit, and ChatGPT Agents. It is designed to restore trust in test suites by fixing real causes instead of hiding failures with retries, arbitrary sleeps, skipped tests, or weakened assertions.

    Security Scanned

    Passed automated security review

    Permissions

    Browser
    Environment Variables
    Terminal / Shell
    Write Files
    Read Files

    File Scopes

    *.md
    *.txt
    *.json
    *.yaml
    *.yml
    *.log
    README.md
    src/**
    app/**
    server/**
    api/**
    tests/**
    test/**
    spec/**
    cypress/**
    playwright/**
    fixtures/**
    mocks/**
    __mocks__/**
    config/**
    docs/**
    package.json
    playwright.config.*
    cypress.config.*
    jest.config.*
    vitest.config.*
    __tests__/**

    This skill uses file access to read user-provided test files, logs, CI output, traces, screenshot descriptions, mocks, fixtures, configuration files, package manifests, README files, and project notes. It uses write access to create structured Markdown/text outputs such as flaky test diagnosis reports, stabilization plans, CI-only failure investigations, E2E flake reports, reliability audits, QA tickets, AI coding prompts, verification plans, and SKILL.md files. Terminal access is optional and should only be enabled when the agent is expected to run tests, reproduce failures, or validate repeated test runs. Browser or network access is only needed for external framework documentation research. Environment variable access is not normally required, and secret values should never be exposed.

    Compatible with ChatGPT Custom GPTs, ChatGPT Agents, Cursor, Claude Code, Codex CLI, OpenCode, Replit, GitHub Copilot-style workflows, and other AI coding assistants that support structured Markdown instruction files such as SKILL.md. It can also be used manually in any AI chat by pasting the instructions.

    Creator

    Shandra is an AI prompt creator and agent skill builder specializing in practical, ready-to-use AI workflows for creators, entrepreneurs, educators, and digital product sellers. Her store focuses on high-quality agent skills designed to help users save time, structure ideas, generate content, build business assets, and turn creative concepts into actionable results. Each skill is crafted with clear instructions, professional formatting, practical use cases, and a strong focus on real-world productivity.
