
Flaky Test Detective for AI Coding Agents
Diagnoses unreliable tests, identifies root causes, creates stabilization plans, and generates safe AI coding prompts for fixing flaky unit, integration, E2E, and CI tests.
- Find and fix async timing issues and missing awaits in Playwright/Cypress.
- Detect shared state leaks causing order-dependent Jest or pytest failures.
- Audit CI logs to identify resource-related bottlenecks and headless-mode bugs.
Included in download
- Browser, env_vars, and terminal automation included
- Compatible with ChatGPT Custom GPTs
Sample Output
A real example of what this skill produces.
=== FLAKY TEST DETECTIVE REPORT ===
Original request: Fix this flaky Playwright checkout test. It passes locally but fails in CI during the payment confirmation step.
Interpreted testing goal: Diagnose and stabilize an intermittent Playwright checkout test failure while preserving the test intent and avoiding superficial fixes.
Test/framework: Playwright E2E test
Test intent: Verify that a user can complete checkout and reach payment confirmation successfully.
Failure pattern: Passes locally but fails in CI during the payment confirmation step.
Confirmed facts:
- The test uses Playwright.
- The test covers checkout.
- The failure occurs in CI.
- The failure occurs around payment confirmation.
- The user does not want a timeout-only or skip-test fix.
Assumptions:
- CI may run headless and slower than local.
- The payment confirmation step may depend on network, backend state, mock setup, or asynchronous UI updates.
- The test may use shared test data or a shared account.
- Checkout is a high-risk flow and assertions should not be weakened.
Unknowns:
- exact error message
- test code
- trace/video availability
- whether tests run in parallel
- whether payment provider is mocked
- whether checkout uses real backend services
- whether a shared account/cart/order is used
Likely root cause hypotheses:
- Async timing around payment confirmation
  Evidence: CI failures during confirmation often happen when the test asserts UI state before backend/payment state has settled.
  How to confirm: Review Playwright trace, network logs, and screenshot at failure.
  Likely fix: Wait for a deterministic confirmation event, response, URL transition, or UI state instead of using a fixed delay.
  Risk if wrong: The test may continue failing if the real problem is data isolation or mock setup.
- Payment/network mock not installed before the action
  Evidence: CI timing may allow payment request to escape or resolve differently.
  How to confirm: Inspect route/mock setup order and network trace.
  Likely fix: Install route mocks before triggering checkout and assert that expected requests occur.
  Risk if wrong: External dependency flake may persist.
- Shared checkout state or reused test account
  Evidence: Checkout flows often fail in parallel when users, carts, orders, or payment sessions are reused.
  How to confirm: Check test data creation, worker IDs, account IDs, cart cleanup, and parallel settings.
  Likely fix: Create isolated user/cart/order data per test or worker.
  Risk if wrong: Parallel-only failures may continue.
- Brittle selector for confirmation state
  Evidence: Confirmation UI may animate, render progressively, or have hidden duplicate elements.
  How to confirm: Inspect locator and trace DOM snapshot.
  Likely fix: Use stable role/name or test ID locator and wait for visibility/enabled state.
  Risk if wrong: Selector may still target the wrong element.
Flakiness category: CI-only E2E flake involving async timing, network mocking, shared state, selector stability, or real product race.
Do not fix by:
- blindly increasing timeout
- adding arbitrary waitForTimeout as the main fix
- skipping the test
- weakening checkout assertions
- hiding the issue with retries only
- changing payment behavior without evidence
Investigation plan:
- Collect CI failure trace, screenshot, video, console logs, and network logs.
- Identify the exact failing locator, assertion, or timeout.
- Run the test locally in headless mode.
- Run the test repeatedly 20 times.
- Run with CI-equivalent parallel workers.
- Check whether payment mocks are installed before checkout action.
- Check whether user/cart/order data is unique per test.
- Check whether the test waits for network response, URL transition, or confirmation state.
- Check whether the confirmation selector is stable and unique.
Stabilization plan:
- preserve the original checkout success intent
- install mocks before actions
- use deterministic waits for payment confirmation
- isolate checkout data per test
- replace brittle selectors with role/name or stable test IDs
- capture trace/video/screenshot on failure
- add cleanup before and after test if needed
- verify with repeated runs
AI coding agent prompt: Inspect this flaky Playwright checkout test and diagnose the root cause before changing code. Preserve the test intent: verifying successful checkout through payment confirmation. Review the failure message, CI trace, screenshot, video, console logs, network logs, selectors, mock setup, test data, parallel execution settings, and cleanup. Do not blindly increase timeouts, add arbitrary waitForTimeout calls, skip the test, or weaken assertions. Determine whether the flake is caused by async timing, selector instability, shared user/cart/order state, payment/network mock setup, CI environment differences, or a real product race. Fix the root cause with deterministic waits, isolated data, stable selectors, and proper mock setup. Return root cause, evidence, files inspected, files changed, why the fix is deterministic, verification runs performed or recommended, and remaining risks.
Verification plan:
- run the test alone 20 times
- run the test with related checkout tests
- run in headless mode
- run with the same parallel worker settings as CI
- confirm failure traces remain enabled
- confirm no arbitrary sleep was added as the main fix
- confirm checkout assertions still prove the original behavior
Remaining risks: If traces show that the application sometimes reaches an inconsistent checkout state, the flake may reveal a real product race condition requiring production-code synchronization and regression coverage.
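The verification plan in the report asks for 20 repeated runs. Below is a minimal sketch of how such a loop could record every failure instead of stopping at the first one. It assumes the test can be called as a plain Python function. `run_repeatedly` and `RepeatResult` are hypothetical names used for illustration, not part of the skill or of any test framework.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RepeatResult:
    """Summary of a repeated-run session (hypothetical helper type)."""
    runs: int
    failures: List[str] = field(default_factory=list)  # one message per failed run

    @property
    def failure_rate(self) -> float:
        return len(self.failures) / self.runs if self.runs else 0.0


def run_repeatedly(test_fn: Callable[[], None], runs: int = 20) -> RepeatResult:
    """Call test_fn `runs` times and keep going after failures."""
    result = RepeatResult(runs=runs)
    for attempt in range(1, runs + 1):
        try:
            test_fn()
        except Exception as exc:  # a failed assert or any other error counts as a failure
            result.failures.append(f"run {attempt}: {type(exc).__name__}: {exc}")
    return result
```

A failure rate above zero after the fix suggests the root cause has not been addressed. In a Playwright project the runner's own `--repeat-each` option would normally do this job; the sketch only illustrates the idea.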
About This Skill
Flaky Test Detective helps AI coding agents, developers, QA engineers, CI/CD teams, SaaS builders, and test automation teams investigate unreliable tests that pass and fail inconsistently. It analyzes flaky unit tests, integration tests, E2E tests, Playwright tests, Cypress tests, Selenium tests, Jest tests, Vitest tests, pytest tests, browser tests, API tests, database tests, and CI-only failures. The skill creates root cause hypotheses, failure pattern reports, async timing audits, selector stability reviews, test isolation plans, mock leakage checks, database cleanup reviews, CI environment comparisons, stabilization roadmaps, QA tickets, verification plans, and paste-ready prompts for Cursor, Claude Code, Codex CLI, OpenCode, Replit, and ChatGPT Agents. It is designed to restore trust in test suites by fixing real causes instead of hiding failures with retries, arbitrary sleeps, skipped tests, or weakened assertions.
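To illustrate what "fixing real causes instead of arbitrary sleeps" can mean in practice, here is a minimal sketch of a state-based wait. It polls a condition until it holds or a deadline passes, rather than sleeping for a fixed time. `wait_until` and its parameters are assumptions made for this example, not an API of the skill or of any framework.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> None:
    """Poll `condition` until it returns True; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return  # the state we care about has been reached
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)  # short pause between polls, not a fixed "hope it is done" delay
```

The test proceeds as soon as the condition is true and fails with a clear error when it never becomes true. Frameworks such as Playwright already build this kind of retrying into their locator assertions, so a hand-written helper like this is mainly useful where no built-in equivalent exists.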
Use Cases
- Find and fix async timing issues and missing awaits in Playwright/Cypress.
- Detect shared state leaks causing order-dependent Jest or pytest failures.
- Audit CI logs to identify resource-related bottlenecks and headless-mode bugs.
- Replace arbitrary sleeps with deterministic state-based wait conditions.
- Generate safe AI prompts to fix unreliable tests without hiding real bugs.
- Restore confidence in unstable test suites.
- Distinguish test bugs from real product race conditions.
- Create test reliability tickets for engineering teams.
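The shared-state use case above can be sketched as follows. This is a minimal, framework-agnostic illustration that runs a small set of tests in every order and reports the ones whose outcome depends on that order. All names (`find_order_dependent`, `case_add_item`, `case_cart_starts_empty`, `cart`) are hypothetical. Trying every permutation is only practical for a handful of tests.

```python
from itertools import permutations
from typing import Callable, Dict, List, Set, Tuple

Test = Tuple[str, Callable[[], None]]


def passes(test_fn: Callable[[], None]) -> bool:
    """Return True if the test runs without raising."""
    try:
        test_fn()
        return True
    except Exception:
        return False


def find_order_dependent(tests: List[Test], reset: Callable[[], None]) -> Set[str]:
    """Return names of tests whose pass/fail outcome changes with execution order."""
    outcomes: Dict[str, Set[bool]] = {name: set() for name, _ in tests}
    for order in permutations(tests):
        reset()  # start every ordering from the same clean state
        for name, fn in order:
            outcomes[name].add(passes(fn))
    return {name for name, seen in outcomes.items() if len(seen) > 1}


# Hypothetical example: two tests that share a module-level list.
cart: List[str] = []


def case_add_item() -> None:
    cart.append("book")
    assert "book" in cart


def case_cart_starts_empty() -> None:
    assert cart == []  # fails when case_add_item ran first and leaked state
```

Calling `find_order_dependent` with both cases and `reset=cart.clear` flags `case_cart_starts_empty`. The usual fix is to give each test its own fresh state rather than to pin the execution order.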
Known Limitations
This skill creates flaky test diagnosis reports, root cause hypotheses, stabilization plans, and AI coding prompts. It does not guarantee a fix for every flaky test, and a reliable diagnosis depends on access to test code, application code, CI logs, traces, screenshots, repeated runs, environment details, and human review. Some intermittent failures are caused by real product race conditions, external services, infrastructure instability, browser differences, or environment-specific behavior, and these require deeper debugging and repeated verification.
How to Install
mkdir -p ~/.claude/skills && curl -sL https://www.agensi.io/api/install/flaky-test-detective-for-ai-coding-agents | tar xz -C ~/.claude/skills/
Free skills install directly. Paid skills require purchase: use the download button above after buying.
Security Scanned
Passed automated security review
Permissions
File Scopes
This skill uses file access to read user-provided test files, logs, CI output, traces, screenshot descriptions, mocks, fixtures, configuration files, package manifests, README files, and project notes. It uses write access to create structured Markdown/text outputs such as flaky test diagnosis reports, stabilization plans, CI-only failure investigations, E2E flake reports, reliability audits, QA tickets, AI coding prompts, verification plans, and SKILL.md files. Terminal access is optional and should only be enabled when the agent is expected to run tests, reproduce failures, or validate repeated test runs. Browser or network access is only needed for external framework documentation research. Environment variable access is not normally required, and secret values should never be exposed.
Compatible with ChatGPT Custom GPTs, ChatGPT Agents, Cursor, Claude Code, Codex CLI, OpenCode, Replit, GitHub Copilot-style workflows, and other AI coding assistants that support structured Markdown instruction files such as SKILL.md. It can also be used manually in any AI chat by pasting the instructions.
Creator
Shandra is an AI prompt creator and agent skill builder specializing in practical, ready-to-use AI workflows for creators, entrepreneurs, educators, and digital product sellers. Her store focuses on high-quality agent skills designed to help users save time, structure ideas, generate content, build business assets, and turn creative concepts into actionable results. Each skill is crafted with clear instructions, professional formatting, practical use cases, and a strong focus on real-world productivity.