
Flaky Test Detective for AI Coding Agents
Diagnoses unreliable tests, identifies root causes, creates stabilization plans, and generates safe AI coding prompts for fixing flaky unit, integration, E2E, and CI tests.
- Find and fix async timing issues and missing awaits in Playwright/Cypress.
- Detect shared state leaks causing order-dependent Jest or pytest failures.
- Audit CI logs to identify resource-related bottlenecks and headless-mode bugs.
$9.99 or 50 credits. Secure checkout via Stripe.
Included in download
- Find and fix async timing issues and missing awaits in Playwright/Cypress.
- Detect shared state leaks causing order-dependent Jest or pytest failures.
- Browser, env_vars, and terminal automation included
- Compatible with ChatGPT Custom GPTs
- Instant install
Sample input
Fix this flaky Playwright checkout test. It passes locally but fails in CI during the payment confirmation step. Do not just increase the timeout or skip the test. Create a diagnosis, likely root causes, stabilization plan, Cursor prompt, and verification plan.
Sample output
=== FLAKY TEST DETECTIVE REPORT ===

Original request: Fix this flaky Playwright checkout test. It passes locally but fails in CI during the payment confirmation step.

Interpreted testing goal: Diagnose and stabilize an intermittent Playwright checkout test failure while preserving the test intent and avoiding superficial fixes.

Test/framework: Playwright E2E test

Test intent: Verify that a user can complete checkout and reach payment confirmation successfully.

Failure pattern: Passes locally but fails in CI during the payment confirmation step.

Confirmed facts:
- The test uses Playwright.
- The test covers checkout.
- The failure occurs in CI.
- The failure occurs around payment confirmation.
- The user does not want a timeout-only or skip-test fix.

Assumptions:
- CI may run headless and slower than local.
- The payment confirmation step may depend on network, backend state, mock setup, or asynchronous UI updates.
- The test may use shared test data or a shared account.
- Checkout is a high-risk flow and assertions should not be weakened.

Unknowns:
- exact error message
- test code
- trace/video availability
- whether tests run in parallel
- whether payment provider is mocked
- whether checkout uses real backend services
- whether a shared account/cart/order is used

Likely root cause hypotheses:

1. Async timing around payment confirmation
   Evidence: CI failures during confirmation often happen when the test asserts UI state before backend/payment state has settled.
   How to confirm: Review Playwright trace, network logs, and screenshot at failure.
   Likely fix: Wait for a deterministic confirmation event, response, URL transition, or UI state instead of using a fixed delay.
   Risk if wrong: The test may continue failing if the real problem is data isolation or mock setup.

2. Payment/network mock not installed before the action
   Evidence: CI timing may allow payment request to escape or resolve differently.
   How to confirm: Inspect route/mock setup order and network trace.
   Likely fix: Install route mocks before triggering checkout and assert that expected requests occur.
   Risk if wrong: External dependency flake may persist.

3. Shared checkout state or reused test account
   Evidence: Checkout flows often fail in parallel when users, carts, orders, or payment sessions are reused.
   How to confirm: Check test data creation, worker IDs, account IDs, cart cleanup, and parallel settings.
   Likely fix: Create isolated user/cart/order data per test or worker.
   Risk if wrong: Parallel-only failures may continue.

4. Brittle selector for confirmation state
   Evidence: Confirmation UI may animate, render progressively, or have hidden duplicate elements.
   How to confirm: Inspect locator and trace DOM snapshot.
   Likely fix: Use stable role/name or test ID locator and wait for visibility/enabled state.
   Risk if wrong: Selector may still target the wrong element.

Flakiness category: CI-only E2E flake involving async timing, network mocking, shared state, selector stability, or real product race.

Do not fix by:
- blindly increasing timeout
- adding arbitrary waitForTimeout as the main fix
- skipping the test
- weakening checkout assertions
- hiding the issue with retries only
- changing payment behavior without evidence

Investigation plan:
1. Collect CI failure trace, screenshot, video, console logs, and network logs.
2. Identify the exact failing locator, assertion, or timeout.
3. Run the test locally in headless mode.
4. Run the test repeatedly 20 times.
5. Run with CI-equivalent parallel workers.
6. Check whether payment mocks are installed before checkout action.
7. Check whether user/cart/order data is unique per test.
8. Check whether the test waits for network response, URL transition, or confirmation state.
9. Check whether the confirmation selector is stable and unique.

Stabilization plan:
- preserve the original checkout success intent
- install mocks before actions
- use deterministic waits for payment confirmation
- isolate checkout data per test
- replace brittle selectors with role/name or stable test IDs
- capture trace/video/screenshot on failure
- add cleanup before and after test if needed
- verify with repeated runs

AI coding agent prompt:
Inspect this flaky Playwright checkout test and diagnose the root cause before changing code. Preserve the test intent: verifying successful checkout through payment confirmation. Review the failure message, CI trace, screenshot, video, console logs, network logs, selectors, mock setup, test data, parallel execution settings, and cleanup. Do not blindly increase timeouts, add arbitrary waitForTimeout calls, skip the test, or weaken assertions. Determine whether the flake is caused by async timing, selector instability, shared user/cart/order state, payment/network mock setup, CI environment differences, or a real product race. Fix the root cause with deterministic waits, isolated data, stable selectors, and proper mock setup. Return root cause, evidence, files inspected, files changed, why the fix is deterministic, verification runs performed or recommended, and remaining risks.

Verification plan:
- run the test alone 20 times
- run the test with related checkout tests
- run in headless mode
- run with the same parallel worker settings as CI
- confirm failure traces remain enabled
- confirm no arbitrary sleep was added as the main fix
- confirm checkout assertions still prove the original behavior

Remaining risks: If traces show that the application sometimes reaches an inconsistent checkout state, the flake may reveal a real product race condition requiring production-code synchronization and regression coverage.
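The "run the test alone 20 times" step in the sample report can be illustrated with a small sketch. This is not code shipped with the skill: `repeatRun`, `RunSummary`, and `makeFlakyBody` are hypothetical names, and the flaky body is simulated so the example runs without a browser. In a real Playwright project the test runner's `--repeat-each` option would normally do this job.

```typescript
// Hypothetical helper for illustration; not part of Playwright or of this skill.
interface RunSummary {
  runs: number;
  failures: number;
  failureRate: number;
  errors: string[];
}

// Runs the same test body sequentially and records every failure
// instead of stopping at the first one.
async function repeatRun(
  testBody: () => Promise<void>,
  runs: number
): Promise<RunSummary> {
  const errors: string[] = [];
  for (let i = 0; i < runs; i++) {
    try {
      await testBody();
    } catch (err) {
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  const failureRate = runs === 0 ? 0 : errors.length / runs;
  return { runs, failures: errors.length, failureRate, errors };
}

// Simulated flaky test body (an assumption for the demo):
// it fails on every `failEvery`-th call.
function makeFlakyBody(failEvery: number): () => Promise<void> {
  let calls = 0;
  return async () => {
    calls += 1;
    if (calls % failEvery === 0) {
      throw new Error(`simulated failure on call ${calls}`);
    }
  };
}

repeatRun(makeFlakyBody(4), 20).then((summary) => {
  // prints "5/20 runs failed"
  console.log(`${summary.failures}/${summary.runs} runs failed`);
});
```

Recording the failure rate before and after a change gives a rough signal of whether a fix addressed the cause, although a clean run of 20 does not prove that a rare flake is gone.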

About This Skill
Flaky Test Detective helps AI coding agents, developers, QA engineers, CI/CD teams, SaaS builders, and test automation teams investigate unreliable tests that pass and fail inconsistently. It analyzes flaky unit tests, integration tests, E2E tests, Playwright tests, Cypress tests, Selenium tests, Jest tests, Vitest tests, pytest tests, browser tests, API tests, database tests, and CI-only failures. The skill creates root cause hypotheses, failure pattern reports, async timing audits, selector stability reviews, test isolation plans, mock leakage checks, database cleanup reviews, CI environment comparisons, stabilization roadmaps, QA tickets, verification plans, and paste-ready prompts for Cursor, Claude Code, Codex CLI, OpenCode, Replit, and ChatGPT Agents. It is designed to restore trust in test suites by fixing real causes instead of hiding failures with retries, arbitrary sleeps, skipped tests, or weakened assertions.
Use Cases
- Find and fix async timing issues and missing awaits in Playwright/Cypress.
- Detect shared state leaks causing order-dependent Jest or pytest failures.
- Audit CI logs to identify resource-related bottlenecks and headless-mode bugs.
- Replace arbitrary sleeps with deterministic state-based wait conditions.
- Generate safe AI prompts to fix unreliable tests without hiding real bugs.
- Restore confidence in unstable test suites.
- Distinguish test bugs from real product race conditions.
- Create test reliability tickets for engineering teams.
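As a rough illustration of replacing an arbitrary sleep with a state-based wait, the sketch below polls a condition until it holds or a deadline passes. `waitUntil` and the `orderStatus` variable are hypothetical names introduced for this example, and the confirmation is simulated with a timer. Frameworks such as Playwright already provide auto-waiting assertions and response waits, so a hand-rolled helper like this is mostly useful where no built-in wait fits.

```typescript
// Hypothetical helper for illustration; not part of any test framework.
interface WaitOptions {
  timeoutMs: number;  // give up after this long
  intervalMs: number; // delay between condition checks
}

// Resolves as soon as the condition is true; rejects once the deadline passes.
// Unlike a fixed sleep, it does not wait longer than the state change requires.
async function waitUntil(
  condition: () => boolean | Promise<boolean>,
  opts: WaitOptions
): Promise<void> {
  const deadline = Date.now() + opts.timeoutMs;
  while (true) {
    if (await condition()) {
      return;
    }
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${opts.timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
}

// Simulated asynchronous state change (an assumption for the demo):
// the order becomes confirmed after roughly 50 ms.
let orderStatus: string = "pending";
setTimeout(() => {
  orderStatus = "confirmed";
}, 50);

waitUntil(() => orderStatus === "confirmed", { timeoutMs: 2000, intervalMs: 10 })
  .then(() => console.log(`order status: ${orderStatus}`));
```

The timeout here acts as a failure bound rather than as the synchronization mechanism, which is why raising it does not hide a condition that never becomes true.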
Known Limitations
This skill creates flaky test diagnosis reports, root cause hypotheses, stabilization plans, and AI coding prompts, but it does not guarantee that every flaky test will be fixed without access to test code, application code, CI logs, traces, screenshots, repeated runs, environment details, and human review. Some intermittent failures are caused by real product race conditions, external services, infrastructure instability, browser differences, or environment-specific behavior that require deeper debugging and repeated verification.
How to Install
mkdir -p ~/.claude/skills && curl -sL https://www.agensi.io/api/install/flaky-test-detective-for-ai-coding-agents -o /tmp/flaky-test-detective-for-ai-coding-agents.zip && unzip -o /tmp/flaky-test-detective-for-ai-coding-agents.zip -d ~/.claude/skills && rm /tmp/flaky-test-detective-for-ai-coding-agents.zip
Free skills install directly. Paid skills require purchase; use the download button above after buying.
Reviews
No reviews yet - be the first to share your experience.
Only users who have downloaded or purchased this skill can leave a review.
Early access skill
Security Scanned
Passed automated security review
Permissions
File Scopes
This skill uses file access to read user-provided test files, logs, CI output, traces, screenshot descriptions, mocks, fixtures, configuration files, package manifests, README files, and project notes. It uses write access to create structured Markdown/text outputs such as flaky test diagnosis reports, stabilization plans, CI-only failure investigations, E2E flake reports, reliability audits, QA tickets, AI coding prompts, verification plans, and SKILL.md files. Terminal access is optional and should only be enabled when the agent is expected to run tests, reproduce failures, or validate repeated test runs. Browser or network access is only needed for external framework documentation research. Environment variable access is not normally required, and secret values should never be exposed.
Compatible with ChatGPT Custom GPTs, ChatGPT Agents, Cursor, Claude Code, Codex CLI, OpenCode, Replit, GitHub Copilot-style workflows, and other AI coding assistants that support structured Markdown instruction files such as SKILL.md. It can also be used manually in any AI chat by pasting the instructions.