    Windows Desktop Automation Architect for AI Coding Agents

    Designs robust Windows desktop automation workflows using pywinauto, UI Automation, hotkeys, image matching, OCR, retries, logging, screenshots, and safety controls.

    Updated May 2026
    Security scanned
    One-time purchase
    Compatible with ChatGPT Custom GPTs

    $9.99 or 50 credits

    30-day refund guarantee

    Secure checkout via Stripe

    Included in download

    • Convert manual Windows clicks into resilient pywinauto selector-based scripts.
    • Add screenshot diagnostics and retry logic to brittle desktop bots.
    • Includes terminal, file_write, and file_read automation.
    • Ready for ChatGPT Custom GPTs.
    • Instant install

    Sample Output

    A real example of what this skill produces.

    === WINDOWS DESKTOP AUTOMATION ARCHITECTURE PLAN ===

    Original request: Create a pywinauto automation architecture plan and AI coding agent prompt for a Windows desktop app that exports a monthly sales report.

    Interpreted automation goal: Automate a Windows desktop workflow that opens the target app, navigates to Reports, selects a monthly date range, exports a sales report as an Excel file, handles the Save As dialog, and verifies the exported file.

    Target application: Unspecified Windows desktop application

    Workflow type: Report export automation

    Assumptions:

    • The user is authorized to automate this app.
    • The app has a visible Reports section.
    • The export creates an Excel file.
    • pywinauto may be able to inspect the app through UIA or Win32.
    • A standard Windows Save As dialog may appear during export.

    Unknowns to clarify:

    • application executable name
    • app technology: Win32, WPF, WinForms, Electron, Java, Qt, Citrix, or unknown
    • whether login is required
    • output folder
    • whether report controls expose stable properties
    • whether date controls are accessible
    • DPI scaling and screen resolution
    • whether the workflow runs locally or through Remote Desktop/Citrix

    Better alternatives to GUI automation: Before GUI automation, check whether the app provides an API, scheduled export, command-line export, database report, built-in report scheduler, or file-based export configuration. If no reliable non-GUI method exists, desktop automation is appropriate.

    Recommended automation strategy: Use pywinauto UI Automation first if controls are visible in the accessibility tree. Test the Win32 backend if UIA does not expose stable controls. Use stable selectors for the main window, report navigation, date fields, export button, and Save As dialog. Use hotkeys only after focus verification. Use image matching for inaccessible custom buttons. Use OCR only for status text or confirmation text when control text is unavailable.
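
    The precedence described above can be sketched as a small decision function. This is only an illustration: ControlProbe and its fields are hypothetical names for whatever an inspection pass reveals about a control, and the ordering of hotkeys before image matching is an assumption drawn from this plan rather than a pywinauto rule.

```python
from dataclasses import dataclass


@dataclass
class ControlProbe:
    """Hypothetical summary of what inspection revealed about one control."""
    in_uia_tree: bool = False          # visible through the UIA backend
    in_win32_tree: bool = False        # visible through the Win32 backend
    has_verified_hotkey: bool = False  # shortcut exists and focus can be verified
    has_image_template: bool = False   # a template image was captured for it


def choose_technique(probe: ControlProbe) -> str:
    """Pick the most robust technique available, in the order the plan prefers."""
    if probe.in_uia_tree:
        return "uia_selector"
    if probe.in_win32_tree:
        return "win32_selector"
    if probe.has_verified_hotkey:
        return "hotkey"
    if probe.has_image_template:
        return "image_match"
    # Nothing reliable is available: stop and ask a human instead of guessing.
    return "manual_review"
```

    A control that is visible in the UIA tree always resolves to a selector, even when a template image also exists, which keeps image matching as a last resort.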

    Workflow map:

    1. Launch or connect to the application.
    2. Wait for the main window to exist and become visible.
    3. Handle login or stop for secure manual login if required.
    4. Navigate to Reports.
    5. Select the monthly sales report.
    6. Set the start and end date fields.
    7. Verify the date range was accepted.
    8. Trigger Export.
    9. Wait for the Save As dialog.
    10. Save the file to the configured output folder.
    11. Wait for export completion.
    12. Verify the Excel file exists and file size is greater than zero.
    13. Log success and final output path.
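
    One way to keep the numbered steps above explicit in code is a small step runner that logs each step and stops at the first failure. The names run_workflow and StepFailed are illustrative assumptions, and the step callables stand in for the real pywinauto actions.

```python
import logging
from typing import Callable, List, Tuple

log = logging.getLogger("report_export")

Step = Tuple[str, Callable[[], None]]


class StepFailed(RuntimeError):
    """Raised when a workflow step fails; records what had already completed."""

    def __init__(self, step: str, completed: List[str]):
        super().__init__(f"step failed: {step}")
        self.step = step
        self.completed = completed


def run_workflow(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first failure (stop-on-error)."""
    completed: List[str] = []
    for number, (name, action) in enumerate(steps, start=1):
        log.info("step %d/%d start: %s", number, len(steps), name)
        try:
            action()
        except Exception as exc:
            log.error("step %d failed: %s (%s)", number, name, exc)
            raise StepFailed(name, completed) from exc
        completed.append(name)
    return completed
```

    Each entry would pair a step name such as "wait for Save As dialog" with the function that performs it, so logs and failures refer to the same names as the workflow map.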

    Selector strategy:

    • Use process name or executable path for connection.
    • Use main window title regex instead of exact volatile titles.
    • Use automation ID, control type, class name, or stable text for controls.
    • Use parent-child hierarchy when needed.
    • Use standard Windows dialog selectors for Save As.
    • Avoid raw coordinates unless no selector, hotkey, image, or OCR alternative exists.
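
    A selectors module along these lines keeps every criterion in one place. All titles, automation IDs, and control names below are hypothetical placeholders, since the target application is unspecified; the keyword names (title, title_re, auto_id, control_type, class_name) are the search criteria pywinauto accepts in window() and child_window().

```python
import re

# Hypothetical values: inspect the real app (for example with
# print_control_identifiers()) and replace every entry below.
MAIN_WINDOW = {"title_re": r".*Sales Manager.*"}

SELECTORS = {
    "reports_tab": {"title": "Reports", "control_type": "TabItem"},
    "export_button": {"auto_id": "btnExport", "control_type": "Button"},
    "save_as_dialog": {"title_re": r"Save As.*", "control_type": "Window"},
}

ALLOWED_KEYS = {"title", "title_re", "auto_id", "control_type", "class_name"}


def validate_selectors(selectors):
    """Fail early on misspelled criteria or regexes that do not compile."""
    for name, criteria in selectors.items():
        unknown = set(criteria) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"{name}: unsupported criteria {sorted(unknown)}")
        if "title_re" in criteria:
            re.compile(criteria["title_re"])


def find_control(main_window, name):
    """Resolve a named selector against a pywinauto window specification."""
    return main_window.child_window(**SELECTORS[name])


# On Windows, usage might look like this (not executed here):
#   from pywinauto import Application
#   app = Application(backend="uia").connect(**MAIN_WINDOW)
#   main = app.window(**MAIN_WINDOW)
#   find_control(main, "export_button").click_input()
```

    Matching the main window with a regular expression tolerates titles that embed a document name or date, which is why title_re is preferred over an exact title here.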

    Hotkey strategy: Use hotkeys only when:

    • the active window is verified
    • focus is known
    • the shortcut is stable
    • the resulting state is verified

    Possible fallback hotkeys:

    • documented Alt menu shortcuts
    • Ctrl+S only if it reliably triggers export/save
    • Tab navigation only with state verification, not blind repeated sequences
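
    The four conditions for hotkey use can be enforced by a small guard. In this sketch the callables are injected so the logic stays testable; get_active_title and verify are hypothetical helpers, and on Windows send could be pywinauto.keyboard.send_keys.

```python
import re
from typing import Callable, Optional


def guarded_hotkey(
    keys: str,
    expected_title_re: str,
    get_active_title: Callable[[], Optional[str]],
    send: Callable[[str], None],
    verify: Callable[[], bool],
) -> bool:
    """Send a hotkey only if the expected window is active, then verify the result."""
    title = get_active_title()
    if not re.search(expected_title_re, title or ""):
        # Wrong window has focus: refuse rather than type into an unknown target.
        raise RuntimeError(f"refusing to send {keys!r}: active window is {title!r}")
    send(keys)
    # The caller decides what "resulting state" means, e.g. a dialog appeared.
    return bool(verify())
```

    A False return means the keys were sent but the expected state was not observed, which the caller should treat as a failed step rather than retrying blindly.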

    Image matching strategy: Use image matching only for inaccessible custom controls. Requirements:

    • restrict search to toolbar or report panel regions
    • define match threshold
    • capture templates at the same DPI and theme
    • validate expected state after clicking
    • capture screenshot if image target is not found
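
    The region restriction and match threshold can be illustrated without any imaging library. The sketch below is a pure-Python stand-in that scores candidates by mean absolute pixel difference on small grayscale grids; a real project would more likely use an imaging library's template matching, and the 0.95 default threshold is an assumed starting value, not a recommendation from the plan.

```python
from typing import List, Optional, Tuple

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def match_score(screen: Gray, template: Gray, top: int, left: int) -> float:
    """Similarity in [0, 1]; 1.0 means every template pixel matches exactly."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            total += abs(screen[top + r][left + c] - value)
    pixels = len(template) * len(template[0])
    return 1.0 - total / (255 * pixels)


def find_template(
    screen: Gray,
    template: Gray,
    region: Tuple[int, int, int, int],
    threshold: float = 0.95,
) -> Optional[Tuple[int, int]]:
    """Search only inside region=(top, left, bottom, right), bounds exclusive.

    Returns the (row, col) of the best match, or None if it scores below threshold.
    """
    top, left, bottom, right = region
    t_h, t_w = len(template), len(template[0])
    best_score, best_pos = 0.0, None
    for r in range(top, bottom - t_h + 1):
        for c in range(left, right - t_w + 1):
            score = match_score(screen, template, r, c)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos if best_score >= threshold else None
```

    When find_template returns None, the caller would capture a screenshot of the searched region for diagnostics instead of clicking at a guessed position.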

    OCR strategy: Use OCR only for:

    • export status text
    • confirmation messages
    • error banners
    • report title verification if control text is unavailable

    OCR requirements:

    • restrict OCR to the status/message region
    • define expected phrases such as Export complete or Report generated
    • use confidence thresholds
    • avoid capturing sensitive report content unless approved
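
    The phrase and confidence checks might look like the following. The input shape, a list of (text, confidence) pairs with confidence between 0 and 1, is an assumption; real OCR engines report results in their own formats and would need a small adapter. The 0.80 threshold is likewise only an illustrative default.

```python
from typing import Iterable, Optional, Tuple

EXPECTED_PHRASES = ("export complete", "report generated")


def confirm_status(
    ocr_words: Iterable[Tuple[str, float]],
    min_confidence: float = 0.80,
    phrases: Tuple[str, ...] = EXPECTED_PHRASES,
) -> Optional[str]:
    """Return the expected phrase that was read confidently, or None.

    Words below min_confidence are dropped before matching, so a phrase built
    from uncertain words does not count as confirmation.
    """
    confident = [text for text, conf in ocr_words if conf >= min_confidence]
    line = " ".join(confident).lower()
    for phrase in phrases:
        if phrase in line:
            return phrase
    return None
```

    A None result is the point at which the plan's recovery rule applies: request manual confirmation rather than assuming the export finished.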

    Waits and synchronization:

    • wait for main window visible
    • wait for Reports screen loaded
    • wait for date fields enabled
    • wait for Export button enabled
    • wait for Save As dialog
    • wait for output file creation
    • wait for file size to stabilize
    • timeout each step with clear error messages
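
    For waits that pywinauto does not cover, such as output file creation and size stabilization, a generic polling helper is enough. This is a minimal sketch using only the standard library; the function names, intervals, and the three-check stability rule are assumptions.

```python
import os
import time


def wait_until(condition, timeout, interval=0.5, description="condition"):
    """Poll condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out after {timeout}s waiting for {description}")
        time.sleep(interval)


def wait_for_stable_file(path, timeout=60.0, interval=0.5, stable_checks=3):
    """Wait until the file exists, is non-empty, and its size stops changing.

    Returns the final size in bytes.
    """
    last_size, unchanged = -1, 0

    def stable():
        nonlocal last_size, unchanged
        size = os.path.getsize(path) if os.path.exists(path) else -1
        # Count consecutive polls that saw the same non-zero size.
        unchanged = unchanged + 1 if (size == last_size and size > 0) else 0
        last_size = size
        return size if unchanged >= stable_checks else 0

    return wait_until(stable, timeout, interval, f"{path} to stop growing")
```

    The description argument ends up in the timeout message, so each step fails with a message that names what it was waiting for.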

    Error handling and recovery:

    • if app is not running, launch it
    • if wrong window is active, refocus the main window
    • if login is required, stop or request secure manual login
    • if a control is not found, retry and capture screenshot
    • if an unexpected popup appears, classify it, then handle it if it is a known popup or stop otherwise
    • if export times out, retry once
    • if file already exists, apply overwrite policy
    • if OCR confidence is low, request manual confirmation
    • if failure persists, abort safely with logs
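
    The retry-then-abort rules can share one helper. In this sketch, on_failure is a hypothetical hook where a screenshot could be captured, and the default attempt count and delay are assumptions to be set from configuration.

```python
import logging
import time

log = logging.getLogger("automation.recovery")


def with_retries(action, max_attempts=3, delay=1.0, on_failure=None, retry_on=(Exception,)):
    """Call action() up to max_attempts times, re-raising the last error.

    on_failure(attempt, exc) runs after every failed attempt, for example to
    save a screenshot when that is allowed in the environment.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except retry_on as exc:
            log.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if on_failure is not None:
                on_failure(attempt, exc)
            if attempt == max_attempts:
                raise  # failure persists: abort and let the caller log and stop
            time.sleep(delay)
```

    The rule "if export times out, retry once" would correspond to max_attempts=2 with retry_on narrowed to the timeout exception the export step raises.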

    Logging and diagnostics: Log:

    • timestamp
    • workflow step
    • selector used
    • active window title
    • success or failure
    • elapsed time
    • retry count
    • output path
    • screenshot path on failure

    Do not log:

    • passwords
    • tokens
    • secret values
    • sensitive report data
    • full screenshots containing private data unless approved
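
    A structured log record covering the fields above, with basic redaction, might be built as follows. The field names and the SENSITIVE_KEYS set are illustrative assumptions; key-based redaction only catches values passed under those names and does not replace a review of what each step logs.

```python
import json
from datetime import datetime, timezone

# Hypothetical key names that must never reach the log in clear text.
SENSITIVE_KEYS = {"password", "token", "secret", "api_key"}


def build_log_record(step, selector, window_title, success, elapsed_s,
                     retry_count=0, **extra):
    """Return one JSON log line; extra fields with sensitive names are redacted."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step": step,
        "selector": selector,
        "window_title": window_title,
        "success": success,
        "elapsed_s": round(elapsed_s, 3),
        "retry_count": retry_count,
    }
    for key, value in extra.items():
        record[key] = "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
    return json.dumps(record)
```

    Optional fields such as output_path or screenshot_path can be passed as keyword arguments and appear in the record unchanged.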

    Safety controls:

    • dry-run mode up to the Export action
    • configurable output folder
    • overwrite protection
    • maximum retry limit
    • user confirmation before overwriting files
    • stop-on-error behavior
    • screenshot logging can be disabled for sensitive environments
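
    Several of these controls reduce to a small configuration object plus a path check. SafetyConfig, resolve_output_path, and export_allowed are hypothetical names, and the defaults (dry-run on, overwrite off) are assumptions chosen to fail safe.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class SafetyConfig:
    output_dir: Path
    dry_run: bool = True           # stop before the Export action by default
    allow_overwrite: bool = False  # overwrite protection is on by default
    max_retries: int = 3
    screenshots_enabled: bool = True


def resolve_output_path(
    config: SafetyConfig,
    filename: str,
    confirm: Callable[[Path], bool] = lambda path: False,
) -> Path:
    """Return the target path, refusing to overwrite without policy and confirmation."""
    target = Path(config.output_dir) / filename
    if target.exists() and not (config.allow_overwrite and confirm(target)):
        raise FileExistsError(f"refusing to overwrite {target}")
    return target


def export_allowed(config: SafetyConfig) -> bool:
    """In dry-run mode the workflow stops just before the Export action."""
    return not config.dry_run
```

    Overwriting requires both the configured policy and an explicit confirmation callback, so a misconfigured flag alone cannot replace an existing report.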

    Implementation structure:

    automation_project/
    • main.py
    • config.py
    • app_connection.py
    • selectors.py
    • workflow_steps.py
    • image_matching.py
    • ocr_utils.py
    • logging_utils.py
    • safety.py
    • recovery.py
    • tests/
    • logs/
    • screenshots/
    • README.md

    AI coding agent prompt: Build a robust Windows desktop automation script in Python using pywinauto for exporting a monthly sales report from a desktop application. First determine whether the UIA or Win32 backend is more appropriate. Prefer stable selectors over coordinates. Create modular files for app connection, selectors, workflow steps, logging, recovery, image matching fallback, OCR fallback, safety controls, and configuration. Add explicit waits for windows, controls, enabled states, dialogs, export completion, and file creation. Use hotkeys only when the active window and resulting state can be verified. Use image matching only for inaccessible custom controls and OCR only for status verification when control text is unavailable. Add structured logs and screenshots on failure while avoiding sensitive data. Include dry-run mode, overwrite protection, retry limits, and manual confirmation for risky steps. Return automation strategy, selectors used, fallback strategy, how to run the script, testing plan, and known limitations.

    Testing plan:

    • app closed before run
    • app already open before run
    • wrong window active
    • slow Reports screen loading
    • Save As dialog delayed
    • output file already exists
    • export succeeds
    • export times out
    • unexpected popup appears
    • different DPI scaling
    • screenshot logging disabled
    • dry-run mode enabled
    • Remote Desktop or Citrix environment if applicable

    Known limitations: Reliability depends on target app technology, control accessibility, app version, Windows permissions, DPI scaling, resolution, language, theme, timing, and local versus remote execution.

    Verification checklist:

    • app launches or connects successfully
    • main window is detected
    • Reports screen is reached
    • date range is set correctly
    • Export action is triggered only after state verification
    • Save As dialog is handled
    • Excel file exists
    • file size is greater than zero
    • logs are created
    • screenshot captured on failure when allowed
    • no credentials or sensitive report data are logged

    About This Skill

    Windows Desktop Automation Architect helps AI coding agents, developers, automation builders, QA teams, operations teams, and business users design reliable Windows GUI automation for desktop applications that do not have easy APIs. It creates automation architecture plans, pywinauto prompts, manual workflow conversion specs, selector strategies, hotkey plans, image matching fallbacks, OCR fallback plans, retry and timeout strategies, logging and screenshot diagnostics, safety checklists, debugging prompts, and test plans. The skill is ideal for automating legacy Windows apps, ERP/CRM tools, report exports, file dialogs, data entry workflows, installer screens, remote desktop workflows, and repetitive internal business processes while avoiding brittle coordinate-only scripts.

    Security Scanned

    Passed automated security review

    Permissions

    Terminal / Shell
    Write Files
    Read Files

    Allowed Hosts

    https://promptbase.com/profile/shandra?via=mhq19

    File Scopes

    *.md
    *.txt
    *.json
    *.yaml
    *.yml
    *.py
    src/**
    scripts/**
    automation/**
    tests/**
    docs/**
    config/**
    screenshots/**
    templates/**
    logs/**
    requirements.txt
    pyproject.toml
    *.log
    README.md

    This skill uses file access to read user-provided automation scripts, workflow notes, logs, selector dumps, screenshot descriptions, configuration examples, README files, and project files. It uses write access to create structured Markdown/text outputs such as Windows automation architecture plans, pywinauto implementation prompts, manual workflow conversion specs, image matching and OCR plans, reliability audits, safety checklists, debugging prompts, testing plans, documentation, and SKILL.md files. Terminal access is optional and should only be enabled when the agent is expected to inspect or test local automation scripts. Browser or network access is only needed for external documentation research. Environment variable access is not normally required, and secret values should never be exposed.

    Compatible with ChatGPT Custom GPTs, ChatGPT Agents, Cursor, Claude Code, Codex CLI, OpenCode, Replit, and other AI coding agent workflows that support structured Markdown instruction files such as SKILL.md. It can also be used manually in any AI chat by pasting the instructions.
