
    Windows Desktop Automation v2

    by Kaymue

    Automate any Windows app with UI Automation, OCR, and hotkeys. Drive legacy/Citrix/RDP UIs. Python scripts for pywinauto + OpenCV.

    Updated Jun 2026
    0 installs

    Free

    Included in download

    • Downloadable skill package
    • 2 permissions declared
    • Instant install

    About This Skill

# Windows Desktop Automation

End-to-end Windows desktop automation for AI coding agents. Drive legacy applications, terminal emulators, Citrix windows, and GUI-only tools with the same rigor you apply to REST APIs.

## What it does

This skill gives your AI agent a structured workflow for automating **any** Windows desktop application, even those with no CLI, no API, and no documented automation surface. It combines four proven techniques into one resilient stack:

- **UI Automation (pywinauto + UIA)** for native Win32, WPF, WinForms, and Qt apps
- **Hotkey/Macro layer** for keyboard-shortcut-driven workflows
- **Image recognition (OpenCV + Tesseract OCR)** for Citrix, RDP, custom-drawn, and remote-desktop UIs
- **Resilience patterns**: retry, recovery, evidence capture, and human-in-the-loop fallback

It enforces a **"sniff → classify → select → execute → verify"** cycle before any click or keypress, dramatically reducing the silent-failure problem that kills most RPA scripts.

## When to use it

- The target app has **no API, no CLI, and no automation hooks**
- You need to drive **Citrix, RDP, terminal emulators, or VB6/MFC legacy apps**
- Selenium/Playwright can't help (the UI is not in a browser)
- You need **screen-scraping + form-filling** with audit trail
- A human currently does the task and you want to hand it off to an agent

## Why it's better than ad-hoc prompting

Most LLMs will try to `pyautogui.click(540, 320)` once and call it done. This skill is different:

- It **inspects the window tree first** (`Inspect.exe`, `uiautomation`, `pywinauto`) before any action
- It **classifies the target control** (Edit, Button, List, Tree, etc.) and picks the right interaction mode
- It uses **multiple locator strategies** in parallel (UIA, name, class, image hash, OCR) and falls back gracefully
- It **captures screenshots before/after every step** for postmortem
- It **never trusts a click**: it verifies state change with a follow-up read

## Architecture

```
┌──────────────────────────────────────────────────────────┐
│ Agent (Claude/Cursor)                                    │
│ - Translates intent into action plan                     │
│ - Calls helper scripts via Bash                          │
└───────────────┬──────────────────────────────────────────┘
                │
                ▼
┌──────────────────────────────────────────────────────────┐
│ skills/windows-desktop-automation/                       │
│   scripts/                                               │
│   ├── sniff_window.py     # UIA + window tree dump       │
│   ├── classify_target.py  # Control type + locator plan  │
│   ├── act.py              # Click/type/scroll + verify   │
│   ├── ocr_click.py        # Image+OCR fallback           │
│   ├── hotkey.py           # Global hotkey macros         │
│   └── evidence.py         # Screenshot + log capture     │
│   references/                                            │
│   ├── locator-strategies.md                              │
│   ├── citrix-rdp-gotchas.md                              │
│   └── uia-cheatsheet.md                                  │
└──────────────────────────────────────────────────────────┘
```

## Quick start

```bash
# 1. Install
pip install pywinauto opencv-python pytesseract pillow psutil

# 2. Sniff a window to discover its controls
python scripts/sniff_window.py --title "Order Entry"

# 3. Drive an action
python scripts/act.py --window "Order Entry" \
  --action type --control "Customer ID" --value "ACME-001"

# 4. If UIA fails, fall back to image+OCR
python scripts/ocr_click.py --window "Order Entry" \
  --template templates/submit_button.png
```

## Supported scenarios

| Scenario | Tool | Reliability |
|----------|------|-------------|
| Native Win32/WPF/WinForms | pywinauto UIA backend | ★★★★★ |
| Qt apps | pywinauto + UIA | ★★★★ |
| Java Swing | pywinauto + JAB | ★★★★ |
| Citrix / RDP | OCR + image templates | ★★★ |
| Custom-drawn legacy | OCR + image templates | ★★ |
| Browser-as-UI | Playwright (use that instead) | ★★★★★ |

## Installation

### Claude Code / OpenClaw / Codex CLI

```bash
npx agensi install windows-desktop-automation
```

### Manual

Download `windows-desktop-automation.zip` and unzip to `~/.claude/skills/` (or your agent's skills directory).

## Example usage

> "Open SAP GUI, navigate to transaction VA01, create a sales order for customer ACME-001, and dump the resulting order number to a JSON file."

The skill will:

1. Launch SAP GUI and wait for the main window
2. Sniff the tree, locate the transaction code field
3. Type "VA01", press Enter, wait for the new screen
4. Re-sniff, locate the Order Type field
5. Drive the multi-step form with locator+image fallback
6. Capture the resulting order number from the status bar
7. Write JSON + screenshots to `./evidence/`

## Common failure modes (and how this skill handles them)

- **Window not found** → polls every 500ms up to 30s, then OCR fallback
- **Control not enabled** → waits + retries, then captures evidence
- **Modal dialog appears** → auto-detects via window-tree diff, dismisses or escalates
- **Stale element reference** → re-sniffs and re-locates
- **OS DPI scaling mismatch** → normalizes via `GetDpiForWindow`
- **Locked workstation** → checks `GetForegroundWindow`, refuses to run

## Pricing

Single-purchase, lifetime access. Includes:

- All 6 helper scripts
- 3 reference docs (locator strategies, Citrix gotchas, UIA cheatsheet)
- 5 example workflows
- Future updates for the same major version

## Support

GitHub-style issues aren't open for marketplace items. For help, contact via the creator's page (typically a 48-hour response window).

## Compatibility

Works with any agent that supports the SKILL.md standard and can execute Python: Claude Code, OpenClaw, Codex CLI, Cursor, Gemini CLI, Cline, Windsurf, Aider. Tested on Windows 10/11, Server 2019+.

## Tags

windows, automation, desktop, ui-automation, pywinauto, ocr, rpa, devops, citrix, rdp, legacy
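
The "Window not found" handling listed under common failure modes (poll every 500ms for up to 30s, then fall back to OCR) can be illustrated with a generic polling helper. This is a minimal sketch, not the skill's actual code: `wait_until` and `locate_with_fallback` are hypothetical names, and the `primary` and `fallback` callables stand in for whatever UIA lookup and OCR search a real script would use. The clock and sleep functions are injectable only so the logic can be exercised without waiting in real time.

```python
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def wait_until(
    probe: Callable[[], Optional[T]],
    timeout: float = 30.0,   # seconds; mirrors the "up to 30s" figure above
    interval: float = 0.5,   # seconds; mirrors the "every 500ms" figure above
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call probe() repeatedly until it returns a non-None value.

    Returns that value, or None once `timeout` seconds have elapsed.
    """
    deadline = clock() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if clock() >= deadline:
            return None
        sleep(interval)


def locate_with_fallback(
    primary: Callable[[], Optional[T]],
    fallback: Callable[[], Optional[T]],
    **wait_kwargs,
) -> Tuple[Optional[T], str]:
    """Poll the primary locator; if it times out, try the fallback once.

    The second element of the result records which strategy succeeded,
    which is useful when writing an evidence log.
    """
    found = wait_until(primary, **wait_kwargs)
    if found is not None:
        return found, "primary"
    found = fallback()
    return found, ("fallback" if found is not None else "not_found")
```

In a real script, `primary` might wrap a pywinauto window lookup and `fallback` an OpenCV template match. The same `wait_until` helper also fits the verify step of the cycle: pass a probe that re-reads the control and returns a value only once the expected state change is visible.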

    Use Cases

    • Automate any Windows desktop app — UI scraping, hotkey macros, window control, OCR-driven click flows, and resilient error recovery. Works with Citrix/terminals/VB6/MFC apps that have no API.


    Security Scanned

    Passed automated security review

    Permissions

    Terminal / Shell
    Browser

    File Scopes

    references/**
    scripts/**

    Works with any agent that supports the universal SKILL.md standard
