Windows Desktop Automation v2
by Kaymue
Automate any Windows app with UI Automation, OCR, and hotkeys. Drive legacy/Citrix/RDP UIs. Python scripts for pywinauto + OpenCV.
- Automate any Windows desktop app — UI scraping, hotkey macros, window control, OCR-driven click flows, and resilient error recovery. Works with Citrix/terminals/VB6/MFC apps that have no API.
Free
Included in download
- Downloadable skill package
- 2 permissions declared
- Instant install
About This Skill
# Windows Desktop Automation

End-to-end Windows desktop automation for AI coding agents. Drive legacy applications, terminal emulators, Citrix windows, and GUI-only tools with the same rigor you apply to REST APIs.

## What it does

This skill gives your AI agent a structured workflow for automating **any** Windows desktop application, including those with no CLI, no API, and no documented automation surface. It combines four proven techniques into one resilient stack:

- **UI Automation (pywinauto + UIA)** for native Win32, WPF, WinForms, and Qt apps
- **Hotkey/macro layer** for keyboard-shortcut-driven workflows
- **Image recognition (OpenCV + Tesseract OCR)** for Citrix, RDP, custom-drawn, and remote-desktop UIs
- **Resilience patterns:** retry, recovery, evidence capture, and human-in-the-loop fallback

It wraps every click or keypress in a **"sniff → classify → select → execute → verify"** cycle. The goal is to catch silent failures (an action that appears to succeed but leaves the application unchanged), a common reason RPA scripts break.

## When to use it

- The target app has **no API, no CLI, and no automation hooks**
- You need to drive **Citrix, RDP, terminal emulators, or VB6/MFC legacy apps**
- Selenium/Playwright can't help (the UI is not in a browser)
- You need **screen scraping + form filling** with an audit trail
- A human currently does the task and you want to hand it off to an agent

## Why it's better than ad-hoc prompting

Most LLMs will try `pyautogui.click(540, 320)` once and call it done. This skill is different:

- It **inspects the window tree first** (`Inspect.exe`, `uiautomation`, `pywinauto`) before any action
- It **classifies the target control** (Edit, Button, List, Tree, etc.) and picks the matching interaction mode
- It uses **multiple locator strategies** in parallel (UIA, name, class, image hash, OCR) and falls back gracefully when one fails
- It **captures screenshots before and after every step** for postmortems
- It **never trusts a click**: it verifies the state change with a follow-up read

## Architecture

```
┌──────────────────────────────────────────────────────────┐
│ Agent (Claude/Cursor)                                    │
│   - Translates intent into action plan                   │
│   - Calls helper scripts via Bash                        │
└───────────────┬──────────────────────────────────────────┘
                │
                ▼
┌──────────────────────────────────────────────────────────┐
│ skills/windows-desktop-automation/                       │
│   scripts/                                               │
│   ├── sniff_window.py     # UIA + window tree dump       │
│   ├── classify_target.py  # Control type + locator plan  │
│   ├── act.py              # Click/type/scroll + verify   │
│   ├── ocr_click.py        # Image+OCR fallback           │
│   ├── hotkey.py           # Global hotkey macros         │
│   └── evidence.py         # Screenshot + log capture     │
│   references/                                            │
│   ├── locator-strategies.md                              │
│   ├── citrix-rdp-gotchas.md                              │
│   └── uia-cheatsheet.md                                  │
└──────────────────────────────────────────────────────────┘
```

## Quick start

```bash
# 1. Install
#    (pytesseract only wraps Tesseract; the Tesseract OCR engine itself
#    must be installed separately and available on PATH)
pip install pywinauto opencv-python pytesseract pillow psutil

# Run the following from the skill's root directory (script paths are relative).

# 2. Sniff a window to discover its controls
python scripts/sniff_window.py --title "Order Entry"

# 3. Drive an action
python scripts/act.py --window "Order Entry" \
  --action type --control "Customer ID" --value "ACME-001"

# 4. If UIA fails, fall back to image+OCR
python scripts/ocr_click.py --window "Order Entry" \
  --template templates/submit_button.png
```

## Supported scenarios

| Scenario | Tool | Reliability |
|----------|------|-------------|
| Native Win32/WPF/WinForms | pywinauto UIA backend | ★★★★★ |
| Qt apps | pywinauto + UIA | ★★★★ |
| Java Swing | pywinauto + JAB | ★★★★ |
| Citrix / RDP | OCR + image templates | ★★★ |
| Custom-drawn legacy | OCR + image templates | ★★ |
| Browser-as-UI | Playwright (use that instead) | ★★★★★ |

## Installation

### Claude Code / OpenClaw / Codex CLI

```bash
npx agensi install windows-desktop-automation
```

### Manual

Download `windows-desktop-automation.zip` and unzip it to `~/.claude/skills/` (or your agent's skills directory).

## Example usage

> "Open SAP GUI, navigate to transaction VA01, create a sales order for customer ACME-001, and dump the resulting order number to a JSON file."

The skill will:

1. Launch SAP GUI and wait for the main window
2. Sniff the tree and locate the transaction code field
3. Type "VA01", press Enter, and wait for the new screen
4. Re-sniff and locate the Order Type field
5. Drive the multi-step form with locator + image fallback
6. Capture the resulting order number from the status bar
7. Write JSON + screenshots to `./evidence/`

## Common failure modes (and how this skill handles them)

- **Window not found** → polls every 500 ms for up to 30 s, then falls back to OCR
- **Control not enabled** → waits and retries, then captures evidence
- **Modal dialog appears** → detects it via a window-tree diff, then dismisses it or escalates
- **Stale element reference** → re-sniffs and re-locates
- **OS DPI scaling mismatch** → normalizes coordinates via `GetDpiForWindow`
- **Locked workstation** → checks `GetForegroundWindow` and refuses to run

## Pricing

Single-purchase, lifetime access.
Includes:

- All 6 helper scripts
- 3 reference docs (locator strategies, Citrix gotchas, UIA cheatsheet)
- 5 example workflows
- Future updates for the same major version

## Support

GitHub-style issues aren't open for marketplace items. For help, contact the creator through their page; the typical response window is 48 hours.

## Compatibility

Works with any agent that supports the SKILL.md standard and can execute Python: Claude Code, OpenClaw, Codex CLI, Cursor, Gemini CLI, Cline, Windsurf, Aider. Tested on Windows 10/11 and Windows Server 2019+.

## Tags

windows, automation, desktop, ui-automation, pywinauto, ocr, rpa, devops, citrix, rdp, legacy
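The classify step of the sniff → classify → select → execute → verify cycle can be sketched as a lookup from UIA control type to interaction mode. This is a minimal illustration, not the skill's `classify_target.py`: it assumes the sniff step reports each control's UIA control-type name (`Edit`, `Button`, ...) and its enabled state, and the `ControlInfo` class, `INTERACTION_MODES` table, and `classify()` function are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping from UIA control-type names to the interaction the
# agent should use. The keys are UIA control types; the modes are illustrative.
INTERACTION_MODES = {
    "Edit": "set_text",
    "Document": "set_text",
    "Button": "invoke",
    "CheckBox": "toggle",
    "RadioButton": "select",
    "ComboBox": "select_item",
    "List": "select_item",
    "ListItem": "select",
    "Tree": "expand_and_select",
    "TreeItem": "expand_and_select",
    "MenuItem": "invoke",
    "TabItem": "select",
}


@dataclass
class ControlInfo:
    """A sniffed control (hypothetical shape): UIA type plus basic properties."""
    control_type: str
    name: str = ""
    enabled: bool = True


def classify(control: ControlInfo) -> str:
    """Return an interaction mode, or a fallback when UIA offers nothing usable."""
    if not control.enabled:
        return "wait_until_enabled"
    mode = INTERACTION_MODES.get(control.control_type)
    if mode is None:
        # Custom-drawn or unknown controls have no semantic action,
        # so route them to the image/OCR path instead of blind coordinates.
        return "image_ocr_fallback"
    return mode


print(classify(ControlInfo("Edit", name="Customer ID")))          # set_text
print(classify(ControlInfo("Pane", name="Canvas")))               # image_ocr_fallback
print(classify(ControlInfo("Button", "Submit", enabled=False)))   # wait_until_enabled
```

Routing unknown control types to the image/OCR path, rather than to a raw coordinate click, follows the fallback order the skill describes.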
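Locator fallback can be expressed as an ordered chain of strategies that records every attempt, so the evidence log shows why the final locator was chosen. This sketch tries strategies one at a time in priority order (one simple reading of "falls back gracefully"); `locate_with_fallback()` and the three stand-in locators are hypothetical, and real locators would wrap pywinauto lookups or template matching.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A locator takes no arguments and returns the element it found, or None.
Locator = Callable[[], Optional[object]]


def locate_with_fallback(
    strategies: Sequence[Tuple[str, Locator]],
) -> Tuple[Optional[object], str, List[Tuple[str, str]]]:
    """Try each named strategy in priority order.

    Returns (element, name of the strategy that worked, attempt log).
    """
    attempts: List[Tuple[str, str]] = []
    for name, locator in strategies:
        try:
            element = locator()
        except Exception as exc:  # a broken locator must not stop the chain
            attempts.append((name, f"error: {exc}"))
            continue
        if element is not None:
            attempts.append((name, "found"))
            return element, name, attempts
        attempts.append((name, "not found"))
    return None, "", attempts


# Hypothetical stand-ins for real locators.
def by_automation_id():
    return None  # e.g. the legacy app exposes no AutomationId


def by_name():
    raise RuntimeError("ambiguous: 2 matches")


def by_image_template():
    return (412, 288)  # e.g. screen coordinates of a template match


element, used, log = locate_with_fallback([
    ("uia:automation_id", by_automation_id),
    ("uia:name", by_name),
    ("image:template", by_image_template),
])
print(used, element)  # image:template (412, 288)
print(log)
```

Keeping the attempt log alongside the result means a later postmortem can see that the UIA locators failed before the image match was used.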
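The "never trusts a click" rule amounts to: perform the action, re-read the control, and compare against the expected state. A minimal sketch with hypothetical names (`act_and_verify`, `VerificationError`, `FlakyField`); in practice `read_state` would read the control's value back through UI Automation or OCR rather than from a Python attribute.

```python
import time
from typing import Callable


class VerificationError(RuntimeError):
    """Raised when an action never produces the expected state change."""


def act_and_verify(
    action: Callable[[], None],
    read_state: Callable[[], object],
    expected: object,
    attempts: int = 3,
    settle_seconds: float = 0.2,
) -> int:
    """Run `action`, re-read the UI, and retry until the read matches `expected`.

    Returns the number of attempts used; raises VerificationError otherwise.
    """
    last_seen = None
    for attempt in range(1, attempts + 1):
        action()
        time.sleep(settle_seconds)  # give the app time to repaint/update
        last_seen = read_state()
        if last_seen == expected:
            return attempt
    raise VerificationError(
        f"expected {expected!r} after {attempts} attempts, last read {last_seen!r}"
    )


class FlakyField:
    """Hypothetical stand-in for an Edit control that drops the first input."""

    def __init__(self):
        self.value = ""
        self.calls = 0

    def type_text(self, text):
        self.calls += 1
        if self.calls > 1:
            self.value = text


field = FlakyField()
used = act_and_verify(
    action=lambda: field.type_text("ACME-001"),
    read_state=lambda: field.value,
    expected="ACME-001",
    settle_seconds=0.0,
)
print(used)  # 2
```

Retrying is only safe for idempotent actions such as setting a field's text. For a Submit button, a second click can create a duplicate record, so verify once and escalate instead of retrying.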
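DPI normalization comes down to a ratio: Windows treats 96 DPI as 100% scaling, so a coordinate recorded at one DPI must be multiplied by `to_dpi / from_dpi` before it is replayed at another. The sketch assumes the DPI values have already been read (for example with `GetDpiForWindow`, which reports 96 for a window at 100% scaling); `scale_factor()` and `convert_point()` are hypothetical helpers. It also shows why a hard-coded `pyautogui.click(540, 320)` misses once the display scale changes.

```python
from typing import Tuple

BASE_DPI = 96  # Windows treats 96 DPI as 100% display scaling


def scale_factor(dpi: int) -> float:
    """Display scaling as a factor: 96 -> 1.0, 144 -> 1.5, 192 -> 2.0."""
    return dpi / BASE_DPI


def convert_point(x: int, y: int, from_dpi: int, to_dpi: int) -> Tuple[int, int]:
    """Map a point recorded at `from_dpi` to the same spot at `to_dpi`."""
    # Multiply before dividing to keep exact results for common DPI pairs.
    return round(x * to_dpi / from_dpi), round(y * to_dpi / from_dpi)


# A click target recorded on a 100% display, replayed on a 150% display.
print(scale_factor(144))                 # 1.5
print(convert_point(540, 320, 96, 144))  # (810, 480)
# And back from 150% to 96-DPI units.
print(convert_point(810, 480, 144, 96))  # (540, 320)
```

The same ratio applies to image templates: a template captured at 100% scaling has to be resized by `scale_factor(dpi)` before matching it against a 150% screen.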
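Detecting a modal dialog by window-tree diff means snapshotting the top-level windows before and after an action and inspecting whatever is new. This sketch assumes a hypothetical snapshot format (window handle → title and class name), a made-up main-window class name, and an illustrative allow-list of dialogs that are safe to dismiss; it keys on `#32770`, the window class Windows uses for standard dialog boxes.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format: window handle -> (title, class name).
Snapshot = Dict[int, Tuple[str, str]]

DIALOG_CLASS = "#32770"  # window class of standard Win32 dialog boxes

# Illustrative policy: dialogs the agent may close on its own.
DISMISSABLE_TITLES = {"Tip of the Day", "Update Available"}


def new_windows(before: Snapshot, after: Snapshot) -> Snapshot:
    """Windows present after the action that were not there before."""
    return {hwnd: info for hwnd, info in after.items() if hwnd not in before}


def triage_dialogs(before: Snapshot, after: Snapshot) -> List[Tuple[int, str, str]]:
    """Return (hwnd, title, decision) for each newly appeared dialog."""
    decisions = []
    for hwnd, (title, class_name) in sorted(new_windows(before, after).items()):
        if class_name != DIALOG_CLASS:
            continue  # a new top-level window, not a standard dialog
        decision = "dismiss" if title in DISMISSABLE_TITLES else "escalate"
        decisions.append((hwnd, title, decision))
    return decisions


# Hypothetical snapshots taken around a "Save" click.
before = {0x1001: ("Order Entry", "OrderEntryMainWnd")}
after = {
    0x1001: ("Order Entry", "OrderEntryMainWnd"),
    0x2002: ("Update Available", "#32770"),
    0x3003: ("Order Entry", "#32770"),
}
for hwnd, title, decision in triage_dialogs(before, after):
    print(hex(hwnd), title, decision)
# 0x2002 Update Available dismiss
# 0x3003 Order Entry escalate
```

WinForms and WPF dialogs do not use the `#32770` class, so a fuller check would also consult the UIA Window pattern's `IsModal` property or the new window's owner.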
Use Cases
- Automate any Windows desktop app — UI scraping, hotkey macros, window control, OCR-driven click flows, and resilient error recovery. Works with Citrix/terminals/VB6/MFC apps that have no API.
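For the "resilient error recovery" use case, the failure-mode list under About This Skill says a missing window is polled every 500 ms for up to 30 s before the OCR fallback. A minimal polling helper with those defaults might look like this; `poll()` and `find_order_entry()` are hypothetical, and `clock`/`sleep` are injectable so the timeout logic can be tested without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll(
    probe: Callable[[], Optional[T]],
    timeout: float = 30.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `probe` every `interval` s until it returns non-None or `timeout` expires."""
    deadline = clock() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if clock() >= deadline:
            return None  # caller decides: OCR fallback, evidence capture, escalate
        sleep(interval)


calls = {"n": 0}


def find_order_entry():
    """Hypothetical probe: the window 'appears' on the third check."""
    calls["n"] += 1
    return "hwnd:0x1001" if calls["n"] >= 3 else None


print(poll(find_order_entry, timeout=2.0, interval=0.01))  # hwnd:0x1001
print(poll(lambda: None, timeout=0.05, interval=0.01))     # None
```

Returning `None` instead of raising leaves the recovery decision (OCR fallback, evidence capture, or escalation to a human) with the caller.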
How to Install
mkdir -p ~/.claude/skills && curl -sL https://www.agensi.io/api/install/windows-desktop-automation-v2 -o /tmp/windows-desktop-automation-v2.zip && unzip -o /tmp/windows-desktop-automation-v2.zip -d ~/.claude/skills && rm /tmp/windows-desktop-automation-v2.zip
Free skills install directly. Paid skills require purchase; use the download button above after buying.
Security Scanned
Passed automated security review
Permissions
File Scopes
Works with any agent that supports the universal SKILL.md standard
More Premium Skills
Multi-Agent Orchestration Master Library
Transform Claude Code into a coordinated multi-agent system. Battle-tested tmux orchestration patterns, YAML task queues, event-driven communication, and parallel worker management for 8+ agents.
cinematic-sites
Turn any basic business URL into a high-end cinematic landing page with AI-generated 4K assets and GSAP animations.
endless-loop
Autonomous research and task loop that builds on previous findings to solve complex objectives while you sleep.
skill-router-2
Automatically detect, load, and stack the perfect skills combo for any user request.