Windows Desktop Automation v2

Automate any Windows app with UI Automation, OCR, and hotkeys. Drive legacy/Citrix/RDP UIs. Python scripts for pywinauto + OpenCV.

Updated Jun 2026

Automate any Windows desktop app — UI scraping, hotkey macros, window control, OCR-driven click flows, and resilient error recovery. Works with Citrix/terminals/VB6/MFC apps that have no API.

Security scanned
Instant install

Free

Included in download

Downloadable skill package
2 permissions declared

Name: Windows Desktop Automation v2
Availability: InStock
Author: Agensi

by Kaymue

0 installs

Also available via Agensi MCP: your AI agent can load this skill on demand via MCP.

About This Skill

# Windows Desktop Automation

End-to-end Windows desktop automation for AI coding agents. Drive legacy applications, terminal emulators, Citrix windows, and GUI-only tools with the same rigor you apply to REST APIs.

## What it does

This skill gives your AI agent a structured workflow for automating **any** Windows desktop application, even those with no CLI, no API, and no documented automation surface. It combines four proven techniques into one resilient stack:

- **UI Automation (pywinauto + UIA)** for native Win32, WPF, WinForms, and Qt apps
- **Hotkey/Macro layer** for keyboard-shortcut-driven workflows
- **Image recognition (OpenCV + Tesseract OCR)** for Citrix, RDP, custom-drawn, and remote-desktop UIs
- **Resilience patterns**: retry, recovery, evidence capture, and human-in-the-loop fallback

It enforces a **"sniff → classify → select → execute → verify"** cycle before any click or keypress, dramatically reducing the silent-failure problem that kills most RPA scripts.

## When to use it

- The target app has **no API, no CLI, and no automation hooks**
- You need to drive **Citrix, RDP, terminal emulators, or VB6/MFC legacy apps**
- Selenium/Playwright can't help (the UI is not in a browser)
- You need **screen-scraping + form-filling** with audit trail
- A human currently does the task and you want to hand it off to an agent

## Why it's better than ad-hoc prompting

Most LLMs will try to `pyautogui.click(540, 320)` once and call it done. This skill is different:

- It **inspects the window tree first** (`Inspect.exe`, `uiautomation`, `pywinauto`) before any action
- It **classifies the target control** (Edit, Button, List, Tree, etc.) and picks the right interaction mode
- It uses **multiple locator strategies** in parallel (UIA, name, class, image hash, OCR) and falls back gracefully
- It **captures screenshots before/after every step** for postmortem
- It **never trusts a click**: it verifies state change with a follow-up read

## Architecture

```
┌──────────────────────────────────────────────────────────┐
│  Agent (Claude/Cursor)                                   │
│  - Translates intent into action plan                    │
│  - Calls helper scripts via Bash                         │
└───────────────┬──────────────────────────────────────────┘
                │
                ▼
┌──────────────────────────────────────────────────────────┐
│  skills/windows-desktop-automation/                      │
│    scripts/                                              │
│    ├── sniff_window.py     # UIA + window tree dump      │
│    ├── classify_target.py  # Control type + locator plan │
│    ├── act.py              # Click/type/scroll + verify  │
│    ├── ocr_click.py        # Image+OCR fallback          │
│    ├── hotkey.py           # Global hotkey macros        │
│    └── evidence.py         # Screenshot + log capture    │
│    references/                                           │
│    ├── locator-strategies.md                             │
│    ├── citrix-rdp-gotchas.md                             │
│    └── uia-cheatsheet.md                                 │
└──────────────────────────────────────────────────────────┘
```

## Quick start

```bash
# 1. Install
pip install pywinauto opencv-python pytesseract pillow psutil

# 2. Sniff a window to discover its controls
python scripts/sniff_window.py --title "Order Entry"

# 3. Drive an action
python scripts/act.py --window "Order Entry" \
  --action type --control "Customer ID" --value "ACME-001"

# 4. If UIA fails, fall back to image+OCR
python scripts/ocr_click.py --window "Order Entry" \
  --template templates/submit_button.png
```

## Supported scenarios

| Scenario | Tool | Reliability |
|----------|------|-------------|
| Native Win32/WPF/WinForms | pywinauto UIA backend | ★★★★★ |
| Qt apps | pywinauto + UIA | ★★★★ |
| Java Swing | pywinauto + JAB | ★★★★ |
| Citrix / RDP | OCR + image templates | ★★★ |
| Custom-drawn legacy | OCR + image templates | ★★ |
| Browser-as-UI | Playwright (use that instead) | ★★★★★ |

## Installation

### Claude Code / OpenClaw / Codex CLI

```bash
npx agensi install windows-desktop-automation
```

### Manual

Download `windows-desktop-automation.zip` and unzip to `~/.claude/skills/` (or your agent's skills directory).

## Example usage

> "Open SAP GUI, navigate to transaction VA01, create a sales order for customer ACME-001, and dump the resulting order number to a JSON file."

The skill will:

1. Launch SAP GUI and wait for the main window
2. Sniff the tree, locate the transaction code field
3. Type "VA01", press Enter, wait for the new screen
4. Re-sniff, locate the Order Type field
5. Drive the multi-step form with locator+image fallback
6. Capture the resulting order number from the status bar
7. Write JSON + screenshots to `./evidence/`

## Common failure modes (and how this skill handles them)

- **Window not found** → polls every 500ms up to 30s, then OCR fallback
- **Control not enabled** → waits + retries, then captures evidence
- **Modal dialog appears** → auto-detects via window-tree diff, dismisses or escalates
- **Stale element reference** → re-sniffs and re-locates
- **OS DPI scaling mismatch** → normalizes via `GetDpiForWindow`
- **Locked workstation** → checks `GetForegroundWindow`, refuses to run

## Pricing

Single-purchase, lifetime access. Includes:

- All 6 helper scripts
- 3 reference docs (locator strategies, Citrix gotchas, UIA cheatsheet)
- 5 example workflows
- Future updates for the same major version

## Support

GitHub-style issues aren't open for marketplace items. For help, contact via the creator's page; responses typically arrive within 48 hours.

## Compatibility

Works with any agent that supports the SKILL.md standard and can execute Python: Claude Code, OpenClaw, Codex CLI, Cursor, Gemini CLI, Cline, Windsurf, Aider. Tested on Windows 10/11, Server 2019+.

## Tags

windows, automation, desktop, ui-automation, pywinauto, ocr, rpa, devops, citrix, rdp, legacy
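
The failure-mode list in the README describes three recurring patterns: polling for a window every 500 ms for up to 30 s, falling back from one locator strategy to the next, and confirming a state change after every action. The sketch below shows one way those patterns could be expressed in plain Python. It is not the code shipped in `scripts/act.py`: the names `poll_until`, `first_match`, and `act_and_verify` are hypothetical, and the callables passed in stand in for real pywinauto or OCR calls, so the example runs without any Windows dependency.

```python
import time
from typing import Any, Callable, Optional, Sequence, Tuple


def poll_until(
    probe: Callable[[], Optional[Any]],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Call probe() until it returns a non-None value or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        result = probe()
        if result is not None:
            return result
        if clock() >= deadline:
            return None  # caller decides what to do next, e.g. an OCR fallback
        sleep(interval_s)


def first_match(
    locators: Sequence[Tuple[str, Callable[[], Optional[Any]]]],
) -> Optional[Tuple[str, Any]]:
    """Try each (strategy_name, locate) pair in order; return the first hit."""
    for name, locate in locators:
        try:
            handle = locate()
        except LookupError:
            handle = None  # this strategy found nothing; try the next one
        if handle is not None:
            return name, handle
    return None


def act_and_verify(
    act: Callable[[], None],
    read_back: Callable[[], Any],
    expected: Any,
    attempts: int = 3,
) -> int:
    """Run act(), then confirm read_back() == expected; retry on mismatch.

    Returns the 1-based attempt number that succeeded.
    """
    for attempt in range(1, attempts + 1):
        act()
        if read_back() == expected:
            return attempt
    raise RuntimeError(f"state did not reach {expected!r} after {attempts} attempts")


if __name__ == "__main__":
    # In-memory stand-in for a text field; a real run would call pywinauto here.
    field = {"text": ""}

    def uia_locator():
        raise LookupError("no UIA node")  # simulates a Citrix-style opaque window

    hit = first_match([("uia", uia_locator), ("ocr", lambda: (540, 320))])
    print("locator used:", hit)

    used = act_and_verify(
        act=lambda: field.update(text="ACME-001"),
        read_back=lambda: field["text"],
        expected="ACME-001",
    )
    print("verified on attempt", used)
```

Injecting `clock` and `sleep` keeps the polling loop testable without real delays. Returning `None` on timeout, rather than raising, leaves the choice of fallback (OCR, evidence capture, or escalation to a human) to the caller.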

How to Install

mkdir -p ~/.claude/skills && curl -sL https://www.agensi.io/api/install/windows-desktop-automation-v2 -o /tmp/windows-desktop-automation-v2.zip && unzip -o /tmp/windows-desktop-automation-v2.zip -d ~/.claude/skills && rm /tmp/windows-desktop-automation-v2.zip

Free skills install directly. Paid skills require purchase - use the download button above after buying.

Reviews

No reviews yet - be the first to share your experience.

Only users who have downloaded or purchased this skill can leave a review.

Security Scanned

Passed automated security review

Permissions

Terminal / Shell

Browser

File Scopes

references/**

scripts/**

Creator

Kaymue
