    windows-desktop-automation

    by Zicheng Liao

    High-performance Windows automation using native UI trees, OpenCV image matching, and Tesseract OCR.

    Updated May 2026
    Security scanned
    One-time purchase

    $15

    ⚡ Also available via Agensi MCP: your AI agent can load this skill on demand.

    Included in download

    • Extract text from non-selectable UI regions using high-accuracy OCR
    • Perform multi-step GUI testing with native element identification
    • Terminal and browser automation included
    • Includes example output and usage patterns
    • Instant install

    See it in action

    Main Window 'Notepad' found (PID: 8422).
    UI Element 'Edit' focus: OK.
    Action: Typed 'Automation Report' via UIA.
    Action: Menu select 'File -> Save As' successful.
    OCR Verify: Found text 'Save As' at (450, 320) with 94% confidence.
    Task completed in 1.4s.

    About This Skill

    What it does

    This skill provides a high-performance automation suite for Windows desktop applications. It leverages the native UI Automation (UIA) backend via pywinauto to interact directly with an application's accessibility tree, making it significantly faster and more reliable than traditional pixel-scanning tools.
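
    As a rough illustration of what driving an application through the UIA backend looks like, here is a minimal sketch that uses pywinauto directly. It is not the skill's own code: the names `title_pattern` and `type_into_editor` are hypothetical, and it assumes the target window exposes an `Edit` control (some applications, including newer Notepad builds, may expose a different control type).

```python
import re


def title_pattern(app_name: str) -> str:
    """Build a regex that matches any window title containing app_name."""
    return ".*" + re.escape(app_name) + ".*"


def type_into_editor(app_name: str, text: str, timeout: float = 10.0) -> None:
    """Attach to a running app via UIA and type into its first Edit control."""
    # Imported lazily so this module still loads where pywinauto is not installed.
    from pywinauto import Application

    pattern = title_pattern(app_name)
    app = Application(backend="uia").connect(title_re=pattern, timeout=timeout)
    window = app.window(title_re=pattern)
    window.wait("ready", timeout=timeout)

    # The control is found in the accessibility tree by type, not by screen position.
    editor = window.child_window(control_type="Edit", found_index=0)
    editor.set_focus()
    editor.type_keys(text, with_spaces=True)
```

    Under those assumptions, `type_into_editor("Notepad", "Automation Report")` would correspond to the typing step in the sample output above, provided Notepad is already running.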

    Why use this skill

    Unlike basic macro recorders, this developer-centric skill is built for the messy reality of desktop automation. It uses a layered reliability model: if native UI elements are hidden, it falls back to OpenCV image matching; if text is non-selectable, it uses Tesseract OCR. This keeps your automations from breaking when a window moves by a few pixels or a UI theme changes.
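
    The layered model described here can be pictured as an ordered chain of locators. The sketch below is one possible shape for such a chain, not the skill's actual implementation; `locate_with_fallback` and the locator names are hypothetical, and each locator is assumed to return screen coordinates or `None`.

```python
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[int, int]
Locator = Callable[[], Optional[Point]]


def locate_with_fallback(locators: Sequence[Tuple[str, Locator]]) -> Tuple[str, Point]:
    """Try each named locator in order and return the first (name, point) hit."""
    errors = []
    for name, locate in locators:
        try:
            point = locate()
        except Exception as exc:
            # A failing layer should not abort the chain; record it and move on.
            errors.append(f"{name}: {exc}")
            continue
        if point is not None:
            return name, point
        errors.append(f"{name}: no match")
    raise LookupError("all locators failed: " + "; ".join(errors))
```

    A caller would pass the layers in priority order, for example a UIA lookup first, then an image match, then an OCR search, so the cheaper and more precise method is always tried before the visual ones.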

    Supported Tools & Frameworks

    • pywinauto: Native control interaction (buttons, menus, tree views, datagrids).
    • OpenCV: Computer vision for custom-drawn interfaces and games.
    • pytesseract: Optical Character Recognition for screen text extraction.
    • pyautogui: Global hotkeys, mouse movement, and low-level input.
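
    To give a sense of how OCR results with a confidence score can be used, here is a small sketch built on `pytesseract.image_to_data`, which returns per-word boxes and confidences. The helper names `find_words` and `ocr_screen_region` and the 80% threshold are assumptions for illustration, and the sketch matches single words only.

```python
from typing import Dict, List, Tuple


def find_words(ocr: Dict[str, list], target: str,
               min_conf: float = 80.0) -> List[Tuple[int, int, float]]:
    """Return (x_center, y_center, confidence) for each OCR word equal to target."""
    hits = []
    for i, word in enumerate(ocr["text"]):
        conf = float(ocr["conf"][i])  # Tesseract reports -1 for non-word boxes
        if word.strip().lower() == target.lower() and conf >= min_conf:
            x = ocr["left"][i] + ocr["width"][i] // 2
            y = ocr["top"][i] + ocr["height"][i] // 2
            hits.append((x, y, conf))
    return hits


def ocr_screen_region(image) -> Dict[str, list]:
    """Run Tesseract on a PIL image and return per-word boxes as a dict."""
    # Lazy import: requires pytesseract and the Tesseract binary to be installed.
    import pytesseract

    return pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
```

    Keeping the filtering logic separate from the Tesseract call makes it easy to check against recorded OCR output without a screen or the OCR engine present.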

    The Output

    Expect robust execution of desktop tasks. Instead of fragile coordinate-based clicks, the skill generates scripts that wait for specific UI states, interact with elements by their internal IDs, and handle window focus automatically. The result is "set and forget" automation for legacy software, ERP systems, and desktop utilities.
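
    Waiting for a UI state rather than sleeping for a fixed time usually comes down to a polling loop with a deadline. The following is a generic sketch of that idea, not code taken from the skill; `wait_until` and its defaults are hypothetical, and pywinauto also ships its own wait helpers that serve the same purpose.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.2) -> bool:
    """Poll condition() until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

    A script could call it as `wait_until(lambda: dialog_is_open(), timeout=5)`, where `dialog_is_open` stands in for whatever check the task needs, and branch on the returned boolean instead of assuming the window is ready.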

    Use Cases

    • Automate legacy Windows apps that lack APIs or web interfaces
    • Extract text from non-selectable UI regions using high-accuracy OCR
    • Perform multi-step GUI testing with native element identification
    • Create reliable hotkey macros and automated data entry workflows
    • Interact with custom-drawn controls using OpenCV template matching

    Security Scanned

    Passed automated security review

    Permissions

    Terminal / Shell
    Browser

    Allowed Hosts

    github.com

    File Scopes

    windows-desktop-automation/**
