
Windows Desktop Automation Architect for AI Coding Agents
Designs robust Windows desktop automation workflows using pywinauto, UI Automation, hotkeys, image matching, OCR, retries, logging, screenshots, and safety controls.
- Convert manual Windows clicks into resilient pywinauto selector-based scripts.
- Add screenshot diagnostics and retry logic to brittle desktop bots.
- Automate legacy business software that lacks modern APIs or web interfaces.
Included in download
- terminal, file_write, and file_read automation included
- Compatible with ChatGPT Custom GPTs
Sample Output
A real example of what this skill produces.
=== WINDOWS DESKTOP AUTOMATION ARCHITECTURE PLAN ===
Original request: Create a pywinauto automation architecture plan and AI coding agent prompt for a Windows desktop app that exports a monthly sales report.
Interpreted automation goal: Automate a Windows desktop workflow that opens the target app, navigates to Reports, selects a monthly date range, exports a sales report as an Excel file, handles the Save As dialog, and verifies the exported file.
Target application: Unspecified Windows desktop application
Workflow type: Report export automation
Assumptions:
- The user is authorized to automate this app.
- The app has a visible Reports section.
- The export creates an Excel file.
- pywinauto may be able to inspect the app through UIA or Win32.
- A standard Windows Save As dialog may appear during export.
Unknowns to clarify:
- application executable name
- app technology: Win32, WPF, WinForms, Electron, Java, Qt, Citrix, or unknown
- whether login is required
- output folder
- whether report controls expose stable properties
- whether date controls are accessible
- DPI scaling and screen resolution
- whether the workflow runs locally or through Remote Desktop/Citrix
Better alternatives to GUI automation: Before GUI automation, check whether the app provides an API, scheduled export, command-line export, database report, built-in report scheduler, or file-based export configuration. If no reliable non-GUI method exists, desktop automation is appropriate.
Recommended automation strategy: Use pywinauto UI Automation first if controls are visible in the accessibility tree. Test the Win32 backend if UIA does not expose stable controls. Use stable selectors for the main window, report navigation, date fields, export button, and Save As dialog. Use hotkeys only after focus verification. Use image matching for inaccessible custom buttons. Use OCR only for status text or confirmation text when control text is unavailable.
Workflow map:
- Launch or connect to the application.
- Wait for the main window to exist and become visible.
- Handle login or stop for secure manual login if required.
- Navigate to Reports.
- Select the monthly sales report.
- Set the start and end date fields.
- Verify the date range was accepted.
- Trigger Export.
- Wait for the Save As dialog.
- Save the file to the configured output folder.
- Wait for export completion.
- Verify the Excel file exists and file size is greater than zero.
- Log success and final output path.
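The workflow map above can be sketched as an ordered list of named steps that stops at the first failure. This is a minimal illustration only: `Step` and `run_workflow` are hypothetical names, and the real step actions (launching the app, navigating to Reports, and so on) are assumed to be supplied as callables that raise on failure.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One workflow step (hypothetical structure for illustration)."""
    name: str
    action: Callable[[], None]  # performs the step; raises on failure


def run_workflow(steps: List[Step]) -> List[dict]:
    """Run steps in order and stop at the first failure."""
    results = []
    for step in steps:
        started = time.monotonic()
        try:
            step.action()
            ok, error = True, None
        except Exception as exc:  # any failure ends the run
            ok, error = False, repr(exc)
        results.append({
            "step": step.name,
            "ok": ok,
            "error": error,
            "elapsed_s": round(time.monotonic() - started, 3),
        })
        if not ok:
            break
    return results
```

Each result row carries the step name, outcome, and elapsed time, so the same list can feed the log once the run ends.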
Selector strategy:
- Use process name or executable path for connection.
- Use main window title regex instead of exact volatile titles.
- Use automation ID, control type, class name, or stable text for controls.
- Use parent-child hierarchy when needed.
- Use standard Windows dialog selectors for Save As.
- Avoid raw coordinates unless no selector, hotkey, image, or OCR alternative exists.
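One way to apply the selector strategy is to keep selectors in a single table and resolve them by name. The sketch below assumes pywinauto's keyword criteria (`title`, `title_re`, `auto_id`, `control_type`, `class_name`). Every selector value, including the window title and `btnExport`, is a placeholder invented for illustration and would need to be replaced with properties read from an inspection tool. The pywinauto import is deferred so the table can be validated on any machine.

```python
import re

# Hypothetical selector table: all values are placeholders.
SELECTORS = {
    "main_window": {"title_re": r".*Sales Manager.*"},
    "reports_tab": {"title": "Reports", "control_type": "TabItem"},
    "export_button": {"auto_id": "btnExport", "control_type": "Button"},
    "save_as_dialog": {"title_re": r"Save As.*"},
}

ALLOWED_KEYS = {"title", "title_re", "auto_id", "control_type", "class_name"}


def validate_selectors(selectors):
    """Reject unknown keys and invalid regexes before any GUI call."""
    for name, spec in selectors.items():
        unknown = set(spec) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"{name}: unsupported keys {sorted(unknown)}")
        if "title_re" in spec:
            re.compile(spec["title_re"])
    return True


def connect_main_window(exe_path, backend="uia", timeout=30):
    """Connect to a running app, or start it, and return its main window."""
    from pywinauto import Application  # lazy import; Windows only

    app = Application(backend=backend)
    try:
        app.connect(path=exe_path, timeout=5)
    except Exception:
        app.start(exe_path)  # not running yet, so launch it
    window = app.window(**SELECTORS["main_window"])
    window.wait("visible", timeout=timeout)
    return window


def find(window, name):
    """Resolve a named selector relative to a parent window."""
    return window.child_window(**SELECTORS[name])
```

Whether the Save As dialog appears as a child of the main window or as a separate top-level window depends on the backend and the application, so that selector may need to be resolved from the application object instead.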
Hotkey strategy: Use hotkeys only when:
- the active window is verified
- focus is known
- the shortcut is stable
- the resulting state is verified
Possible fallback hotkeys:
- documented Alt menu shortcuts
- Ctrl+S only if it reliably triggers export/save
- Tab navigation only with state verification, not blind repeated sequences
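The hotkey conditions above can be expressed as a guard around the key press. In this sketch, `send_hotkey_checked` is a hypothetical helper. The active-window lookup, the key sender, and the state check are injected as callables so the logic can be exercised without a desktop session.

```python
import re


def send_hotkey_checked(keys, expected_title_re, get_active_title, send, verify):
    """Send `keys` only if the active window matches, then verify the result.

    Returns "wrong_window" (nothing sent), "ok", or "unverified".
    """
    title = get_active_title() or ""
    if not re.search(expected_title_re, title):
        return "wrong_window"  # focus is not where we expect; do not type
    send(keys)
    return "ok" if verify() else "unverified"
```

On Windows, `send` could be `pywinauto.keyboard.send_keys`, where `"^s"` denotes Ctrl+S. `get_active_title` and `verify` are assumed to be project-specific helpers, for example a check that the Save As dialog now exists.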
Image matching strategy: Use image matching only for inaccessible custom controls. Requirements:
- restrict search to toolbar or report panel regions
- define match threshold
- capture templates at the same DPI and theme
- validate expected state after clicking
- capture screenshot if image target is not found
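To make the region restriction and match threshold concrete, here is a deliberately naive matcher written in pure Python. It is a stand-in for a library matcher such as OpenCV's `cv2.matchTemplate`, not a replacement for one. It assumes grayscale images stored as 2D lists and a region that lies inside the image, and it scores by mean absolute difference, so lower is better.

```python
def match_template(image, template, region, max_mean_diff=10.0):
    """Find `template` inside `region` of `image`.

    `image` and `template` are 2D lists of grayscale values (0-255).
    `region` is (left, top, right, bottom), exclusive on right and bottom.
    Returns (x, y, score) for the best match, or None when no position
    scores at or below `max_mean_diff`.
    """
    th, tw = len(template), len(template[0])
    left, top, right, bottom = region
    best = None
    for y in range(top, bottom - th + 1):
        for x in range(left, right - tw + 1):
            total = 0
            for dy in range(th):
                row, trow = image[y + dy], template[dy]
                for dx in range(tw):
                    total += abs(row[x + dx] - trow[dx])
            score = total / (th * tw)
            if best is None or score < best[2]:
                best = (x, y, score)
    if best is not None and best[2] <= max_mean_diff:
        return best
    return None  # caller should capture a screenshot and stop or retry
```

A `None` result corresponds to the "image target is not found" case, where the plan calls for a diagnostic screenshot instead of a click.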
OCR strategy: Use OCR only for:
- export status text
- confirmation messages
- error banners
- report title verification if control text is unavailable
OCR requirements:
- restrict OCR to the status/message region
- define expected phrases such as Export complete or Report generated
- use confidence thresholds
- avoid capturing sensitive report content unless approved
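The expected-phrase and confidence requirements can be sketched as a small classifier over OCR output. It assumes the OCR engine (not shown, and not specified in the plan) returns `(text, confidence)` pairs with confidence on a 0 to 100 scale. `check_ocr_status` and the default threshold of 80 are illustrative choices.

```python
def check_ocr_status(results, expected_phrases, min_confidence=80.0):
    """Classify OCR output from the status region.

    `results` is a list of (text, confidence) pairs.
    Returns "confirmed", "low_confidence", or "not_found".
    """
    phrases = [p.lower() for p in expected_phrases]
    saw_low_confidence_hit = False
    for text, confidence in results:
        # Collapse whitespace so "Export  complete" still matches.
        normalized = " ".join(text.lower().split())
        if any(p in normalized for p in phrases):
            if confidence >= min_confidence:
                return "confirmed"
            saw_low_confidence_hit = True
    return "low_confidence" if saw_low_confidence_hit else "not_found"
```

A `"low_confidence"` result maps to the recovery rule of requesting manual confirmation, while `"not_found"` means the wait should continue or time out.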
Waits and synchronization:
- wait for main window visible
- wait for Reports screen loaded
- wait for date fields enabled
- wait for Export button enabled
- wait for Save As dialog
- wait for output file creation
- wait for file size to stabilize
- timeout each step with clear error messages
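A generic polling helper covers most of the waits listed above, and the file checks can be built on top of it. This is a sketch using only the standard library. `wait_until` and `wait_for_stable_file` are hypothetical names, and the default timeouts are arbitrary.

```python
import os
import time


def wait_until(predicate, timeout, interval=0.5, description="condition"):
    """Poll `predicate` until it is truthy or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Timed out after {timeout}s waiting for {description}"
            )
        time.sleep(interval)


def wait_for_stable_file(path, timeout=120, interval=1.0, stable_checks=3):
    """Wait until `path` exists, is non-empty, and has stopped growing.

    Returns the final size in bytes.
    """
    last_size = -1
    unchanged = 0

    def is_stable():
        nonlocal last_size, unchanged
        if not os.path.exists(path):
            last_size, unchanged = -1, 0
            return False
        size = os.path.getsize(path)
        # Count consecutive polls where the non-zero size did not change.
        unchanged = unchanged + 1 if size > 0 and size == last_size else 0
        last_size = size
        return unchanged >= stable_checks

    wait_until(is_stable, timeout, interval, f"{path} to finish writing")
    return last_size
```

The `description` argument is what makes the timeout message specific to the step that failed.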
Error handling and recovery:
- if app is not running, launch it
- if wrong window is active, refocus the main window
- if login is required, stop or request secure manual login
- if a control is not found, retry and capture screenshot
- if an unexpected popup appears, classify and stop or handle known popup
- if export times out, retry once
- if file already exists, apply overwrite policy
- if OCR confidence is low, request manual confirmation
- if failure persists, abort safely with logs
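The retry and abort rules can be sketched as a bounded retry wrapper with a failure hook. `run_with_retry` is a hypothetical helper. The `on_failure` callback is where a screenshot capture or log write would go.

```python
import time


def run_with_retry(action, max_attempts=3, delay=1.0, on_failure=None):
    """Call `action` until it succeeds or `max_attempts` is reached.

    `on_failure(attempt, exc)` runs after every failed attempt.
    The last exception is re-raised so the caller can abort safely.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as exc:
            if on_failure is not None:
                on_failure(attempt, exc)
            if attempt == max_attempts:
                raise
            time.sleep(delay)
```

Under this sketch, "if export times out, retry once" corresponds to `max_attempts=2`.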
Logging and diagnostics: Log:
- timestamp
- workflow step
- selector used
- active window title
- success or failure
- elapsed time
- retry count
- output path
- screenshot path on failure
Do not log:
- passwords
- tokens
- secret values
- sensitive report data
- full screenshots containing private data unless approved
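A structured log record with redaction might look like the following sketch. The key names in `SENSITIVE_KEYS` are assumptions chosen to mirror the "Do not log" list, and `make_log_record` is a hypothetical helper. A real project would need to extend the set to match its own field names.

```python
import json
import time

# Assumed field names that must never reach the log in clear text.
SENSITIVE_KEYS = {"password", "token", "secret", "report_data"}


def make_log_record(step, status, **fields):
    """Build one JSON log line, redacting sensitive fields by key name."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "step": step,
        "status": status,
    }
    for key, value in fields.items():
        if key.lower() in SENSITIVE_KEYS:
            record[key] = "[REDACTED]"
        else:
            record[key] = value
    # default=str keeps paths and other non-JSON types loggable.
    return json.dumps(record, sort_keys=True, default=str)
```

Redaction here is by key name only. Sensitive values embedded in free-text fields such as window titles would still need separate handling.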
Safety controls:
- dry-run mode up to the Export action
- configurable output folder
- overwrite protection
- maximum retry limit
- user confirmation before overwriting files
- stop-on-error behavior
- screenshot logging can be disabled for sensitive environments
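The safety controls can be gathered into one configuration object that the workflow consults before any state-changing step. This is a sketch under assumptions: `SafetyConfig`, `resolve_output_path`, and `should_execute` are hypothetical names, and the step names treated as destructive are placeholders.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SafetyConfig:
    output_dir: Path
    dry_run: bool = True           # safe by default
    allow_overwrite: bool = False  # overwrite protection
    max_retries: int = 2
    save_screenshots: bool = True  # disable in sensitive environments


def resolve_output_path(config, filename, confirm_overwrite=None):
    """Return the target path, refusing unapproved overwrites."""
    target = config.output_dir / filename
    if not target.exists():
        return target
    if not config.allow_overwrite:
        raise FileExistsError(f"{target} exists and overwrite is disabled")
    # Overwrite is allowed by policy but still needs explicit confirmation.
    if confirm_overwrite is None or not confirm_overwrite(target):
        raise FileExistsError(f"{target} exists and overwrite was not confirmed")
    return target


def should_execute(config, step_name, destructive_steps=("export", "save")):
    """In dry-run mode, skip steps that change state."""
    return not (config.dry_run and step_name in destructive_steps)
```

Defaulting `dry_run` to `True` means a misconfigured run stops before the Export action instead of writing files.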
Implementation structure: automation_project/
- main.py
- config.py
- app_connection.py
- selectors.py
- workflow_steps.py
- image_matching.py
- ocr_utils.py
- logging_utils.py
- safety.py
- recovery.py
- tests/
- logs/
- screenshots/
- README.md
AI coding agent prompt: Build a robust Windows desktop automation script in Python using pywinauto for exporting a monthly sales report from a desktop application. First determine whether the UIA or Win32 backend is more appropriate. Prefer stable selectors over coordinates. Create modular files for app connection, selectors, workflow steps, logging, recovery, image matching fallback, OCR fallback, safety controls, and configuration. Add explicit waits for windows, controls, enabled states, dialogs, export completion, and file creation. Use hotkeys only when the active window and resulting state can be verified. Use image matching only for inaccessible custom controls and OCR only for status verification when control text is unavailable. Add structured logs and screenshots on failure while avoiding sensitive data. Include dry-run mode, overwrite protection, retry limits, and manual confirmation for risky steps. Return automation strategy, selectors used, fallback strategy, how to run the script, testing plan, and known limitations.
Testing plan:
- app closed before run
- app already open before run
- wrong window active
- slow Reports screen loading
- Save As dialog delayed
- output file already exists
- export succeeds
- export times out
- unexpected popup appears
- different DPI scaling
- screenshot logging disabled
- dry-run mode enabled
- Remote Desktop or Citrix environment if applicable
Known limitations: Reliability depends on target app technology, control accessibility, app version, Windows permissions, DPI scaling, resolution, language, theme, timing, and local versus remote execution.
Verification checklist:
- app launches or connects successfully
- main window is detected
- Reports screen is reached
- date range is set correctly
- Export action is triggered only after state verification
- Save As dialog is handled
- Excel file exists
- file size is greater than zero
- logs are created
- screenshot captured on failure when allowed
- no credentials or sensitive report data are logged

About This Skill
Windows Desktop Automation Architect helps AI coding agents, developers, automation builders, QA teams, operations teams, and business users design reliable Windows GUI automation for desktop applications that do not have easy APIs. It creates automation architecture plans, pywinauto prompts, manual workflow conversion specs, selector strategies, hotkey plans, image matching fallbacks, OCR fallback plans, retry and timeout strategies, logging and screenshot diagnostics, safety checklists, debugging prompts, and test plans. The skill is ideal for automating legacy Windows apps, ERP/CRM tools, report exports, file dialogs, data entry workflows, installer screens, remote desktop workflows, and repetitive internal business processes while avoiding brittle coordinate-only scripts.
Use Cases
- Convert manual Windows clicks into resilient pywinauto selector-based scripts.
- Add screenshot diagnostics and retry logic to brittle desktop bots.
- Automate legacy business software that lacks modern APIs or web interfaces.
- Design OCR and image-matching fallbacks for Citrix or Remote Desktop sessions.
- Create safety-first workflows for sensitive data entry or file exports.
- Audit coordinate-based scripts and make them more reliable.
- Create Cursor prompts for robust Windows automation projects.
- Plan Citrix or Remote Desktop automation with OCR and image matching.
- Automate Save As dialogs and file export workflows.
- Build safety checklists for destructive desktop workflows.
- Add logging, screenshots, retries, and recovery to automation scripts.
Known Limitations
This skill creates Windows desktop automation plans, prompts, reliability strategies, and test checklists, but it does not guarantee that an automation will work reliably on every machine, app version, DPI setting, resolution, language, theme, permission level, Remote Desktop session, Citrix environment, or Windows configuration. GUI automation can be fragile when applications change, controls are inaccessible, timing varies, OCR confidence is low, image templates drift, or workflows involve sensitive or destructive actions. Production use requires real-environment testing, logging, recovery paths, user authorization, and human review for high-risk workflows.
How to Install
mkdir -p ~/.claude/skills && curl -sL https://www.agensi.io/api/install/windows-desktop-automation-architect-for-ai-coding-agents | tar xz -C ~/.claude/skills/
Free skills install directly. Paid skills require purchase; use the download button above after buying.
Early access skill
Security Scanned
Passed automated security review
Permissions
This skill uses file access to read user-provided automation scripts, workflow notes, logs, selector dumps, screenshot descriptions, configuration examples, README files, and project files. It uses write access to create structured Markdown/text outputs such as Windows automation architecture plans, pywinauto implementation prompts, manual workflow conversion specs, image matching and OCR plans, reliability audits, safety checklists, debugging prompts, testing plans, documentation, and SKILL.md files. Terminal access is optional and should only be enabled when the agent is expected to inspect or test local automation scripts. Browser or network access is only needed for external documentation research. Environment variable access is not normally required, and secret values should never be exposed.
Tags
Compatible with ChatGPT Custom GPTs, ChatGPT Agents, Cursor, Claude Code, Codex CLI, OpenCode, Replit, and other AI coding agent workflows that support structured Markdown instruction files such as SKILL.md. It can also be used manually in any AI chat by pasting the instructions.