
    web-crawler

    by Roy Yuen

    Ethical, structured web crawling and data extraction planning with built-in pagination and deduplication logic.

    Updated Apr 2026
    Security scanned
    One-time purchase

    $5

    One-time purchase · Own forever

    Included in download

    • Create structured JSON catalogs from public e-commerce product pages.
    • Design stable pagination strategies for complex, multi-page web directories.
    • Includes example output and usage patterns.
    • Instant install.
    • One-time purchase.

    See it in action

    Target: example-store.com/products
    Fields: [price, sku, availability]
    Scope: First 5 pages (approx 100 items)
    Pagination: Incremental URL params (?page=n)
    Output: JSON array
    Risks: Detected anti-bot headers; conservative delay set to 2s between requests.
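The plan above could be expressed as plain data plus a small URL expander. A minimal sketch in Python follows; the target, fields, page count, and delay come from the example, but the dict layout and the `page_urls` helper are illustrative assumptions, not the skill's actual plan format.

```python
import json

# Crawl plan mirroring the example above. The 2-second delay is the
# conservative setting chosen because anti-bot headers were detected.
plan = {
    "target": "https://example-store.com/products",
    "fields": ["price", "sku", "availability"],
    "max_pages": 5,               # first 5 pages (approx 100 items)
    "pagination": "?page={n}",    # incremental URL params
    "delay_seconds": 2,
}

def page_urls(plan):
    """Expand incremental-parameter pagination into concrete page URLs."""
    return [
        plan["target"] + plan["pagination"].replace("{n}", str(n))
        for n in range(1, plan["max_pages"] + 1)
    ]

print(json.dumps(page_urls(plan), indent=2))
```

Keeping the plan as data, separate from fetching code, is what makes a crawl reviewable before a single request is sent.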

    About This Skill

    Professional-Grade Web Data Extraction

    The Web Crawler skill provides a structured, ethical framework for extracting data from public websites. Navigating modern web architectures requires more than a simple fetch request; this skill handles the heavy lifting of crawl planning, pagination strategy, and data schema definition.

    What it does

    Designed for developers and researchers, this skill automates the logic behind gathering structured information. It manages:

    • Crawl Planning: Defining targets, depth, and page limits.
    • Extraction Rules: Mapping CSS/DOM elements to structured fields like JSON or CSV.
    • Pagination Logic: Strategies for handling "Next" buttons and infinite scrolls.
    • Ethical Guardrails: Built-in checks for robots.txt compliance and rate limiting.

    Why use this skill?

    Prompting an AI to "scrape this site" often leads to messy outputs or blocked requests. This skill implements a rigorous workflow that ensures your data is clean, deduplicated, and extracted responsibly. It separates the crawl scope from post-processing, making your data pipelines more stable and reproducible. It's compatible with tools like Claude Code, Codex, and OpenCode, serving as a sophisticated supervisor for your web automation tasks.
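The deduplication step mentioned above can be sketched in a few lines, assuming each extracted record carries a stable key such as `sku` (the field names here are illustrative). Paginated crawls often re-surface the same item across page boundaries, so dedup happens in post-processing, after the crawl scope is done.

```python
def deduplicate(records, key="sku"):
    """Keep the first occurrence of each key, preserving crawl order."""
    seen = set()
    unique = []
    for record in records:
        k = record.get(key)
        if k in seen:
            continue  # duplicate seen on an earlier page
        seen.add(k)
        unique.append(record)
    return unique

raw = [
    {"sku": "A1", "price": "9.99"},
    {"sku": "B2", "price": "4.50"},
    {"sku": "A1", "price": "9.99"},  # repeated across a page boundary
]
print(deduplicate(raw))  # two unique records remain
```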

    Use Cases

    • Create structured JSON catalogs from public e-commerce product pages.
    • Automate recursive link discovery and content extraction for research.
    • Design stable pagination strategies for complex, multi-page web directories.
    • Implement ethical crawling guardrails to respect robots.txt and site health.


    Security Scanned

    Passed automated security review

    Permissions

    No special permissions declared or detected
