
    data-diff

    by Ikerg

    Structural and cell-level diffing for CSV/Excel with schema drift detection and CI-ready exit codes.

    Updated Jun 2026
    Security scanned

    $5

    · or 25 credits

    30-day refund guarantee

    Secure checkout via Stripe

    Included in download

    • Identify schema drift and column type changes before they break ETL pipelines.
    • Validate database migrations by comparing legacy vs. new system exports.
    • Terminal automation included
    • Instant install

    Sample input

    Compare the inventory_v1.csv and inventory_v2.csv files using 'sku_id' as the key and tell me what changed in the pricing column.

    Sample output

    Schema: Identical.
    Rows: 154 unchanged, 3 added, 1 removed, 12 changed.
    Column 'unit_price' drifted in 12 rows. Sample: SKU_992: 45.00 -> 49.99
    Exit code 1: Data regression detected.

    About This Skill

    Automated Data Comparison for CSV and Excel

    Stop manually scrolling through spreadsheets to find changes. This developer-centric skill provides a precise, programmatic way to compare two versions of a dataset. It goes beyond simple file diffing by analyzing schema drift and row-level transformations using keyed or hash-based matching.
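
    To illustrate what hash-based matching can look like when no key column is available, here is a minimal pandas sketch. It is not the skill's own implementation: the function name classify_rows_by_hash is hypothetical, and the approach (hashing the full contents of each row over the shared columns) is an assumption about how such matching might work. Without a key, a changed row cannot be told apart from one removal plus one addition, and duplicate rows collapse into a single hash.

```python
import pandas as pd
from pandas.util import hash_pandas_object


def classify_rows_by_hash(old: pd.DataFrame, new: pd.DataFrame) -> dict:
    """Count unchanged/added/removed rows by hashing whole-row contents.

    Hypothetical helper: row order is ignored, and only columns present
    in both frames take part in the hash.
    """
    cols = sorted(set(old.columns) & set(new.columns))
    # index=False so that only the cell values influence each row hash
    old_hashes = set(hash_pandas_object(old[cols], index=False))
    new_hashes = set(hash_pandas_object(new[cols], index=False))
    return {
        "unchanged": len(old_hashes & new_hashes),
        "added": len(new_hashes - old_hashes),
        "removed": len(old_hashes - new_hashes),
    }


old = pd.DataFrame({"sku_id": ["A", "B", "C"], "unit_price": [1.0, 2.0, 3.0]})
new = pd.DataFrame({"sku_id": ["C", "A", "D"], "unit_price": [3.0, 1.5, 4.0]})
result = classify_rows_by_hash(old, new)
print(result)
```

    In this sketch the row for "A" shows up as one removal and one addition, which is why keyed matching is preferable whenever a primary key exists.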

    What it does

    • Schema Drift Detection: Identifies added, removed, or renamed columns and data type (dtype) changes that can break downstream pipelines.
    • Row-Level Analysis: Categorizes rows as added, removed, changed, or unchanged. When a primary key is provided, it isolates exactly which cells changed with before → after samples.
    • Visual Reporting: Optionally generates interactive Plotly HTML reports to help non-technical stakeholders visualize data drift.
    • CI/CD Integration: Uses standard exit codes (0 for identical, 1 for differences), making it an ideal gate for automated data pipeline regression tests.
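
    The checks listed above can be sketched in a few lines of pandas. This is an illustration under stated assumptions, not the skill's actual code: schema_drift, classify_rows, and exit_code are hypothetical names, key values are assumed to be unique, and rename detection is left out (a renamed column appears here as one removed plus one added).

```python
import pandas as pd


def schema_drift(old: pd.DataFrame, new: pd.DataFrame) -> dict:
    """Report added/removed columns and dtype changes on shared columns."""
    old_cols, new_cols = set(old.columns), set(new.columns)
    shared = [c for c in old.columns if c in new_cols]
    return {
        "added": [c for c in new.columns if c not in old_cols],
        "removed": [c for c in old.columns if c not in new_cols],
        "dtype_changed": {
            c: (str(old[c].dtype), str(new[c].dtype))
            for c in shared
            if old[c].dtype != new[c].dtype
        },
    }


def classify_rows(old: pd.DataFrame, new: pd.DataFrame, key: str) -> dict:
    """Count added/removed/changed/unchanged rows, matched on a unique key."""
    a, b = old.set_index(key), new.set_index(key)
    common = a.index.intersection(b.index)
    cols = [c for c in a.columns if c in b.columns]
    left, right = a.loc[common, cols], b.loc[common, cols]
    # A cell differs if the values are unequal and not both missing
    differs = (left != right) & ~(left.isna() & right.isna())
    changed = differs.any(axis=1)
    return {
        "added": len(b.index.difference(a.index)),
        "removed": len(a.index.difference(b.index)),
        "changed": int(changed.sum()),
        "unchanged": int((~changed).sum()),
    }


def exit_code(drift: dict, rows: dict) -> int:
    """Return 0 when the datasets are identical, 1 when anything differs."""
    has_drift = bool(drift["added"] or drift["removed"] or drift["dtype_changed"])
    has_row_diff = any(rows[k] for k in ("added", "removed", "changed"))
    return 1 if has_drift or has_row_diff else 0


old = pd.DataFrame({"sku_id": ["A", "B", "C"],
                    "unit_price": [1.0, 2.0, 3.0],
                    "qty": [1, 2, 3]})
new = pd.DataFrame({"sku_id": ["A", "B", "D"],
                    "unit_price": [1.0, 2.5, 4.0],
                    "qty": [1.0, 2.0, 3.0]})
drift = schema_drift(old, new)
rows = classify_rows(old, new, key="sku_id")
code = exit_code(drift, rows)
print(drift, rows, code)
```

    In a CI job, a wrapper script could pass the result to sys.exit() so that any difference fails the pipeline step.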

    Why use this skill?

    Standard text diffing tools produce false positives on CSVs because reordered rows and insignificant formatting changes register as differences. This skill compares the data itself rather than the text. It treats two missing values (NaN) in the same cell as equal, so blank cells are not reported as changes, and it summarizes which columns are the "hottest" (most frequently changed), which helps you trace logic errors in your ETL scripts to the columns they affect.
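
    The NaN handling and the "hottest column" summary can be illustrated with a short pandas sketch. The helper names changed_cells and hottest_columns are hypothetical, and both frames are assumed to be already aligned on the same index and columns. The sketch shows why a plain comparison is not enough: in pandas, NaN != NaN evaluates to True.

```python
import numpy as np
import pandas as pd


def changed_cells(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Boolean mask of differing cells, treating NaN vs NaN as equal."""
    both_nan = old.isna() & new.isna()
    return (old != new) & ~both_nan


def hottest_columns(old: pd.DataFrame, new: pd.DataFrame) -> pd.Series:
    """Number of changed cells per column, most frequently changed first."""
    return changed_cells(old, new).sum().sort_values(ascending=False)


idx = ["SKU_992", "SKU_1", "SKU_2"]
old = pd.DataFrame({"unit_price": [45.0, np.nan, 10.0], "stock": [5, 7, 9]}, index=idx)
new = pd.DataFrame({"unit_price": [49.99, np.nan, 12.0], "stock": [5, 7, 8]}, index=idx)

naive = int((old != new)["unit_price"].sum())  # counts the NaN row as changed
hot = hottest_columns(old, new)
print(naive, hot.to_dict())
```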

    Supported Formats & Tools

    • Files: .csv, .xlsx, .xls
    • Frameworks: Built on Pandas, OpenPyXL, and Plotly.
    • Features: Composite keys, specific sheet targeting, and machine-readable JSON output.
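
    As a rough sketch of how composite keys, sheet targeting, and JSON output might fit together, the following uses only standard pandas calls (read_csv, read_excel with sheet_name, MultiIndex.from_frame). The helper names load_table and key_summary_json, the sheet parameter, and the JSON layout are all assumptions for illustration; the skill's real output schema is not documented here.

```python
import json
from pathlib import Path

import pandas as pd


def load_table(path: str, sheet=0) -> pd.DataFrame:
    """Load a CSV or Excel file by extension; `sheet` applies to Excel only."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xlsx", ".xls"}:
        return pd.read_excel(path, sheet_name=sheet)
    raise ValueError(f"unsupported file type: {suffix}")


def key_summary_json(old: pd.DataFrame, new: pd.DataFrame, keys: list) -> str:
    """Serialize added/removed composite keys as a JSON string."""
    old_keys = pd.MultiIndex.from_frame(old[keys])
    new_keys = pd.MultiIndex.from_frame(new[keys])
    summary = {
        "keys": keys,
        "added": [list(k) for k in new_keys.difference(old_keys)],
        "removed": [list(k) for k in old_keys.difference(new_keys)],
    }
    # default=str is a fallback for values json cannot encode natively
    return json.dumps(summary, default=str)


old = pd.DataFrame({"warehouse": ["w1", "w1", "w2"],
                    "sku_id": ["A", "B", "A"],
                    "qty": [1, 2, 3]})
new = pd.DataFrame({"warehouse": ["w1", "w2", "w2"],
                    "sku_id": ["A", "A", "B"],
                    "qty": [1, 3, 4]})
report = json.loads(key_summary_json(old, new, keys=["warehouse", "sku_id"]))
print(report)
```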

    Use Cases

    • Identify schema drift and column type changes before they break ETL pipelines.
    • Validate database migrations by comparing legacy vs. new system exports.
    • Perform regression testing on data pipeline outputs during refactoring.
    • Generate interactive HTML reports for business stakeholders to review data changes.
    • Detect silent data loss by tracking removed rows in append-only datasets.


    Security Scanned

    Passed automated security review

    Permissions

    Terminal / Shell

    File Scopes

    scripts/**
    examples/**

    Creator

    Lead Data Engineer with 11 years of experience designing and delivering scalable data platforms across Databricks, AWS, and Azure ecosystems. Proven track record of building high-performance data solutions for large-scale, data-intensive organizations in industries including healthcare and robotics. Extensive experience working in highly regulated environments, managing complex data pipelines and large volumes of structured and unstructured data.
