data-diff
by Ikerg
Structural and cell-level diffing for CSV/Excel with schema drift detection and CI-ready exit codes.
- Identify schema drift and column type changes before they break ETL pipelines.
- Validate database migrations by comparing legacy vs. new system exports.
- Perform regression testing on data pipeline outputs during refactoring.
$5 or 25 credits
Included in download
- Identify schema drift and column type changes before they break ETL pipelines.
- Validate database migrations by comparing legacy vs. new system exports.
- Terminal automation included
- Instant install
Sample input
Compare the inventory_v1.csv and inventory_v2.csv files using 'sku_id' as the key and tell me what changed in the pricing column.
Sample output
Schema: Identical.
Rows: 154 unchanged, 3 added, 1 removed, 12 changed.
Column 'unit_price' drifted in 12 rows.
Sample: SKU_992: 45.00 -> 49.99
Exit code 1: Data regression detected.
About This Skill
Automated Data Comparison for CSV and Excel
Stop manually scrolling through spreadsheets to find changes. This developer-centric skill provides a precise, programmatic way to compare two versions of a dataset. It goes beyond simple file diffing by analyzing schema drift and row-level transformations using keyed or hash-based matching.
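To illustrate what hash-based matching can look like when no key column is available, here is a minimal sketch. It is not the skill's own implementation (which is described as built on Pandas); it uses only the Python standard library, and the function names `row_hashes` and `unkeyed_diff` are hypothetical. Each row is reduced to a digest, and the two files are compared as multisets, so row order does not matter.

```python
import csv
import hashlib
import io
from collections import Counter


def row_hashes(csv_text: str) -> Counter:
    """Return a multiset of per-row digests for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    digests = Counter()
    for row in reader:
        # Sort by column name so column order does not affect the digest.
        canonical = "\x1f".join(f"{col}={row[col]}" for col in sorted(row))
        digests[hashlib.sha256(canonical.encode("utf-8")).hexdigest()] += 1
    return digests


def unkeyed_diff(old_text: str, new_text: str) -> dict:
    """Count rows added, removed, and unchanged, ignoring row order."""
    old, new = row_hashes(old_text), row_hashes(new_text)
    return {
        "added": sum((new - old).values()),
        "removed": sum((old - new).values()),
        "unchanged": sum((old & new).values()),
    }


old_csv = "sku_id,unit_price\nA1,10.00\nB2,20.00\nC3,30.00\n"
new_csv = "sku_id,unit_price\nC3,30.00\nA1,10.00\nB2,21.50\n"
print(unkeyed_diff(old_csv, new_csv))
# {'added': 1, 'removed': 1, 'unchanged': 2}
```

Without a key, an edited row can only be reported as one removal plus one addition, which is why a key is needed to see cell-level changes.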
What it does
- Schema Drift Detection: Identifies added, removed, or renamed columns and data type changes (dtypes) that break downstream pipelines.
- Row-Level Analysis: Categorizes rows as added, removed, changed, or unchanged. When a primary key is provided, it isolates exactly which cells changed with before → after samples.
- Visual Reporting: Optionally generates interactive Plotly HTML reports to help non-technical stakeholders visualize data drift.
- CI/CD Integration: Uses standard exit codes (0 for identical, 1 for differences), making it an ideal gate for automated data pipeline regression tests.
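The checks listed above can be sketched roughly as follows. This is an illustrative, standard-library-only approximation, not the skill's actual code: the helper names (`load`, `keyed_diff`, `exit_code`) and the report fields are assumptions, and dtype comparison and rename detection are left out for brevity. It compares column sets, classifies rows by a key column, records before/after values for changed cells, and maps the result to the 0/1 exit-code convention.

```python
import csv
import io


def load(csv_text, key):
    """Parse CSV text into (column list, {key value: row dict})."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = {row[key]: row for row in reader}
    return list(reader.fieldnames or []), rows


def keyed_diff(old_text, new_text, key):
    """Compare two CSV texts by key column; field names here are hypothetical."""
    old_cols, old = load(old_text, key)
    new_cols, new = load(new_text, key)
    shared = [c for c in old_cols if c in new_cols and c != key]
    report = {
        "columns_added": [c for c in new_cols if c not in old_cols],
        "columns_removed": [c for c in old_cols if c not in new_cols],
        "rows_added": sorted(new.keys() - old.keys()),
        "rows_removed": sorted(old.keys() - new.keys()),
        "cells_changed": [],  # (key, column, before, after)
        "rows_unchanged": 0,
    }
    for k in sorted(old.keys() & new.keys()):
        cells = [(k, c, old[k][c], new[k][c]) for c in shared if old[k][c] != new[k][c]]
        if cells:
            report["cells_changed"].extend(cells)
        else:
            report["rows_unchanged"] += 1
    return report


def exit_code(report):
    """Return 0 when nothing differs and 1 otherwise."""
    fields = ("columns_added", "columns_removed", "rows_added", "rows_removed", "cells_changed")
    return 1 if any(report[f] for f in fields) else 0


old_csv = "sku_id,unit_price,qty\nSKU_991,12.50,4\nSKU_992,45.00,7\nSKU_993,8.00,1\n"
new_csv = "sku_id,unit_price,qty\nSKU_991,12.50,4\nSKU_992,49.99,7\nSKU_994,3.25,9\n"
report = keyed_diff(old_csv, new_csv, "sku_id")
print(report["cells_changed"])  # [('SKU_992', 'unit_price', '45.00', '49.99')]
print(exit_code(report))        # 1
```

In a CI job, a wrapper script would typically end with `sys.exit(exit_code(report))` so that any difference fails the build.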
Why use this skill?
Standard text diffing tools produce false positives on CSVs when rows are reordered or formatting changes in insignificant ways. This skill compares data structures instead of raw text. It treats NaN values as equal (NaN == NaN) and summarizes which columns are the "hottest" (most frequently changed), which helps you narrow down logic errors in your ETL scripts.
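A small sketch of the two ideas in this paragraph, NaN-aware equality and a per-column change count, is shown below. It is a standard-library approximation under assumed names (`cells_equal`, `hottest_columns`), not the skill's implementation. In plain Python, `float("nan") != float("nan")`, so a naive comparison would flag every missing value as a change.

```python
import math
from collections import Counter


def cells_equal(a, b):
    """Compare two cell values, treating NaN as equal to NaN."""
    if isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b):
        return True
    return a == b


def hottest_columns(old_rows, new_rows):
    """Count changed cells per column for keys present in both mappings."""
    counts = Counter()
    for key in old_rows.keys() & new_rows.keys():
        for col in old_rows[key].keys() & new_rows[key].keys():
            if not cells_equal(old_rows[key][col], new_rows[key][col]):
                counts[col] += 1
    return counts.most_common()


nan = float("nan")
old_rows = {
    "A1": {"unit_price": 10.0, "discount": nan},
    "B2": {"unit_price": 20.0, "discount": 0.1},
    "C3": {"unit_price": 30.0, "discount": nan},
}
new_rows = {
    "A1": {"unit_price": 11.0, "discount": nan},
    "B2": {"unit_price": 22.0, "discount": 0.2},
    "C3": {"unit_price": 30.0, "discount": nan},
}
print(hottest_columns(old_rows, new_rows))
# [('unit_price', 2), ('discount', 1)]
```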
Supported Formats & Tools
- Files: .csv, .xlsx, .xls
- Frameworks: Built on Pandas, OpenPyXL, and Plotly.
- Features: Composite keys, specific sheet targeting, and machine-readable JSON output.
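As a rough illustration of composite keys and machine-readable output, the sketch below indexes rows by a tuple of columns and serializes the result as JSON. The JSON field names and the functions `index_by` and `diff_as_json` are hypothetical and do not reflect the skill's real output schema. Sheet targeting for Excel files is omitted because it would need a third-party reader.

```python
import csv
import io
import json


def index_by(csv_text, key_columns):
    """Index rows by a tuple built from the given key columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[c] for c in key_columns): row for row in reader}


def diff_as_json(old_text, new_text, key_columns):
    """Return a JSON string listing added, removed, and changed composite keys."""
    old = index_by(old_text, key_columns)
    new = index_by(new_text, key_columns)
    changed = [k for k in old.keys() & new.keys() if old[k] != new[k]]
    report = {
        "key": list(key_columns),
        "added": sorted(list(k) for k in new.keys() - old.keys()),
        "removed": sorted(list(k) for k in old.keys() - new.keys()),
        "changed": sorted(list(k) for k in changed),
    }
    return json.dumps(report, indent=2)


old_csv = "region,sku_id,unit_price\nEU,A1,10.00\nUS,A1,12.00\n"
new_csv = "region,sku_id,unit_price\nEU,A1,10.00\nUS,A1,12.50\nUS,B2,5.00\n"
print(diff_as_json(old_csv, new_csv, ["region", "sku_id"]))
```

Here `sku_id` alone would be ambiguous (A1 appears in both regions), so the pair (`region`, `sku_id`) serves as the row identity.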
Use Cases
- Identify schema drift and column type changes before they break ETL pipelines.
- Validate database migrations by comparing legacy vs. new system exports.
- Perform regression testing on data pipeline outputs during refactoring.
- Generate interactive HTML reports for business stakeholders to review data changes.
- Detect silent data loss by tracking removed rows in append-only datasets.
How to Install
mkdir -p ~/.claude/skills && curl -sL https://www.agensi.io/api/install/data-diff -o /tmp/data-diff.zip && unzip -o /tmp/data-diff.zip -d ~/.claude/skills && rm /tmp/data-diff.zip
Free skills install directly. Paid skills require purchase - use the download button above after buying.
Security Scanned
Passed automated security review
Creator
Lead Data Engineer with 11 years of experience designing and delivering scalable data platforms across Databricks, AWS, and Azure ecosystems. Proven track record of building high-performance data solutions for large-scale, data-intensive organizations in industries including healthcare and robotics. Extensive experience working in highly regulated environments, managing complex data pipelines and large volumes of structured and unstructured data.