
    Data Cleaning Architect

    Clean messy tabular data with a profile-first workflow, audit logs, and reproducible pandas scripts.

    Updated Jul 2026
    Security scanned

    $9 · or 45 credits

    30-day refund guarantee

    Included in download

    • Identify data anomalies and mixed types before running transformations.
    • Standardize inconsistent date, currency, and category formats.
    • Instant install

    See it in action

    You say

    I have a messy customers.csv with weird date formats, some duplicate names with different emails, and weird characters in the addresses. Clean it up.

    Your agent does

    Profiled customers.csv: found 3 date formats and 12 fuzzy duplicates. Proposed plan: 1. Normalize dates to ISO 8601. 2. Flag near-duplicate names (distance threshold < 2) for review instead of merging them. 3. Repair encoding damage (mojibake) in the addresses. Generated: customers_cleaned.csv, audit_log.md, and cleaning_script.py. Review the 12 flagged duplicates in the log.
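    The "fix encoding (mojibake)" step usually means undoing text that was saved as UTF-8 but decoded as Windows-1252 or Latin-1, which turns "Café" into "CafÃ©". Below is a minimal sketch of that repair, assuming this is the kind of damage present. The function name `fix_mojibake` and the marker heuristic are illustrative only, not the skill's actual code.

```python
# Hypothetical helper: repairs UTF-8 text that was mis-decoded as cp1252/Latin-1.
MOJIBAKE_MARKERS = ("Ã", "Â", "â€")  # typical lead characters of mis-decoded UTF-8


def fix_mojibake(text: str) -> str:
    """Return `text` with one layer of UTF-8 -> cp1252/Latin-1 mojibake undone."""
    if not any(marker in text for marker in MOJIBAKE_MARKERS):
        return text  # nothing that looks like mojibake; leave the value alone
    for codec in ("cp1252", "latin-1"):
        try:
            # Re-encode with the wrong codec to recover the original bytes,
            # then decode those bytes as the UTF-8 they really were.
            return text.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    return text  # not repairable this way; the caller should flag it for review


print(fix_mojibake("12 Rue du CafÃ©"))  # 12 Rue du Café
print(fix_mojibake("SÃO PAULO"))       # unchanged: not valid mojibake
```

    Values that cannot be round-tripped come back unchanged. A careful pipeline would list them in the audit log rather than guess at a fix.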

    About This Skill

    The problem

    Manual data cleaning is error-prone, slow, and often results in silent data loss. Developers frequently waste hours writing one-off scripts to handle inconsistent date formats, encoding issues, and hidden null values without a clear audit trail.

    What it does

    • Generates a comprehensive data profile identifying mixed types, encoding damage, and structural anomalies before any modifications.
    • Executes a disciplined cleaning pipeline including type coercion, whitespace normalization, and ISO 8601 date standardization.
    • Implements a quarantine system for failed coercions rather than silently converting values to null.
    • Flags fuzzy duplicates for manual review instead of performing risky auto-merges.
    • Produces a reproducible Python/pandas script for recurring datasets.
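    The quarantine behaviour in the list above can be sketched in pandas as follows. This is a minimal illustration under our own assumptions: it handles a single numeric column, and it uses the hypothetical names `coerce_numeric_with_quarantine` and `quarantine_reason`. It is not the script the skill generates. Calling `pd.to_numeric(..., errors="coerce")` on its own would silently turn bad values into NaN. Comparing which values were null before and after coercion isolates exactly the rows that failed.

```python
import pandas as pd


def coerce_numeric_with_quarantine(df: pd.DataFrame, column: str):
    """Coerce `column` to numbers; rows that fail are moved to a quarantine frame."""
    converted = pd.to_numeric(df[column], errors="coerce")
    # A failed coercion is a value that was present before but is NaN after.
    # Values that were already missing stay in the clean frame as NaN.
    failed = converted.isna() & df[column].notna()

    clean = df.loc[~failed].copy()
    clean[column] = converted[~failed]

    quarantine = df.loc[failed].copy()
    quarantine["quarantine_reason"] = f"could not parse {column!r} as a number"
    return clean, quarantine


orders = pd.DataFrame({"id": [1, 2, 3, 4], "amount": ["10", "12.5", None, "ten"]})
clean, quarantine = coerce_numeric_with_quarantine(orders, "amount")
print(len(clean), len(quarantine))  # 3 1
```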

    Frameworks & tools

    Python, pandas, CSV, Excel (XLSX), JSON.

    Why this beats prompting it yourself

    With a plain prompt, an LLM will often hallucinate fixes or silently drop rows it doesn't understand. This skill enforces a strict profile-first workflow, keeps an audit log of every change, and reconciles row counts so that every input row is accounted for as kept, modified, or quarantined.
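    One way to make row-count integrity concrete is to require each step to account for every input row as kept, quarantined, or explicitly removed, and to stop otherwise. Below is a minimal sketch built around a hypothetical `AuditLog` class that renders Markdown in the spirit of the audit_log.md file from the example. The skill's real log format may differ.

```python
from dataclasses import dataclass, field


@dataclass
class AuditLog:
    """Hypothetical audit log: one Markdown line per cleaning step."""
    entries: list[str] = field(default_factory=list)

    def record(self, step: str, rows_before: int, rows_after: int,
               quarantined: int = 0, removed: int = 0) -> None:
        # Every input row must end up kept, quarantined, or explicitly removed.
        accounted = rows_after + quarantined + removed
        if accounted != rows_before:
            raise ValueError(
                f"{step}: {rows_before} rows in, but {accounted} accounted for"
            )
        self.entries.append(
            f"- {step}: {rows_before} -> {rows_after} rows "
            f"({quarantined} quarantined, {removed} removed)"
        )

    def to_markdown(self) -> str:
        return "# Audit log\n\n" + "\n".join(self.entries) + "\n"


log = AuditLog()
log.record("trim whitespace", 100, 100)
log.record("coerce amount to number", 100, 97, quarantined=3)
print(log.to_markdown())
```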

    Use cases

    • Preparing messy CSV exports from legacy systems for database migration.
    • Standardizing disparate date and currency formats for financial analysis.
    • Cleaning marketing lead lists with fuzzy deduplication and email validation.
    • Generating reusable cleaning scripts for monthly reporting pipelines.
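    For the lead-list use case, fuzzy deduplication here means flagging, not merging. The sketch below compares normalized names pairwise using Levenshtein edit distance and reports pairs within one edit. This matches the "threshold < 2" in the example above, on our assumption that the threshold is an edit distance. The names `flag_fuzzy_duplicates` and `normalize_name` are hypothetical. Pairwise comparison is O(n²), so large files would need blocking, for example comparing only records that share a postcode.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalize_name(name: str) -> str:
    # Case-fold and collapse runs of whitespace before comparing.
    return " ".join(name.casefold().split())


def flag_fuzzy_duplicates(records, max_distance=1):
    """Return (i, j, distance) for record pairs whose names are near-identical.

    Nothing is merged or dropped; the pairs are meant for manual review.
    """
    flagged = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        d = levenshtein(normalize_name(r1["name"]), normalize_name(r2["name"]))
        if d <= max_distance:
            flagged.append((i, j, d))
    return flagged


leads = [
    {"name": "Jon Smith", "email": "jon@example.com"},
    {"name": "John  Smith", "email": "j.smith@example.org"},
    {"name": "Ann Lee", "email": "ann@example.com"},
]
print(flag_fuzzy_duplicates(leads))  # [(0, 1, 1)]
```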

    Known limitations

    Some ambiguities must be resolved by the user, most notably day/month order. A date such as 03/04/2025 can be read as MM/DD or DD/MM, and when nothing in the data settles the question, the user has to decide.
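    When the data does contain evidence, it usually comes from a day value above 12. Below is a minimal sketch of that check for slash-separated dates. It assumes four-digit years, and `infer_day_month_order` and `to_iso` are hypothetical names. A `None` result is the case described above, where the user has to decide.

```python
from datetime import datetime


def infer_day_month_order(values):
    """Return "DMY", "MDY", or None if the column gives no usable evidence."""
    first_over_12 = second_over_12 = False
    for value in values:
        parts = value.strip().split("/")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            continue  # not a slash date; other formats are handled elsewhere
        first_over_12 |= int(parts[0]) > 12
        second_over_12 |= int(parts[1]) > 12
    if first_over_12 and not second_over_12:
        return "DMY"
    if second_over_12 and not first_over_12:
        return "MDY"
    return None  # no evidence, or contradictory evidence: ask the user


def to_iso(value, order):
    """Convert a slash date to ISO 8601 (YYYY-MM-DD) once the order is known."""
    if order not in ("DMY", "MDY"):
        raise ValueError("day/month order unresolved; ask the user")
    fmt = "%d/%m/%Y" if order == "DMY" else "%m/%d/%Y"
    return datetime.strptime(value.strip(), fmt).date().isoformat()


dates = ["03/04/2025", "25/12/2024"]
order = infer_day_month_order(dates)       # "DMY": 25 cannot be a month
print([to_iso(d, order) for d in dates])   # ['2025-04-03', '2024-12-25']
```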

    Use Cases

    • Identify data anomalies and mixed types before running transformations.
    • Standardize inconsistent date, currency, and category formats.
    • Quarantine failed row coercions to prevent silent data loss.
    • Generate an audit log documenting every modification and row count delta.
    • Export Python scripts to automate future cleaning of the same data source.
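    A profiling pass like the one in the first item can start by classifying every raw string value and counting the classes per column, before anything is modified. This sketch assumes rows arrive as dictionaries of strings, as `csv.DictReader` produces them. The regular expressions and the set of null-like tokens are our own illustrative choices, not the skill's rules. Any column with more than one non-null class is reported as mixed.

```python
import re
from collections import Counter

NUMBER = re.compile(r"[+-]?\d+(\.\d+)?")
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
SLASH_DATE = re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")
NULL_TOKENS = {"", "na", "n/a", "null", "none", "-"}  # "hidden" nulls


def classify(value: str) -> str:
    v = value.strip()
    if v.casefold() in NULL_TOKENS:
        return "null-like"
    if NUMBER.fullmatch(v):
        return "number"
    if ISO_DATE.fullmatch(v):
        return "date (ISO)"
    if SLASH_DATE.fullmatch(v):
        return "date (slash)"
    return "text"


def profile(rows):
    """Count value classes per column without modifying anything."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts.setdefault(column, Counter())[classify(value or "")] += 1
    return counts


def mixed_columns(counts):
    # Null-like values are reported separately, so they do not make a column "mixed".
    return [col for col, c in counts.items() if len(set(c) - {"null-like"}) > 1]


rows = [
    {"id": "1", "signup": "2024-01-05", "amount": "10"},
    {"id": "2", "signup": "05/01/2024", "amount": "n/a"},
    {"id": "3", "signup": "Jan 5 2024", "amount": "12.5"},
]
print(mixed_columns(profile(rows)))  # ['signup']
```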

    How to install

    Drop the file into your AI tool. Works with Claude, Cursor, ChatGPT, and 20+ more.

    Security Scanned

    Passed automated security review

    Permissions

    No special permissions declared or detected

    Compatible with SKILL.md-compatible agents including Claude Code and Cursor.

    Frequently Asked Questions