Data Cleaning Architect
Clean messy tabular data with a profile-first workflow, audit logs, and reproducible pandas scripts.
- Identify data anomalies and mixed types before running transformations.
- Standardize inconsistent date, currency, and category formats.
- Quarantine failed row coercions to prevent silent data loss.
$9 or 45 credits
Secure checkout via Stripe
See it in action
You say
I have a messy customers.csv with weird date formats, some duplicate names with different emails, and weird characters in the addresses. Clean it up.
Your agent does
Profiled customers.csv: Found 3 date formats and 12 fuzzy duplicates. Proposed plan: 1. Normalize dates to ISO 8601. 2. Dedupe names (threshold <2). 3. Fix encoding (mojibake). Generated: customers_cleaned.csv, audit_log.md, and cleaning_script.py. Review 12 flagged duplicates in log.
About This Skill
The problem
Manual data cleaning is error-prone, slow, and often results in silent data loss. Developers frequently waste hours writing one-off scripts to handle inconsistent date formats, encoding issues, and hidden null values without a clear audit trail.
What it does
- Generates a comprehensive data profile identifying mixed types, encoding damage, and structural anomalies before any modifications.
- Executes a disciplined cleaning pipeline including type coercion, whitespace normalization, and ISO 8601 date standardization.
- Implements a quarantine system for failed coercions rather than silently converting values to null.
- Flags fuzzy duplicates for manual review instead of performing risky auto-merges.
- Produces a reproducible Python/pandas script for recurring datasets.
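The quarantine step above can be sketched in pandas as follows. This is a minimal illustration, not the skill's actual script: the function name `coerce_with_quarantine` and its `kind` parameter are hypothetical. The idea is that a value counts as a failed coercion only when it was present in the input but became null after conversion, and such rows are set aside instead of being kept as nulls.

```python
import pandas as pd


def coerce_with_quarantine(df, column, kind="numeric"):
    """Coerce one column; return (clean_rows, quarantined_rows).

    Hypothetical helper for illustration only.
    """
    raw = df[column]
    if kind == "numeric":
        converted = pd.to_numeric(raw, errors="coerce")
    elif kind == "datetime":
        converted = pd.to_datetime(raw, errors="coerce")
    else:
        raise ValueError(f"unsupported kind: {kind!r}")

    # A failure is a value that existed before coercion but is null after it.
    # Values that were already missing are not failures and stay in place.
    failed = converted.isna() & raw.notna()

    quarantined = df[failed].copy()       # original, unmodified values
    clean = df[~failed].copy()
    clean[column] = converted[~failed]    # aligned on the index
    return clean, quarantined


if __name__ == "__main__":
    orders = pd.DataFrame(
        {"id": [1, 2, 3, 4], "amount": ["10.5", "abc", None, "7"]}
    )
    clean, quarantined = coerce_with_quarantine(orders, "amount")
    # Every input row ends up in exactly one of the two outputs.
    assert len(clean) + len(quarantined) == len(orders)
    print(quarantined)
```

Keeping the quarantined rows in their original form means they can be written to a separate file and reviewed, while the row counts of the two outputs still add up to the input.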
Frameworks & tools
Python, Pandas, CSV, Excel (XLSX), JSON.
Why this beats prompting it yourself
Ad hoc LLM prompting often produces hallucinated fixes or silently drops rows the model doesn't understand. This skill enforces a strict profile-first workflow, maintains an audit log of every change, and verifies row counts after each step, safeguards that basic prompting lacks.
Use cases
- Preparing messy CSV exports from legacy systems for database migration.
- Standardizing disparate date and currency formats for financial analysis.
- Cleaning marketing lead lists with fuzzy deduplication and email validation.
- Generating reusable cleaning scripts for monthly reporting pipelines.
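For the lead-list use case, flagging fuzzy duplicates for review might look like the sketch below. It assumes the "threshold <2" in the sample output refers to an edit distance below 2 between normalized names; the listing does not say which similarity measure the skill uses, so that reading, along with the function names `edit_distance` and `flag_fuzzy_duplicates`, is an assumption. Pairs are only reported, never merged.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def flag_fuzzy_duplicates(names, max_distance=1):
    """Return index pairs whose normalized names are within max_distance edits.

    Hypothetical helper: compares every pair, so it suits small lists only.
    """
    # Lowercase and collapse runs of whitespace before comparing.
    normalized = [" ".join(name.lower().split()) for name in names]
    flagged = []
    for i, j in combinations(range(len(names)), 2):
        if edit_distance(normalized[i], normalized[j]) <= max_distance:
            flagged.append((i, j))
    return flagged


if __name__ == "__main__":
    leads = ["Jon Smith", "John Smith", "Jane Doe", "jon  smith"]
    for i, j in flag_fuzzy_duplicates(leads):
        print(f"review: {leads[i]!r} vs {leads[j]!r}")
```

The pairwise comparison is quadratic in the number of names, so a real pipeline on a large list would likely need blocking (for example, comparing only names that share a first letter or postcode) before this step.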
Known limitations
Requires the user to resolve ambiguities the data itself cannot settle, such as whether dates are MM/DD or DD/MM when no value in the column disambiguates them.
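To make the date limitation concrete, here is one way "clear evidence" could be checked. It is a sketch under assumptions: dates are slash-separated strings, and the function name `infer_day_month_order` is hypothetical. A component greater than 12 can only be a day, so a column settles the question only if one position ever exceeds 12 and the other never does. Otherwise the answer has to come from the user.

```python
def infer_day_month_order(values):
    """Return 'DD/MM', 'MM/DD', or 'ambiguous' for strings like '03/04/2021'.

    Hypothetical helper; malformed values are skipped, not guessed at.
    """
    first_over_12 = False
    second_over_12 = False
    for value in values:
        parts = value.strip().split("/")
        if len(parts) != 3:
            continue
        try:
            first, second = int(parts[0]), int(parts[1])
        except ValueError:
            continue
        if first > 12:
            first_over_12 = True
        if second > 12:
            second_over_12 = True

    if first_over_12 and not second_over_12:
        return "DD/MM"
    if second_over_12 and not first_over_12:
        return "MM/DD"
    # Either no value exceeds 12, or the column mixes both orders.
    return "ambiguous"


if __name__ == "__main__":
    print(infer_day_month_order(["03/04/2021", "25/12/2020"]))
    print(infer_day_month_order(["03/04/2021", "01/02/2020"]))
```

A column that mixes both orders also comes back as ambiguous, which is the safer outcome: it prompts a question instead of a silent misparse.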
Key capabilities
- Identify data anomalies and mixed types before running transformations.
- Standardize inconsistent date, currency, and category formats.
- Quarantine failed row coercions to prevent silent data loss.
- Generate an audit log documenting every modification and row count delta.
- Export Python scripts to automate future cleaning of the same data source.
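An audit log that records each modification with its row-count delta could be structured along these lines. This is a sketch only: the `AuditLog` class, its methods, and the table layout are illustrative assumptions, not the format of the skill's actual audit_log.md.

```python
from dataclasses import dataclass, field


@dataclass
class AuditLog:
    """Hypothetical audit log: one entry per cleaning step."""

    entries: list = field(default_factory=list)

    def record(self, step, rows_before, rows_after, note=""):
        # Store the delta explicitly so dropped or added rows are visible.
        delta = rows_after - rows_before
        self.entries.append((step, rows_before, rows_after, delta, note))

    def net_delta(self):
        """Total change in row count across all recorded steps."""
        return sum(entry[3] for entry in self.entries)

    def to_markdown(self):
        lines = [
            "| Step | Rows before | Rows after | Delta | Note |",
            "| --- | --- | --- | --- | --- |",
        ]
        for step, before, after, delta, note in self.entries:
            lines.append(f"| {step} | {before} | {after} | {delta:+d} | {note} |")
        return "\n".join(lines)


if __name__ == "__main__":
    log = AuditLog()
    log.record("quarantine bad dates", 100, 97, "3 rows moved to quarantine")
    log.record("strip whitespace", 97, 97)
    print(log.to_markdown())
    print("net row delta:", log.net_delta())
```

Comparing `net_delta()` against the number of quarantined rows gives a simple integrity check: any difference means rows were lost or created somewhere outside the logged steps.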
Known Limitations
- Requires user input for ambiguous date formats.
- Not designed for unstructured text blobs or image data.
- Large datasets may require local environment execution.
How to install
Drop the file into your AI tool. Works with Claude, Cursor, ChatGPT, and 20+ more.
Security Scanned
Passed automated security review
Permissions
No special permissions declared or detected
Compatible with SKILL.md-compatible agents including Claude Code and Cursor.