
    databricks-auditor

    Deterministic AWS Databricks cost auditor that finds waste in compute, Delta tables, and PySpark code with ROI estimates.

    Updated Jun 2026
    Security scanned

    $29 or 145 credits

    30-day refund guarantee


    Included in download

    • Terminal and network automation included
    • Instant install

    Sample input

    Audit my Databricks job configuration and the PySpark code in monthly_report.py for cost leaks. We are on the Premium tier in us-east-1.

    Sample output

    Top Finding: R001 (High)

    • Issue: Scheduled job using All-Purpose Compute.
    • Est. Saving: $432/mo
    • Arithmetic: ($0.55 - $0.15 per DBU) * 8 nodes * 6 hrs/day * 30 days.
    • Fix: Replace 'existing_cluster_id' with a 'new_cluster' block (Jobs Compute) in the job JSON.
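
    The arithmetic in this finding can be checked with a few lines of Python. This is only a sketch: the function name and the dbu_per_node_hour parameter are assumptions, not part of the skill. The listing multiplies a per-DBU rate difference by node-hours, so the result depends on how many DBUs each node consumes per hour, which the sample output does not state. At 1.0 DBU per node-hour the formula gives $576/mo. The listed $432/mo corresponds to 0.75 DBU per node-hour.

```python
def monthly_dbu_savings(rate_from, rate_to, dbu_per_node_hour,
                        nodes, hours_per_day, days=30):
    """Monthly saving in dollars when a workload moves between two $/DBU rates."""
    dbus_per_month = dbu_per_node_hour * nodes * hours_per_day * days
    return (rate_from - rate_to) * dbus_per_month


# Rates taken from the sample output above ($ per DBU).
ALL_PURPOSE_RATE = 0.55
JOBS_RATE = 0.15

# dbu_per_node_hour is an assumption; the sample output does not state it.
print(round(monthly_dbu_savings(ALL_PURPOSE_RATE, JOBS_RATE, 1.0, 8, 6), 2))   # 576.0
print(round(monthly_dbu_savings(ALL_PURPOSE_RATE, JOBS_RATE, 0.75, 8, 6), 2))  # 432.0

# Relative DBU rate reduction, independent of cluster size.
print(round(100 * (ALL_PURPOSE_RATE - JOBS_RATE) / ALL_PURPOSE_RATE, 1))       # 72.7
```

    The last line is where a savings figure of roughly 73% comes from: it depends only on the two rates, not on node count or runtime.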

    Code Finding: R041

    • Issue: Python UDF in monthly_report.py.
    • Impact: Disables Photon optimization.
    • Fix: Replace lines 45-52 with native PySpark SQL functions.

    About This Skill

    What it does

    The Databricks Auditor for AWS identifies hidden waste in your Databricks environment by running a deterministic rule engine against your cluster configurations, job schedules, Delta table metadata, and PySpark code. It moves beyond generic advice by calculating dollar-denominated savings estimates based on 2026 DBU list pricing and AWS EC2 rates.

    Why use this skill

    Unlike general AI prompting, this skill uses a specialized rule engine and inlined pricing data to provide quantifiable ROI. It distinguishes between DBU costs and EC2 infrastructure costs, a critical nuance that is often missed. It identifies high-leverage savings such as converting All-Purpose clusters to Jobs Compute (about 73% lower DBU cost) and right-sizing SQL Warehouses.

    Key Features

    • Compute Right-sizing: Detects misconfigured Spot/Fleet instances, Graviton opportunities, and Photon misuse.
    • Delta Lake Optimization: Identifies small file issues, over-partitioning, and missing Z-ORDER/Liquid Clustering.
    • Code Anti-pattern Detection: Scans PySpark for expensive collect() calls, Python UDFs that break Photon, and withColumn calls inside loops.
    • SQL Warehouse Audit: Analyzes auto-stop settings and serverless candidacy for BI workloads.
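
    To give a feel for how a deterministic code check of this kind might work, here is a minimal sketch built on Python's standard ast module. It is not the skill's actual rule engine: the function name, the rule labels, and the sample source are all hypothetical. It flags three of the patterns named above by walking the syntax tree, so PySpark does not need to be installed to run it.

```python
import ast


def _tail_name(expr):
    """Last identifier of a name or dotted attribute, e.g. F.udf -> 'udf'."""
    if isinstance(expr, ast.Attribute):
        return expr.attr
    if isinstance(expr, ast.Name):
        return expr.id
    return None


def scan_pyspark_source(source):
    """Return sorted (line, rule) pairs for three PySpark cost patterns."""
    findings = []

    def visit(node, in_loop):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Bare decorators such as @udf are names, not calls.
            for dec in node.decorator_list:
                if not isinstance(dec, ast.Call) and _tail_name(dec) == "udf":
                    findings.append((dec.lineno, "python_udf"))
        if isinstance(node, ast.Call):
            name = _tail_name(node.func)
            if name == "collect":
                findings.append((node.lineno, "collect"))
            elif name == "udf":
                findings.append((node.lineno, "python_udf"))
            elif name == "withColumn" and in_loop:
                findings.append((node.lineno, "withColumn_in_loop"))
        inside = in_loop or isinstance(node, (ast.For, ast.While))
        for child in ast.iter_child_nodes(node):
            visit(child, inside)

    visit(ast.parse(source), False)
    return sorted(findings)


# Hypothetical input; it is only parsed, never executed.
SAMPLE = '''
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

@F.udf(returnType=StringType())
def label(x):
    return "high" if x > 100 else "low"

for c in ["a", "b", "c"]:
    df = df.withColumn(c, F.col(c) * 2)

rows = df.collect()
'''

for line, rule in scan_pyspark_source(SAMPLE):
    print(line, rule)
```

    A name-based scan like this will also flag unrelated methods that happen to be called collect or udf, and it ignores comprehensions, so a production rule would need more context than this sketch uses.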

    The Output

    You receive a ranked list of findings starting with the biggest cost leaks. Each finding includes the current cost, estimated monthly savings (with the arithmetic shown), a technical "why," and a copy-paste fix snippet or Terraform configuration change.
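
    As an illustration of what a fix for finding R001 in the sample output could involve, the sketch below rewrites a job definition so that tasks pinned to an all-purpose cluster get their own job cluster instead. The helper name and all values are hypothetical placeholders. The field names (tasks, task_key, existing_cluster_id, new_cluster, spark_python_task) follow the multi-task layout of the Databricks Jobs API as best understood here, and the skill's real fix snippets may look different.

```python
import copy
import json


def to_jobs_compute(job_settings, cluster_spec):
    """Return a copy of job_settings in which every task that references an
    all-purpose cluster gets a new_cluster block instead."""
    updated = copy.deepcopy(job_settings)
    for task in updated.get("tasks", []):
        if "existing_cluster_id" in task:
            del task["existing_cluster_id"]
            task["new_cluster"] = copy.deepcopy(cluster_spec)
    return updated


# Hypothetical job definition and cluster spec; replace the placeholders.
job = {
    "name": "monthly_report",
    "tasks": [
        {
            "task_key": "build_report",
            "existing_cluster_id": "<all-purpose-cluster-id>",
            "spark_python_task": {"python_file": "monthly_report.py"},
        }
    ],
}
cluster = {
    "spark_version": "<runtime-version>",
    "node_type_id": "<instance-type>",
    "num_workers": 8,
}

print(json.dumps(to_jobs_compute(job, cluster), indent=2))
```

    The function returns a modified copy and leaves the input untouched, so the original and proposed definitions can be diffed before anything is applied.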

    Use Cases

    • Convert expensive interactive clusters to Jobs Compute for 70%+ DBU savings.
    • Identify Python UDFs and code patterns that are blocking Photon acceleration.
    • Right-size Delta Lake tables by detecting small files and over-partitioning.
    • Calculate monthly AWS spend reduction by switching to Graviton and Spot worker nodes.


    Security Scanned

    Passed automated security review

    Permissions

    Terminal / Shell
    Network Access

    Allowed Hosts

    www.databricks.com
    aws.amazon.com
    docs.databricks.com
    spark.apache.org

    File Scopes

    data/**
    references/**
    scripts/**
    examples/**
    evals/**

    Frequently Asked Questions