pyspark-cost-linter

by Ikerg

Detect and fix the 9 most expensive PySpark anti-patterns to slash your Spark compute costs.

Updated Jun 2026

Identify Python UDFs that are disabling Photon acceleration and increasing DBUs.
Find collect() or toPandas() calls that risk driver OOMs and idle clusters.
Detect withColumn loops that cause quadratic plan growth and slow query planning.

Security scanned
Instant install

Free

Included in download

Downloadable skill package
1 permission declared

Sample input

Review my Spark job in src/silver_layer.py for any performance issues or code that might be making our Databricks bill too high.

Sample output

Found 2 high-severity issues in silver_layer.py:

[R041] Line 45: Python UDF detected. This disables Photon acceleration. Refactor using pyspark.sql.functions.
[R045] Line 82: withColumn inside a loop. This causes quadratic plan growth. Use select() or withColumns() instead.

Also available via Agensi MCP: your AI agent can load this skill on demand.

About This Skill

What it does

The PySpark Cost Linter scans PySpark scripts for expensive anti-patterns that inflate cloud compute bills. It flags code-level inefficiencies such as driver-side bottlenecks, unoptimized UDFs, and plan-bloating loops, and gives each finding a severity rating and idiomatic refactoring advice.

Why use this skill

Standard linters catch syntax errors, but they don't catch the "silent killers" of Spark performance. This skill is built for data engineers who need to optimize Databricks or EMR jobs without manually auditing thousands of lines of code. It uses deterministic rules (R040–R050) to find patterns that disable Photon acceleration or cause quadratic plan analysis times, saving you from expensive trial-and-error debugging.
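
To illustrate what a deterministic rule of this kind might look like, here is a minimal sketch that uses Python's standard ast module to flag withColumn calls nested inside a loop. It is not the skill's actual implementation: the function name, the finding fields, and the sample script are assumptions made for illustration. Only the rule ID R045 is taken from the sample output.

```python
import ast

def find_withcolumn_in_loops(source):
    """Return one finding per .withColumn(...) call nested inside a for/while loop."""
    tree = ast.parse(source)
    seen = set()      # (line, column) pairs, so nested loops do not double-report
    findings = []
    for loop in ast.walk(tree):
        if not isinstance(loop, (ast.For, ast.While)):
            continue
        for node in ast.walk(loop):
            is_with_column = (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "withColumn"
            )
            key = (getattr(node, "lineno", None), getattr(node, "col_offset", None))
            if is_with_column and key not in seen:
                seen.add(key)
                findings.append({
                    "rule": "R045",  # rule ID as shown in the sample output
                    "line": node.lineno,
                    "message": "withColumn inside a loop",
                })
    return sorted(findings, key=lambda f: f["line"])

# Hypothetical script to scan; it is only parsed, never executed.
SAMPLE = '''
df = spark.read.parquet("events")
for name in ["a", "b", "c"]:
    df = df.withColumn(name, F.lit(0))
df = df.withColumn("ok", F.lit(1))
'''

findings = find_withcolumn_in_loops(SAMPLE)
for f in findings:
    print(f'[{f["rule"]}] Line {f["line"]}: {f["message"]}')
```

Because the check works on the syntax tree rather than on a running job, it needs neither a Spark session nor a cluster. The trade-off is that it matches on the method name alone, so an unrelated object with a withColumn method would also be flagged.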

Supported tools

PySpark (Core & SQL)
Databricks (Photon & AQE optimization checks)
CI/CD Integration (via JSON output)
Common data formats (Delta, Parquet, CSV, JSON)

Output

The skill produces a structured report mapping rule IDs to specific line numbers. Each finding includes a description of the cost impact (e.g., "OOM Risk" or "Photon Disabled") and a code-level recommendation for fixing it.
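
As a rough sketch of how such a report could gate a CI pipeline, the snippet below parses a JSON payload and derives an exit code from it. The schema is an assumption: the field names ("file", "findings", "rule", "line", "severity", "impact") and the "Quadratic Plan Growth" label are hypothetical and may not match the skill's real JSON output. The rule IDs and line numbers are borrowed from the sample output.

```python
import json

# Hypothetical report payload; the field names are assumptions,
# not the skill's documented schema.
REPORT_JSON = """
{
  "file": "src/silver_layer.py",
  "findings": [
    {"rule": "R041", "line": 45, "severity": "high", "impact": "Photon Disabled"},
    {"rule": "R045", "line": 82, "severity": "high", "impact": "Quadratic Plan Growth"}
  ]
}
"""

def blocking_findings(report, blocking_severities=("high",)):
    """Return the findings whose severity should fail the build."""
    return [f for f in report["findings"] if f["severity"] in blocking_severities]

def exit_code(report):
    """Return 1 when at least one finding blocks the build, otherwise 0."""
    return 1 if blocking_findings(report) else 0

report = json.loads(REPORT_JSON)
for finding in blocking_findings(report):
    print(f'[{finding["rule"]}] line {finding["line"]}: {finding["impact"]}')
```

In a pipeline step, the value returned by exit_code(report) could be passed to sys.exit so that high-severity findings fail the job while lower severities only appear in the log.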

Use Cases

Identify Python UDFs that are disabling Photon acceleration and increasing DBUs.
Find collect() or toPandas() calls that risk driver OOMs and idle clusters.
Detect withColumn loops that cause quadratic plan growth and slow query planning.
Audit Spark configurations like AQE and shuffle partitions for cost efficiency.
Analyze schema inference logic to prevent redundant data passes.
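
The sample output describes the withColumn loop as causing quadratic plan growth. The toy model below shows where a quadratic term can come from. It is a back-of-the-envelope sketch, not a description of Spark's real analyzer: it assumes each withColumn call stacks one more projection node on the plan and that every call re-analyzes the whole plan built so far. Both function names are hypothetical.

```python
def work_with_loop(n_columns):
    """Work units when columns are added one withColumn call at a time."""
    plan_nodes = 1              # the source relation
    total = 0
    for _ in range(n_columns):
        plan_nodes += 1         # assumed: one more projection node per call
        total += plan_nodes     # assumed: analysis visits every node so far
        total += 1              # resolving the one new column expression
    return total

def work_with_single_select(n_columns):
    """Work units when all columns are added in one select or withColumns call."""
    plan_nodes = 2              # source relation plus a single projection
    return plan_nodes + n_columns   # one pass over the plan, n expressions

for n in (10, 100, 1000):
    print(n, work_with_loop(n), work_with_single_select(n))
```

Under these assumptions the loop costs n(n+1)/2 + 2n units while the single projection costs n + 2, which is why the recommended fix is to build all expressions first and pass them to one select() call, or to withColumns(), available since PySpark 3.3.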

How to Install

mkdir -p ~/.claude/skills && curl -sL https://www.agensi.io/api/install/pyspark-cost-linter -o /tmp/pyspark-cost-linter.zip && unzip -o /tmp/pyspark-cost-linter.zip -d ~/.claude/skills && rm /tmp/pyspark-cost-linter.zip

Free skills install directly. Paid skills require purchase - use the download button above after buying.

Reviews

No reviews yet - be the first to share your experience.

Only users who have downloaded or purchased this skill can leave a review.

Security Scanned

Passed automated security review

Permissions

Terminal / Shell

Allowed Hosts

agensi.io

File Scopes

scripts/**

examples/**

Creator

Ikerg

Lead Data Engineer with 11 years of experience designing and delivering scalable data platforms across Databricks, AWS, and Azure ecosystems. Proven track record of building high-performance data solutions for large-scale, data-intensive organizations in industries including healthcare and robotics. Extensive experience working in highly regulated environments, managing complex data pipelines and large volumes of structured and unstructured data.

Frequently Asked Questions

How does this differ from a standard Python linter like Flake8 or Pylint?

Which AI agents or platforms can I run this skill on?

Does the skill prioritize which code fixes will save the most money?

What is required to start scanning my Spark scripts?

What exactly is included in the purchase of this skill?
