
    AI Stack Spend Audit

    by Kaymue

    Audit AI/LLM spend across OpenAI, Anthropic, AWS Bedrock, and Azure. Find waste, project runway, and generate a FinOps report. Free scripts.

    Updated Jun 2026
    0 installs

    Free

    Included in download

    • Downloadable skill package
    • 4 permissions declared
    • Instant install

    About This Skill

# AI Stack Spend Audit

Stop overpaying for AI. This skill audits your actual LLM usage across every provider, finds the waste, and projects your runway at current burn rate.

## What it does

- **Pulls usage data** from OpenAI, Anthropic, Google AI Studio, AWS Bedrock, Azure OpenAI, and self-hosted servers (Ollama, vLLM, TGI)
- **Computes true cost** per feature, per user, and per request, not just per token
- **Detects waste**: oversized models for the task, idle always-on endpoints, redundant calls (same prompt twice), broken streaming
- **Projects runway** at current burn rate and trend
- **Generates a budget plan** with concrete reduction actions
- **Outputs a one-page exec report** plus a detailed CSV for finance

## When to use it

- Your OpenAI bill jumped 3x and you don't know why
- You have multiple LLM providers and no unified view
- Engineering says "we need to cut AI spend" and you need a real plan
- You're pitching to investors and need a credible burn-rate story
- You want to know if self-hosting (Llama 3.1 70B) would actually save money
- You're a fractional CTO auditing a portfolio company's stack

## Why it's better than ad-hoc prompting

Most "audit my LLM spend" prompts give you generic advice.
This skill is different:

- **Reads your actual logs**: `usage_log.jsonl`, CloudWatch exports, provider dashboards
- **Statistical rigor**: uses Tukey fences for outlier detection, not vibes
- **Cross-provider normalization**: converts everything to $/Mtok with provider-specific pricing
- **Actionable**: every finding has a `save_per_month` estimate and a `how_to_fix` link
- **Forecasting**: projects 30/60/90 day spend with confidence bands

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│  Agent (Claude/Cursor)                                  │
│  - Asks for cost data sources                           │
│  - Calls analyzer script                                │
└───────────────┬─────────────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────────┐
│  skills/ai-stack-spend-audit/                           │
│    scripts/                                             │
│    ├── ingest.py       # Multi-source data loader       │
│    ├── analyze.py      # Cost & waste computation       │
│    ├── project.py      # Runway + forecasting           │
│    ├── report.py       # Markdown + CSV output          │
│    └── optimize.py     # Concrete cut recommendations   │
│    references/                                          │
│    ├── provider-pricing-2026.md  # $/Mtok tables        │
│    ├── self-host-tco-calculator.md                      │
│    └── finops-playbook.md                               │
└─────────────────────────────────────────────────────────┘
```

## Quick start

```bash
# 1. Install
pip install pandas matplotlib

# 2. Ingest from any provider
python scripts/ingest.py --source openai --api-key "$OPENAI_ADMIN_KEY" --out usage.csv
python scripts/ingest.py --source anthropic --api-key "$ANTHROPIC_ADMIN_KEY" --out usage.csv
python scripts/ingest.py --source aws-bedrock --s3-bucket my-llm-logs --out usage.csv
python scripts/ingest.py --source local --log-file ./vllm.log --out usage.csv

# 3. Analyze
python scripts/analyze.py usage.csv

# 4. Project runway
python scripts/project.py usage.csv --current-cash 500000 --out report.md

# 5. Get the cut plan
python scripts/optimize.py usage.csv --target-cut 30
```

## Sample output (excerpt)

```
## AI Stack Spend Audit — June 2026

Total spend MTD: $14,231.55
Projected month-end: $28,400
vs. last month: +47% ⚠️
Cost per active user: $0.47/day
Cost per 1k requests: $3.12

### Top 5 waste sources

1. ❌ Customer-support-bot using gpt-4o for FAQ: $4,200/mo
   → Switch to gpt-4o-mini, save $3,800/mo (90% reduction)
2. ❌ Embeddings regenerated on every request: $2,100/mo
   → Add Redis cache (1h TTL), save $1,900/mo
3. ❌ Context window always max (128k) for 200-token prompts: $1,800/mo
   → Truncate to 4k, save $1,500/mo
4. ⚠️ Anthropic Claude 3 Opus for "summarize this email": $1,200/mo
   → Switch to Claude 3.5 Haiku, save $1,100/mo
5. ⚠️ Self-hosted Llama 3.1 70B on A100x4 running 3% utilized: $3,400/mo
   → Scale to 1 GPU or shut down, save $2,500/mo

Runway at $500k cash, current burn: 17.6 months
After recommended cuts: 31.4 months (+14 months)
```

## Supported data sources

| Source | Method | Auth |
|--------|--------|------|
| OpenAI | Admin API `/v1/usage` | Admin key |
| Anthropic | Admin API `/v1/organizations/usage` | Admin key |
| Google AI Studio | Cloud Logging | Service account |
| AWS Bedrock | CloudWatch Logs | IAM |
| Azure OpenAI | Cost Management API | AAD |
| Ollama | Log file parsing | None |
| vLLM | Log file parsing | None |
| Custom (JSONL) | Direct file | None |

## The waste patterns it catches

1. **Oversized models**: using Opus/GPT-4 for trivial tasks
2. **Uncached embeddings**: same text re-embedded thousands of times
3. **Streaming abandonment**: clients disconnect mid-stream, you still pay for generated tokens
4. **Context stuffing**: sending 100k tokens for a 200-token answer
5. **Idle endpoints**: always-on GPU instances at <10% utilization
6. **Redundant calls**: same prompt sent multiple times (no idempotency)
7. **Function-call loops**: agents calling tools recursively without exit conditions
8. **Test/prod mixing**: dev traffic on production keys
9. **Expensive fallbacks**: retrying on an expensive model when a cheap one would do
10. **Time waste**: long-running requests for cheap work (over-provisioned)

## Pricing

Single-purchase, lifetime access. $12.00.

Includes:

- 5 Python scripts (ingest, analyze, project, report, optimize)
- Provider pricing reference (updated quarterly)
- Self-host TCO calculator
- FinOps playbook (12 optimization patterns)
- Sample data for testing
- Future updates for the same major version

## Example usage

> "Here's our OpenAI Admin key. We have $50k in the bank. What are we wasting and when do we run out?"

The skill will:

1. Pull the last 90 days of usage from OpenAI
2. Categorize by feature (requires tagging, falls back to model)
3. Compute per-feature cost
4. Identify the top 5 waste sources
5. Project runway at current burn
6. Output `report.md` + `spend-detail.csv`

## Compliance

Generates evidence suitable for:

- SOC2 cost-monitoring controls
- FinOps Foundation certification
- Board-level budget reviews
- Customer audits (pass-through billing)

## Compatibility

Works with any agent that supports the SKILL.md standard and can execute Python: Claude Code, OpenClaw, Codex CLI, Cursor, Gemini CLI, Cline, Windsurf, Aider. Tested on Linux, macOS, and Windows.

## Tags

finops, cost-optimization, llm, openai, anthropic, aws, budget, agent-ops, observability, runway
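## Sketch: Tukey fences and runway

The listing names Tukey fences for outlier detection and a runway projection but does not show the scripts themselves. The block below is an independent, minimal sketch of both ideas, not the code in `analyze.py` or `project.py`. The function names, the `k=1.5` multiplier, the flat-burn assumption, and the `daily` spend series are all illustrative assumptions.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def spend_outliers(daily_spend, k=1.5):
    """Return (day_index, spend) pairs that fall outside the Tukey fences."""
    lower, upper = tukey_fences(daily_spend, k)
    return [(i, v) for i, v in enumerate(daily_spend) if v < lower or v > upper]


def runway_months(cash, monthly_burn):
    """Months of runway at a flat monthly burn (assumes burn stays constant)."""
    if monthly_burn <= 0:
        return float("inf")  # nothing is being spent, so cash never runs out
    return cash / monthly_burn


# Hypothetical daily spend in USD; day 6 is a deliberate spike.
daily = [410.0, 395.5, 402.0, 388.0, 420.0, 415.0, 1900.0, 399.0]

print(spend_outliers(daily))                      # flags the spike on day 6
print(round(runway_months(500_000, 28_400), 1))   # 17.6
```

With $500k cash and the projected $28,400 month-end spend from the sample output, the flat-burn formula gives the same 17.6 months shown there. A trend-aware forecast would need a growth term, which this sketch leaves out.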

    Use Cases

    • Full-stack AI cost audit. Reads your actual usage logs, calculates true cost-per-feature and cost-per-user, surfaces waste (oversized models, idle endpoints, redundant calls), and projects runway at current burn rate. Works with any agent.
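A minimal sketch of the cost-per-feature calculation described above, assuming each usage record carries a model name, token counts, and an optional feature tag. The field names, model names, and price table are hypothetical placeholders, not the skill's actual schema or real provider pricing. The fallback to the model name when a record is untagged follows the behavior the listing describes.

```python
from collections import defaultdict

# Hypothetical prices in USD per million tokens: (input, output).
# Placeholder values for illustration only.
PRICES_PER_MTOK = {
    "model-large": (5.00, 15.00),
    "model-small": (0.50, 1.50),
}


def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one usage record at the table's $/Mtok rates."""
    in_price, out_price = PRICES_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def cost_by_feature(rows):
    """Sum cost per feature tag, falling back to the model name if untagged."""
    totals = defaultdict(float)
    for row in rows:
        key = row.get("feature") or row["model"]
        totals[key] += request_cost(
            row["model"], row["input_tokens"], row["output_tokens"]
        )
    return dict(totals)


def blended_rate_per_mtok(rows):
    """Overall USD per million tokens across all records (input + output)."""
    tokens = sum(r["input_tokens"] + r["output_tokens"] for r in rows)
    cost = sum(
        request_cost(r["model"], r["input_tokens"], r["output_tokens"])
        for r in rows
    )
    return cost / tokens * 1_000_000 if tokens else 0.0


# Hypothetical usage records.
rows = [
    {"feature": "support-bot", "model": "model-large",
     "input_tokens": 1_000_000, "output_tokens": 200_000},
    {"feature": "support-bot", "model": "model-large",
     "input_tokens": 500_000, "output_tokens": 100_000},
    {"feature": None, "model": "model-small",
     "input_tokens": 2_000_000, "output_tokens": 1_000_000},
]

print(cost_by_feature(rows))  # {'support-bot': 12.0, 'model-small': 2.5}
print(round(blended_rate_per_mtok(rows), 2))
```

Keeping the price table separate from the aggregation makes it straightforward to swap in per-provider rates without touching the per-feature logic.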


    Security Scanned

    Passed automated security review

    Permissions

    Terminal / Shell
    Read Files
    Write Files
    Network Access

    Allowed Hosts

    api.openai.com

    File Scopes

    scripts/**

    Works with any agent that supports the universal SKILL.md standard
