About This Skill
# AI Stack Spend Audit
Stop overpaying for AI. This skill audits your actual LLM usage across every provider, finds the waste, and projects your runway at current burn rate.
## What it does
- **Pulls usage data** from OpenAI, Anthropic, Google AI Studio, AWS Bedrock, Azure OpenAI, and self-hosted (Ollama, vLLM, TGI)
- **Computes true cost** per feature, per user, per request — not just per token
- **Detects waste**: oversized models for the task, idle always-on endpoints, redundant calls (same prompt twice), broken streaming
- **Projects runway** at current burn rate and trend
- **Generates a budget plan** with concrete reduction actions
- **Outputs a one-page exec report** plus a detailed CSV for finance
## When to use it
- Your OpenAI bill jumped 3x and you don't know why
- You have multiple LLM providers and no unified view
- Engineering says "we need to cut AI spend" and you need a real plan
- You're pitching to investors and need a credible burn-rate story
- You want to know if self-hosting (Llama 3.1 70B) would actually save money
- You're a fractional CTO auditing a portfolio company's stack
## Why it's better than ad-hoc prompting
Most "audit my LLM spend" prompts give you generic advice. This skill is different:
- **Reads your actual logs** — `usage_log.jsonl`, CloudWatch exports, provider dashboards
- **Statistical rigor** — uses Tukey fences for outlier detection, not vibes
- **Cross-provider normalization** — converts everything to $/Mtok with provider-specific pricing
- **Actionable** — every finding has a `save_per_month` estimate and a `how_to_fix` link
- **Forecasting** — projects 30/60/90 day spend with confidence bands
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│ Agent (Claude/Cursor) │
│ - Asks for cost data sources │
│ - Calls analyzer script │
└───────────────┬─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ skills/ai-stack-spend-audit/ │
│ scripts/ │
│ ├── ingest.py # Multi-source data loader │
│ ├── analyze.py # Cost & waste computation │
│ ├── project.py # Runway + forecasting │
│ ├── report.py # Markdown + CSV output │
│ └── optimize.py # Concrete cut recommendations │
│ references/ │
│ ├── provider-pricing-2026.md # $/Mtok tables │
│ ├── self-host-tco-calculator.md │
│ └── finops-playbook.md │
└─────────────────────────────────────────────────────────┘
```
## Quick start
```bash
# 1. Install
pip install pandas matplotlib
# 2. Ingest from any provider
python scripts/ingest.py --source openai --api-key $OPENAI_ADMIN_KEY --out usage.csv
python scripts/ingest.py --source anthropic --api-key $ANTHROPIC_ADMIN_KEY --out usage.csv
python scripts/ingest.py --source aws-bedrock --s3-bucket my-llm-logs --out usage.csv
python scripts/ingest.py --source local --log-file ./vllm.log --out usage.csv
# 3. Analyze
python scripts/analyze.py usage.csv
# 4. Project runway
python scripts/project.py usage.csv --current-cash 500000 --out report.md
# 5. Get the cut plan
python scripts/optimize.py usage.csv --target-cut 30
```
## Sample output (excerpt)
```
## AI Stack Spend Audit — June 2026
Total spend MTD: $14,231.55
Projected month-end: $28,400
vs. last month: +47% ⚠️
Cost per active user: $0.47/day
Cost per 1k requests: $3.12
### Top 5 waste sources
1. ❌ Customer-support-bot using gpt-4o for FAQ: $4,200/mo
→ Switch to gpt-4o-mini, save $3,800/mo (90% reduction)
2. ❌ Embeddings regenerated on every request: $2,100/mo
→ Add Redis cache (1h TTL), save $1,900/mo
3. ❌ Context window always max (128k) for 200-token prompts: $1,800/mo
→ Truncate to 4k, save $1,500/mo
4. ⚠️ Anthropic Claude 3 Opus for "summarize this email": $1,200/mo
→ Switch to Claude 3.5 Haiku, save $1,100/mo
5. ⚠️ Self-hosted Llama 3.1 70B on A100x4 running 3% utilized: $3,400/mo
→ Scale to 1 GPU or shut down, save $2,500/mo
Runway at $500k cash, current burn: 17.6 months
After recommended cuts: 31.4 months (+14 months)
```
## Supported data sources
| Source | Method | Auth |
|--------|--------|------|
| OpenAI | Admin API `/v1/usage` | Admin key |
| Anthropic | Admin API `/v1/organizations/usage` | Admin key |
| Google AI Studio | Cloud Logging | Service account |
| AWS Bedrock | CloudWatch Logs | IAM |
| Azure OpenAI | Cost Management API | AAD |
| Ollama | Log file parsing | None |
| vLLM | Log file parsing | None |
| Custom (JSONL) | Direct file | None |
## The waste patterns it catches
1. **Oversized models** — using Opus/GPT-4 for trivial tasks
2. **Uncached embeddings** — same text re-embedded thousands of times
3. **Streaming abandonment** — clients disconnect mid-stream, you still pay for generated tokens
4. **Context stuffing** — sending 100k tokens for a 200-token answer
5. **Idle endpoints** — always-on GPU instances at <10% utilization
6. **Redundant calls** — same prompt sent multiple times (no idempotency)
7. **Function-call loops** — agents calling tools recursively without exit conditions
8. **Test/prod mixing** — dev traffic on production keys
9. **Expensive fallbacks** — retrying on an expensive model when a cheap one would do
10. **Time waste** — long-running requests for cheap work (over-provisioned)
## Pricing
Single-purchase, lifetime access. $12.00.
Includes:
- 5 Python scripts (ingest, analyze, project, report, optimize)
- Provider pricing reference (updated quarterly)
- Self-host TCO calculator
- FinOps playbook (12 optimization patterns)
- Sample data for testing
- Future updates for the same major version
## Example usage
> "Here's our OpenAI Admin key. We have $50k in the bank. What are we wasting and when do we run out?"
The skill will:
1. Pull last 90 days of usage from OpenAI
2. Categorize by feature (requires tagging, falls back to model)
3. Compute per-feature cost
4. Identify top 5 waste sources
5. Project runway at current burn
6. Output `report.md` + `spend-detail.csv`
## Compliance
Generates evidence suitable for:
- SOC2 cost-monitoring controls
- FinOps Foundation certification
- Board-level budget reviews
- Customer audits (pass-through billing)
## Compatibility
Works with any agent that supports the SKILL.md standard and can execute Python: Claude Code, OpenClaw, Codex CLI, Cursor, Gemini CLI, Cline, Windsurf, Aider. Tested on Linux, macOS, Windows.
## Tags
finops, cost-optimization, llm, openai, anthropic, aws, budget, agent-ops, observability, runway