# Prometheus Migration Tool
Datadog just raised prices again. Your observability bill is $40k/month. This skill migrates you to Prometheus + Grafana + Loki for 60-80% less, with the same UX.
## What it does
End-to-end migration from any monitoring system to Prometheus + Grafana + Loki:
- **Cost calculator** — compare current spend vs self-hosted, with real pricing
- **Source discovery** — auto-detect services, metric types, tags
- **Metric mapping** — Datadog metric → Prometheus metric
- **Scrape config generator** — ready-to-deploy prometheus.yml
- **Dashboard converter** — Datadog JSON → Grafana JSON
- **Alert translator** — Datadog monitor → Prometheus alert rule
- **Log pipeline** — log forwarding (Fluent Bit / Vector) + Loki
- **Migration runbook** — phased rollout, rollback plan
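To make the alert translator step more concrete, here is a minimal sketch of how a Datadog threshold monitor might become a Prometheus alerting rule. The trimmed-down monitor dict, the `to_prometheus_rule` helper, and the hand-written PromQL are illustrative assumptions, not the skill's actual `translate_alerts.py`; only the output keys (`alert`, `expr`, `for`, `labels`, `annotations`) are standard Prometheus rule fields.

```python
import re

def to_prometheus_rule(monitor: dict, promql: str, hold: str = "5m") -> dict:
    """Build a Prometheus alerting rule from a simplified Datadog monitor.

    `monitor` is a hypothetical, trimmed-down shape (name, message,
    comparator, critical threshold). `promql` is an already-translated
    query; translating the Datadog query itself is out of scope here.
    """
    # Prometheus alert names are conventionally CamelCase identifiers.
    words = re.findall(r"[A-Za-z0-9]+", monitor["name"])
    alert_name = "".join(w[:1].upper() + w[1:] for w in words)
    return {
        "alert": alert_name,
        "expr": f"{promql} {monitor['comparator']} {monitor['critical']}",
        "for": hold,
        "labels": {"severity": "critical"},
        "annotations": {
            "summary": monitor["name"],
            "description": monitor["message"],
        },
    }

monitor = {
    "name": "High CPU on web tier",
    "message": "CPU above 90% for 5 minutes",
    "comparator": ">",
    "critical": 90,
}
promql = '100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))'
rule = to_prometheus_rule(monitor, promql)
print(rule["alert"])
print(rule["expr"])
```

The resulting dict would still need to be placed under a `groups:` / `rules:` structure in a rule file before Prometheus can load it.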
## When to use it
- Datadog / New Relic raised prices and you want out
- Your observability bill is more than 30% of your cloud bill
- You want vendor independence
- You need on-prem observability (air-gapped, regulated)
- You're scaling to 10k+ services and per-host pricing kills you
## Why it's better than ad-hoc prompting
Most "migrate to Prometheus" prompts give generic advice. This skill is different:
- **Real pricing** — uses actual 2026 rates from Datadog, New Relic, Grafana
- **Concrete configs** — ready-to-deploy prometheus.yml, not pseudocode
- **Dashboard converter** — actually translates Datadog JSON to Grafana JSON
- **Phased plan** — week-by-week rollout, not "just do it"
- **Rollback strategy** — because migrations fail
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│  Agent (Claude/Cursor)                                  │
│  - Asks about current stack, scale, budget              │
│  - Reads Datadog JSON / config files                    │
│  - Generates complete migration plan                    │
└───────────────┬─────────────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────────┐
│ skills/prometheus-migration-tool/                       │
│  scripts/                                               │
│  ├── cost_calc.py           # Current vs target cost    │
│  ├── discover.py            # Find services + metrics   │
│  ├── gen_prometheus.py      # prometheus.yml            │
│  ├── convert_dashboards.py  # Datadog → Grafana         │
│  ├── translate_alerts.py    # Datadog monitor → Prom    │
│  ├── gen_log_pipeline.py    # Fluent Bit / Vector config│
│  ├── gen_runbook.py         # Phased migration plan     │
│  └── validate.py            # Test scrape configs       │
│  references/                                            │
│  ├── pricing-2026.md        # Real prices for comparison│
│  ├── metric-mapping.md      # Datadog → Prometheus      │
│  ├── grafana-vs-datadog.md                              │
│  ├── self-hosting-options.md                            │
│  └── rollout-strategy.md                                │
│  templates/                                             │
│  ├── prometheus.yml                                     │
│  ├── alertmanager.yml                                   │
│  ├── grafana-dashboard.json                             │
│  ├── promtail-config.yaml                               │
│  └── docker-compose.yml                                 │
└─────────────────────────────────────────────────────────┘
```
## Quick start
```bash
# 1. Install
pip install pyyaml requests datadog-api-client
# 2. Cost comparison
python scripts/cost_calc.py --current datadog --hosts 200 --metrics 50M --logs 100GB --out cost-report.md
# 3. Discover current services
python scripts/discover.py --source datadog --api-key $DD_KEY --app-key $DD_APP --out services.json
# 4. Generate Prometheus config
python scripts/gen_prometheus.py --services services.json --out prometheus.yml
# 5. Convert dashboards
python scripts/convert_dashboards.py --input dashboards/datadog/ --output grafana/
# 6. Translate alerts (output is a Prometheus rule file, referenced from rule_files in prometheus.yml)
python scripts/translate_alerts.py --input monitors.json --out alert-rules.yml
# 7. Generate log pipeline
python scripts/gen_log_pipeline.py --source datadog-logs --target loki --out fluent-bit.conf
# 8. Generate migration runbook
python scripts/gen_runbook.py --current datadog --hosts 200 --timeline 8w --out runbook.md
# 9. Validate scrape configs
python scripts/validate.py prometheus.yml
```
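As a rough illustration of what step 4 produces, the sketch below renders a minimal `prometheus.yml` from a list of services. The `services` shape (a `name` plus a list of `targets`) and the `render_scrape_config` helper are assumptions made for this example; the real `services.json` schema and `gen_prometheus.py` output may differ. It uses only the standard library, so the YAML is built with string formatting.

```python
import json

def render_scrape_config(services: list, scrape_interval: str = "30s") -> str:
    """Render a minimal prometheus.yml with one static job per service."""
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        "scrape_configs:",
    ]
    for svc in services:
        # json.dumps yields a double-quoted string, which is also valid YAML.
        lines.append(f"  - job_name: {json.dumps(svc['name'])}")
        lines.append("    static_configs:")
        lines.append("      - targets:")
        for target in svc["targets"]:
            lines.append(f"          - {json.dumps(target)}")
    return "\n".join(lines) + "\n"

# Hypothetical discovery output: service name plus node_exporter endpoints.
services = [
    {"name": "checkout", "targets": ["10.0.1.12:9100", "10.0.1.13:9100"]},
    {"name": "search", "targets": ["10.0.2.7:9100"]},
]
config = render_scrape_config(services)
print(config)
```

A file generated this way can be sanity-checked with `promtool check config prometheus.yml` before it is deployed.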
## Cost comparison (real 2026 pricing)
### Scenario: 200 hosts, 50M metrics/month, 100GB logs/month
| Component | Datadog | Self-hosted (Prometheus + Loki) | Managed (Grafana Cloud) |
|-----------|--------:|------------------------------:|------------------------:|
| Metrics | $7,500 | $1,200 (infra) | $4,500 |
| Logs | $9,200 | $800 (S3 + queries) | $3,800 |
| Traces (APM) | $5,400 | $1,500 (Tempo) | $4,200 |
| Dashboards | included | $0 (Grafana OSS) | included |
| **Total** | **$22,100** | **$3,500** | **$12,500** |
| **Savings** | — | **84% ($18,600/mo)** | 43% ($9,600/mo) |
Plus: no vendor lock-in, full data control, customization freedom.
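The totals and savings rows can be reproduced with a few lines of arithmetic. This is only a sketch that reuses the figures from the table; the `summarize` helper is hypothetical and not part of `cost_calc.py`.

```python
# Monthly figures copied from the scenario table (USD).
costs = {
    "Datadog": {"metrics": 7500, "logs": 9200, "traces": 5400},
    "Self-hosted": {"metrics": 1200, "logs": 800, "traces": 1500},
    "Grafana Cloud": {"metrics": 4500, "logs": 3800, "traces": 4200},
}

def summarize(costs: dict, baseline: str = "Datadog") -> dict:
    """Return total, absolute savings, and percent savings versus the baseline."""
    base_total = sum(costs[baseline].values())
    summary = {}
    for name, parts in costs.items():
        total = sum(parts.values())
        saved = base_total - total
        summary[name] = {
            "total": total,
            "saved": saved,
            "saved_pct": round(100 * saved / base_total),
        }
    return summary

for name, row in summarize(costs).items():
    print(f"{name}: ${row['total']:,}/mo, saves ${row['saved']:,} ({row['saved_pct']}%)")
```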
## The metric mapping (Datadog → Prometheus)
| Datadog | Prometheus | Notes |
|---------|------------|-------|
| `system.cpu.user` | `node_cpu_seconds_total{mode="user"}` | Different unit: Datadog reports a percentage, Prometheus a cumulative seconds counter (wrap in `rate()`) |
| `aws.ec2.cpuutilization` | `node_cpu_seconds_total` (via node_exporter) | Requires node_exporter |
| `http_requests{service:x}` | `http_requests_total{service="x"}` | Naming convention |
| `datadog.estimated_usage.metrics` | `up` + custom | Harder to migrate |
| `trace.<SPAN_NAME>.duration` | `traces_spanmetrics_latency_*` (Tempo) | Via OTel |
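The naming-convention row (`http_requests{service:x}` to `http_requests_total{service="x"}`) is mostly mechanical, so it can be sketched in a few lines. The `RENAMES` table and `datadog_selector_to_promql` function are hypothetical names for this example; a real rename table would presumably come from `references/metric-mapping.md`, and full Datadog query syntax (aggregators, rollups, boolean filters) is not handled here.

```python
import re
from typing import Dict, Optional

# Hypothetical rename table for metrics whose names change, not just their syntax.
RENAMES = {"http_requests": "http_requests_total"}

def datadog_selector_to_promql(selector: str,
                               renames: Optional[Dict[str, str]] = None) -> str:
    """Convert `metric{tag:value,...}` into `metric{tag="value",...}`."""
    renames = RENAMES if renames is None else renames
    match = re.fullmatch(r"([\w.]+)\{(.*)\}", selector.strip())
    if match is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    metric, tag_text = match.groups()
    # Prometheus metric names cannot contain dots, so fall back to underscores.
    metric = renames.get(metric, metric.replace(".", "_"))
    matchers = []
    for tag in (t.strip() for t in tag_text.split(",")):
        if not tag or tag == "*":
            continue  # `{*}` means "no filter" in Datadog
        key, _, value = tag.partition(":")
        matchers.append(f'{key}="{value}"')
    return f"{metric}{{{','.join(matchers)}}}"

print(datadog_selector_to_promql("http_requests{service:x,env:prod}"))
```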
## The 4 phases of migration
### Phase 1: Discover + Plan (1 week)
- Inventory current usage (cost calculator)
- Identify critical dashboards / alerts
- Define "must have" vs "nice to have"
- Select target (self-hosted vs managed)
### Phase 2: Stand up target (1-2 weeks)
- Deploy Prometheus + Grafana + Loki
- Configure scrape for 1-2 non-critical services
- Validate metric collection
- Build 1-2 example dashboards
### Phase 3: Shadow run (2-3 weeks)
- Run both old and new in parallel
- Compare metric values (Datadog vs Prometheus)
- Convert all dashboards
- Translate all alerts
- Train team
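Comparing metric values during the shadow run can be as simple as a tolerance check. The sketch below assumes the values have already been fetched from both systems for the same time window and keyed by a shared name; the `find_mismatches` helper, the sample names, and the 5% tolerance are illustrative choices, not something the skill prescribes.

```python
import math

def find_mismatches(old: dict, new: dict, rel_tol: float = 0.05) -> dict:
    """Compare per-metric values sampled from both systems over the same window."""
    problems = {}
    for name, old_value in old.items():
        if name not in new:
            problems[name] = "missing in Prometheus"
        elif not math.isclose(old_value, new[name], rel_tol=rel_tol):
            problems[name] = f"Datadog={old_value} Prometheus={new[name]}"
    return problems

# Hypothetical samples: averages over the same 5-minute window.
datadog_values = {"cpu_user_pct": 41.8, "http_rps": 1250.0, "queue_depth": 12.0}
prometheus_values = {"cpu_user_pct": 42.1, "http_rps": 1100.0}
mismatches = find_mismatches(datadog_values, prometheus_values)
print(mismatches)
```

A relative tolerance is used because the two systems scrape and aggregate at different intervals, so small differences are expected; metrics that hover near zero would need an absolute tolerance as well.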
### Phase 4: Cutover + cleanup (2 weeks)
- Switch default to new system
- Monitor for gaps
- Decommission old (after 30 days of running both systems in parallel)
- Document lessons learned
## Sample output: Cost report
```markdown
## Observability Cost Analysis
### Current (Datadog)
- 200 hosts × $15/host = $3,000
- 50M custom metrics × $0.05/1000 = $2,500
- Logs: 100GB at $0.10/GB ingest, plus indexing and retention = $10,000
- APM 200 hosts × $31 = $6,200
- Synthetics: $500
- **Total: $22,200/month**
### Target: Self-hosted (Prometheus + Loki)
- 3× m5.4xlarge ($1,200/mo) — Prometheus + Grafana
- 5× r5.2xlarge for Loki ingest ($2,000/mo)
- S3 storage 10TB ($230/mo)
- EBS + transfer: $200/mo
- **Total: $3,630/month**
### Savings
- **Monthly: $18,570 (84%)**
- **Annual: $222,840**
### Migration cost (one-time)
- Engineering time: 4-6 weeks × 2 engineers
- Tools: $0 (all OSS)
- Consultant (optional): $20-50k
- **Break-even: 1-3 months**
```
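The break-even range in the sample report follows from dividing the one-time cost by the monthly savings. The sketch below uses only the optional consultant fee as the one-time cost, since the report does not price the engineering time; `break_even_months` is a hypothetical helper, not part of the skill's scripts.

```python
def break_even_months(one_time_cost: float, monthly_savings: float) -> float:
    """Months of savings needed to pay back a one-time migration cost."""
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    return one_time_cost / monthly_savings

# Totals from the sample report: $22,200 current, $3,630 self-hosted.
monthly_savings = 22_200 - 3_630
for consultant_fee in (20_000, 50_000):
    months = break_even_months(consultant_fee, monthly_savings)
    print(f"${consultant_fee:,} one-time -> {months:.1f} months")
```

Adding a dollar figure for the 4-6 weeks of engineering time would push the break-even point later, so the 1-3 month range is best read as a lower bound.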
## Pricing
Single-purchase, lifetime access. $12.00.
Includes:
- 8 Python migration scripts
- 5 reference docs (pricing, mapping, comparison, self-hosting, rollout)
- 5 ready-to-deploy templates (Prometheus, Alertmanager, Grafana dashboard, Promtail, Docker Compose)
- Real 2026 pricing for Datadog / New Relic / Grafana / Grafana Cloud
- Future updates for the same major version
## Example usage
> "We're paying Datadog $22k/month. Need to cut 60%. Plan a migration to self-hosted Prometheus. We have 200 hosts, 50M metrics, 100GB logs."
The skill will:
1. Calculate current vs target cost ($22k → $3.5k, savings $18.5k/mo)
2. Generate prometheus.yml for 200 hosts
3. Convert your top 10 Datadog dashboards
4. Translate your critical alerts
5. Generate Fluent Bit config for log forwarding
6. Output 8-week migration runbook with rollback
## Compatibility
Works with any agent that supports the SKILL.md standard and can execute Python: Claude Code, OpenClaw, Codex CLI, Cursor, Gemini CLI, Cline, Windsurf, Aider. Outputs target Prometheus 2.x, Grafana 10.x, Loki 2.x, and Alertmanager 0.27+. Tested on Linux, macOS, and Windows.
## Tags
prometheus, grafana, observability, datadog, newrelic, cloudwatch, monitoring, sre, migration