Incident Postmortem Generator
by Kaymue
Turn a 3am outage into a postmortem in 10 minutes. Slack/PagerDuty ingest, 5-Whys, blameless framing, action items. SEV1/2/3.
Free
Included in download
- Downloadable skill package
- 1 permission declared
- Instant install
About This Skill
# Incident Postmortem Generator

You just survived a 4-hour outage. Everyone's exhausted. The PM wants a writeup by EOD. The CEO wants root cause. Legal wants a timeline. This skill takes your chaos and produces a blameless postmortem in 10 minutes.

## What it does

Ingests raw incident data and produces a complete postmortem:

- **Timeline reconstruction**: merges PagerDuty alerts, Slack messages, deploy events, status updates
- **Root cause analysis**: 5-Whys + fault tree + contributing factors
- **Impact quantification**: users affected, $ lost, SLA breach
- **Action items**: owner-assigned, prioritized, with due dates
- **Blameless framing**: flags blame-y language and rewrites it
- **SEV1/2/3 templates**: different depth for different severity
- **Blameless review**: focus on systems, not individuals

## When to use it

- A SEV1 just happened and you need a postmortem by EOD
- Your on-call team is burned out and skipping writeups
- Incident reviews drag on for 2 hours because the timeline is wrong
- Action items from past postmortems never get done (no owner)
- You're scaling your incident response process
- An auditor is asking "show me your postmortem process"

## Why it's better than ad-hoc prompting

Most "write a postmortem" prompts give generic templates. This skill is different:

- **Ingests your data**: Slack export, PagerDuty timeline, deploy log
- **Auto-reconstructs**: doesn't make you re-type the timeline
- **Blameless-aware**: actively flags and rewrites blame language
- **Action items with owners**: not "we should..." but "@alice owns X by date Y"
- **Severity-aware**: SEV1 has 12 sections, SEV3 has 4

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│ Agent (Claude/Cursor)                                   │
│  - Points at Slack export / PagerDuty log / timeline    │
│  - Calls generator                                      │
│  - Reviews for blame, action items                      │
└───────────────┬─────────────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────────┐
│ skills/incident-postmortem-generator/                   │
│   scripts/                                              │
│   ├── ingest_slack.py        # Parse Slack export       │
│   ├── ingest_pagerduty.py    # Parse PD timeline        │
│   ├── ingest_deploys.py      # Parse deploy log         │
│   ├── build_timeline.py      # Merge all sources        │
│   ├── generate.py            # Render postmortem MD     │
│   ├── check_blameless.py     # Flag blame language      │
│   └── action_tracker.py      # Export to Linear/Jira    │
│   references/                                           │
│   ├── blameless-guide.md     # How to write blameless   │
│   ├── 5-whys.md              # RCA technique            │
│   ├── fault-tree.md          # RCA technique            │
│   ├── sev-templates.md       # SEV1/2/3 templates       │
│   └── action-items-guide.md                             │
│   templates/                                            │
│   ├── sev1.md.tmpl                                      │
│   ├── sev2.md.tmpl                                      │
│   └── sev3.md.tmpl                                      │
└─────────────────────────────────────────────────────────┘
```

## Quick start

```bash
# 1. Install
pip install python-dateutil

# 2. Ingest from various sources
python scripts/ingest_slack.py --export slack-export-2026-06-20.json
python scripts/ingest_pagerduty.py --incident PD-12345
python scripts/ingest_deploys.py --log deploys.log

# 3. Merge into one timeline
python scripts/build_timeline.py --out timeline.json

# 4. Generate postmortem
python scripts/generate.py --timeline timeline.json --sev 1 --out postmortem-2026-06-20.md

# 5. Check for blame language
python scripts/check_blameless.py postmortem-2026-06-20.md

# 6. Export action items to Jira
python scripts/action_tracker.py postmortem-2026-06-20.md --tracker jira
```

## Sample output (excerpt)

```markdown
# Postmortem: API Gateway Outage - 2026-06-20

**Severity**: SEV1
**Duration**: 4h 12m (14:32 UTC → 18:44 UTC)
**Status**: Resolved
**Customer impact**: ~23,000 users, 1.2M failed requests, est. $14k revenue loss
**SLA**: 99.9% breached by 0.04% for the month

## Timeline (UTC)

| Time | Event | Source |
|------|-------|--------|
| 14:32 | @alice on-call paged: "5xx rate 8%" | PagerDuty |
| 14:35 | @alice acks, opens #inc-2026-06-20 | Slack |
| 14:41 | Identified: API gateway OOMKilled | Slack |
| 14:48 | @bob: "Scaled gateway from 4 to 8 pods" | Slack |
| 15:10 | 5xx down to 0.3%, still elevated | Slack |
| 15:30 | Root cause: memory leak in new auth middleware (deployed 14:28) | Slack |
| 15:45 | @bob: "Rolled back deploy auth-middleware v2.3.1 → v2.3.0" | Deploy log |
| 16:00 | 5xx back to 0.05% (normal) | Slack |
| 16:30 | @alice: "Monitoring for 2h before closing" | Slack |
| 18:44 | Incident closed | PagerDuty |

## Root cause

A memory leak was introduced in `auth-middleware` v2.3.1 (deployed 14:28 UTC). The leak caused the API gateway pods to be OOMKilled within 4 minutes of receiving traffic, cycling faster than the load balancer could route around them.

### 5 Whys

1. **Why did the API fail?** Pods were OOMKilled
2. **Why were pods OOMKilled?** Memory leak in auth-middleware
3. **Why was there a memory leak?** New code didn't release session cache
4. **Why didn't we catch it?** Load test didn't include 24h soak test
5. **Why didn't we have a 24h soak test?** Our CI pipeline only does 5min load tests

### Contributing factors

- No memory limits set on the new auth-middleware container
- Canary deploy was 5%, too small to detect a slow leak
- No automated rollback on memory growth rate
- The deployment landed 4 minutes before the incident, leaving no time to detect the leak before the page

## Impact

- **Users affected**: ~23,000 (12% of daily active)
- **Failed requests**: 1.2M
- **Revenue lost**: ~$14,000 (subscription churn + transaction failures)
- **SLA breach**: 99.9% monthly → 99.86% (over 24h, partial month)

## Action items

| # | Action | Owner | Priority | Due | Status |
|---|--------|-------|----------|-----|--------|
| 1 | Add memory limit to all middleware containers | @bob | P0 | 2026-06-25 | Open |
| 2 | Add 24h soak test to CI pipeline | @carol | P0 | 2026-07-05 | Open |
| 3 | Increase canary deploy to 20% | @bob | P1 | 2026-07-10 | Open |
| 4 | Auto-rollback on memory growth > 20%/h | @dave | P1 | 2026-07-15 | Open |
| 5 | Add memory-leak detector to pre-commit hooks | @alice | P2 | 2026-07-30 | Open |
| 6 | Document postmortem process in runbook | @alice | P2 | 2026-08-01 | Open |

## What went well

- Fast ack time (3 min from page to incident channel)
- Quick identification of root cause
- Clean rollback, no data loss

## What went poorly

- Memory limit missing on new container
- Canary too small to detect slow leak
- No automated rollback on memory anomalies
```

## Pricing

Single-purchase, lifetime access. $8.50.

Includes:

- 7 Python scripts (3 ingest + 1 timeline merge + 1 generate + 1 check + 1 tracker)
- 5 reference docs (blameless, 5-whys, fault tree, SEV templates, actions)
- 3 SEV templates (SEV1/2/3)
- Slack/PagerDuty/Jira integration code
- Future updates for the same major version

## Example usage

> "We had a 4-hour SEV1 yesterday. Here's the Slack export and PagerDuty incident ID. Generate the postmortem."

The skill will:

1. Parse Slack and PagerDuty data
2. Reconstruct timeline
3. Run 5-Whys + fault tree
4. Generate postmortem with action items
5. Flag any blame language
6. Export action items to your tracker

## Compatibility

Works with any agent that supports the SKILL.md standard and can execute Python: Claude Code, OpenClaw, Codex CLI, Cursor, Gemini CLI, Cline, Windsurf, Aider.

Slack export: free. PagerDuty: API token. Jira/Linear: API token.

Tested on Linux, macOS, Windows.

## Tags

sre, incident-response, postmortem, devops, on-call, monitoring, reliability
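
To make the timeline-reconstruction step described above more concrete, here is a minimal sketch of how events from several sources might be merged into one UTC-ordered table. It is not the code shipped in `build_timeline.py`, which this listing does not show. The event shape (`ts`, `source`, `text`), the function names `parse_ts`, `build_timeline` and `to_markdown`, and the rule that timestamps without an offset are treated as UTC are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def parse_ts(raw):
    """Parse an ISO 8601 timestamp string and normalize it to UTC."""
    ts = datetime.fromisoformat(raw)
    if ts.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def build_timeline(*sources):
    """Merge any number of event lists into one list sorted by UTC time."""
    events = []
    for source in sources:
        for event in source:
            events.append({**event, "ts": parse_ts(event["ts"])})
    # sorted() is stable, so events sharing a timestamp keep their input order.
    return sorted(events, key=lambda e: e["ts"])


def to_markdown(events):
    """Render merged events as a Time / Event / Source table."""
    rows = ["| Time | Event | Source |", "|------|-------|--------|"]
    for e in events:
        rows.append(f"| {e['ts']:%H:%M} | {e['text']} | {e['source']} |")
    return "\n".join(rows)


# Hypothetical, already-parsed events in the assumed shape.
pagerduty = [{"ts": "2026-06-20T14:32:00+00:00", "source": "PagerDuty",
              "text": "On-call paged: 5xx rate 8%"}]
slack = [{"ts": "2026-06-20T16:35:00+02:00", "source": "Slack",
          "text": "Incident channel opened"}]
deploys = [{"ts": "2026-06-20T14:28:00", "source": "Deploy log",
            "text": "Deployed auth-middleware v2.3.1"}]

print(to_markdown(build_timeline(pagerduty, slack, deploys)))
```

Normalizing every timestamp to UTC before sorting is the part that matters here: a Slack message logged in a local time zone would otherwise sort hours away from the page that triggered it.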
Use Cases
- Turn a chaotic 3am outage into a structured postmortem in 10 minutes. Ingests timeline + logs + chat, produces a blameless postmortem with root cause analysis, contributing factors, and tracked action items. Templates for SEV1/2/3.
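
As an illustration of what flagging blame language in a draft could look like, the sketch below scans text line by line for a few person-focused phrasings and attaches a rewrite hint to each hit. The patterns, the hints, and the `find_blame` function are hypothetical; the rule set actually used by the skill's `check_blameless.py` is not shown in this listing and may differ substantially.

```python
import re

# Hypothetical patterns and hints, chosen only for illustration.
BLAME_PATTERNS = [
    (re.compile(r"@?\w+ (forgot|failed|neglected) to\b", re.IGNORECASE),
     "describe the missing safeguard instead of the person"),
    (re.compile(r"\b(careless|negligent|negligence|human error)\b", re.IGNORECASE),
     "name the system condition that allowed the mistake"),
    (re.compile(r"\bshould have (known|caught|checked)\b", re.IGNORECASE),
     "state which signal was unavailable at the time"),
]


def find_blame(text):
    """Return (line_number, matched_phrase, hint) for each flagged phrase."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, hint in BLAME_PATTERNS:
            match = pattern.search(line)
            if match:
                findings.append((lineno, match.group(0), hint))
    return findings


draft = (
    "Bob forgot to set a memory limit.\n"
    "No memory limit was set on the new container.\n"
    "This was human error."
)

for lineno, phrase, hint in find_blame(draft):
    print(f"line {lineno}: '{phrase}' -> {hint}")
```

A regex pass like this only catches surface phrasing. The second line of the draft shows the kind of system-focused wording a reviewer would rewrite the first line into.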
How to Install
mkdir -p ~/.claude/skills && curl -sL https://www.agensi.io/api/install/incident-postmortem-generator -o /tmp/incident-postmortem-generator.zip && unzip -o /tmp/incident-postmortem-generator.zip -d ~/.claude/skills && rm /tmp/incident-postmortem-generator.zip
Free skills install directly. Paid skills require purchase - use the download button above after buying.
Reviews
No reviews yet - be the first to share your experience.
Only users who have downloaded or purchased this skill can leave a review.
Security Scanned
Passed automated security review
Permissions
Works with any agent that supports the universal SKILL.md standard