Skip to main content

Incident Notice Template

Use When: Pipeline failure, data quality incident, or significant delay Audience: All stakeholders (business, operations, finance) Frequency: As needed (when incident occurs)


Subject

[URGENT] Data Processing Incident - [Brief Description]

To

Business Stakeholders, Finance, Operations

From

Data Platform Team

Date

[Date/Time]


Hi Team,

What Happened

[Brief summary of the incident - 2-3 sentences]

Example: "The January 31 processing run failed at 18:30 CET due to [root cause - e.g., source data format change, network timeout, validation error]. This affected [X] transactions scheduled for processing."

Technical Details:

  • Incident Time: [timestamp]
  • Run ID: [run_id]
  • Root Cause: [brief technical explanation]
  • Affected Data: [date range, transaction count, etc.]

Impact (Business)

What this means for stakeholders:

  • [X] dashboards/reports delayed by [Y] hours
  • Month-end reporting: [on time / delayed by X hours]
  • Data freshness: Currently [X] hours behind schedule
  • Business decisions: [impact on decision-making if applicable]
  • [Any other business impact]

Affected Reports/Dashboards:

  • [Report 1]: [status - delayed / available / unavailable]

What We're Doing

Immediate Actions:

  1. [Action 1 - e.g., "Investigating root cause"]
  2. [Action 2 - e.g., "Rerunning failed processing job"]
  3. [Action 3 - e.g., "Validating data quality after reprocessing"]

Prevention:

  • [Action to prevent recurrence - e.g., "Adding validation for source data format changes"]

Status Updates:

  • Next update: [time/date]
  • Update frequency: [every X hours / when resolved]

When It's Resolved / Next Update

Expected Resolution:

  • Target: [time/date]
  • Confidence: [High / Medium / Low]

If Not Resolved by [time]:

  • Escalation Plan: [plan - e.g., "Escalate to CTO", "Manual processing fallback"]
  • Business Continuity: [what stakeholders can do in the meantime]

Next Status Update:

  • When: [time/date]
  • How: [email / Slack / dashboard]

Current Status

RAG Status: [Green / Amber / Red]

MetricCurrent ValueTargetStatus
Data Freshness[X] hours behind< 1 hour[✅/⚠️/🔴]
Processing Status[Failed / In Progress / Resolved]Operational[✅/⚠️/🔴]
Exception Rate[X]%< 0.5%[✅/⚠️/🔴]
Incident Impact[X] dashboards delayed0[✅/⚠️/🔴]

Overall: [Brief status - e.g., "Investigating - expect resolution within 2 hours"]

Questions

For Immediate Issues:

  • Contact: [Name/Team]
  • Email: [email]
  • Slack: [#data-platform] (urgent channel)

For Business Impact Questions:

  • Finance: [Contact]
  • Operations: [Contact]

Best regards, [Name] Data Platform Team


Template Notes

Customize:

  • Replace all bracketed placeholders with actual values
  • Adjust impact description based on severity
  • Update status table metrics based on what stakeholders need
  • Add/remove sections based on incident type

When to Use:

  • Pipeline failure (processing job fails)
  • Data quality incident (high exception rate, reconciliation mismatch)
  • Significant delay (processing time exceeds SLA significantly)
  • Any incident that affects business reporting or decision-making

Severity Levels:

  • Red (Critical): Multiple dashboards delayed > 4 hours, month-end reporting at risk
  • Amber (Warning): Single dashboard delayed, processing slow but within tolerance
  • Green (Info): Minor issue, resolved quickly, no business impact

Related Templates:

Follow-Up:

  • Send resolution notice when incident is resolved
  • Include root cause analysis in weekly summary if significant

Task 5 Documentation

© 2026 Stephen AdeiCC BY 4.0