© 2026 Stephen Adei. All rights reserved. All content on this site is the intellectual property of Stephen Adei. See License for terms of use and attribution.

Runtime Scenarios & Operational Flows

Overview

This document satisfies the arc42 Runtime View (Chapter 6) by synthesizing distributed scenario documentation into one hub page. It shows how the OLAP analytics pipeline behaves under various operational conditions: normal daily ingestion, backfills with late-arriving data, schema evolution deployments, quarantine retry workflows, and failure recovery scenarios. Source systems sit upstream of this pipeline; see Scope & Assumptions.

Each scenario includes a sequence diagram showing the interaction between actors, systems, and services, along with links to detailed documentation for implementation specifics.

Extended context: ETL Flow, Data Lake Architecture.


Scenario 1: Normal Daily Ingestion

Description: Standard daily ETL run triggered by scheduled EventBridge rule or S3 object creation event. Data flows from Bronze → Silver → Gold with promotion gate validation.

Trigger: EventBridge scheduled rule (daily 2 AM UTC) or S3 ObjectCreated event

Sequence Flow:

Key Points:

  • Run Identity: run_id derived from Step Functions execution name, propagated to all services
  • Promotion Gate: Lambda read_run_summary validates quarantine rate <5% before promoting to current/ prefix
  • Safe Publishing: Write-then-publish pattern with _SUCCESS markers ensures atomicity
  • Traceability: Execution ARN stored in _SUCCESS and _LATEST.json for end-to-end correlation
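The promotion gate decision above can be sketched as a pure function. This is a minimal illustration, not the actual Lambda: the `RunSummary` field names and the structure of what `read_run_summary` returns are assumptions; only the "promote when quarantine rate is below 5%" rule comes from this document.

```python
from dataclasses import dataclass

# Hypothetical shape of the summary the read_run_summary Lambda produces;
# field names are illustrative, not the real schema.
@dataclass
class RunSummary:
    run_id: str
    total_rows: int
    quarantined_rows: int

QUARANTINE_THRESHOLD = 0.05  # promote only when the quarantine rate is below 5%

def should_promote(summary: RunSummary) -> bool:
    """Gate decision: True means the run may be promoted to the current/ prefix."""
    if summary.total_rows == 0:
        return False  # empty run: nothing to promote, and avoids division by zero
    return summary.quarantined_rows / summary.total_rows < QUARANTINE_THRESHOLD
```

Keeping the gate a pure function of the run summary makes it trivial to unit test, independent of S3 or Step Functions.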

Related Documentation:


Scenario 2: Backfill with Late-Arriving Data

Description: Data arrives after its expected ingestion window (e.g., late CSV files for a previous month). System reprocesses affected partitions using run_id isolation to prevent overwriting existing data.

Trigger: Manual backfill request or automated detection of late-arriving data

Sequence Flow:

Key Points:

  • Run Isolation: Each backfill writes to unique run_id path (e.g., run_id=20260128_BACKFILL/), preserving historical runs
  • Partition Reprocessing: Only affected partitions are reprocessed (e.g., year=2024/month=03), not entire dataset
  • No Data Loss: Existing run_id paths remain immutable; promotion updates _LATEST.json pointer only
  • Audit Trail: All backfill runs preserved for compliance and debugging
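The run-isolation scheme above can be sketched as a path builder. The layer prefix (`silver/`) and the ordering of partition keys are assumptions for illustration; the document only specifies that each backfill writes under its own `run_id=` prefix within the affected `year=`/`month=` partitions.

```python
def backfill_path(dataset: str, year: int, month: int, run_id: str) -> str:
    """Build the run-isolated partition prefix a backfill writes into.

    Existing run_id prefixes are never overwritten; each backfill run
    gets a fresh prefix, so historical runs remain immutable.
    """
    # "silver" as the layer name is an assumption for this sketch.
    return f"silver/{dataset}/year={year}/month={month:02d}/run_id={run_id}/"
```

Because promotion only moves the `_LATEST.json` pointer, older `run_id=` prefixes stay readable for audit and debugging.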

Related Documentation:


Scenario 3: Schema Evolution Deployment

Description: Schema change requires new version (e.g., adding nullable column). System deploys schema_v=v2 alongside v1, enables dual-read compatibility, then gradually migrates consumers before deprecating v1.

Trigger: Schema change request (additive-only changes)

Sequence Flow:

Key Points:

  • Additive-Only Changes: New columns must be nullable to maintain backward compatibility
  • Versioned Paths: Schema versions stored in partition paths (schema_v=v1/, schema_v=v2/)
  • Dual-Read Period: Both v1 and v2 data queryable during migration window
  • Gradual Migration: Consumers update at their own pace; no forced downtime
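The additive-only rule above can be expressed as a compatibility check. The schema representation here ({column: {"type", "nullable"}} dicts) is an assumption for illustration; the document specifies only the rule itself: existing columns must not change, and new columns must be nullable.

```python
def is_additive_change(old_schema: dict, new_schema: dict) -> bool:
    """Return True only for backward-compatible (additive-only) schema changes."""
    # Existing columns must be untouched: no renames, type changes, or drops.
    for col, spec in old_schema.items():
        if new_schema.get(col) != spec:
            return False
    # Any newly added column must be nullable so v1 readers stay compatible.
    return all(
        spec.get("nullable", False)
        for col, spec in new_schema.items()
        if col not in old_schema
    )
```

Running a check like this in CI before deploying a `schema_v=v2/` path catches breaking changes before the dual-read window opens.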

Related Documentation:


Scenario 4: Quarantine Retry Workflow

Description: Invalid rows detected during validation are routed to Quarantine layer. After human review and correction, rows are retried. If retry succeeds, data moves to Silver. If max attempts exceeded, rows move to Condemned layer (no automatic retries).

Trigger: Validation failure during ETL processing

Sequence Flow:

Key Points:

  • Retry Logic: Max 3 attempts (attempt_count 0, 1, 2 allowed; attempt_count >= 3 condemned)
  • Human Review: Quality Team reviews quarantine data and coordinates corrections
  • Condemned Layer: Rows exceeding max attempts require explicit approval workflow for reprocessing
  • Audit Trail: All retry attempts preserved with attempt_count and retry_history metadata
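The retry routing above reduces to a small decision function. The function name and return values are illustrative; the attempt-count semantics (0, 1, 2 retryable; 3 or more condemned) come directly from this document.

```python
MAX_ATTEMPTS = 3  # attempt_count values 0, 1, 2 may still be retried

def route_quarantined_row(attempt_count: int) -> str:
    """Decide what happens to a quarantined row after review and correction."""
    if attempt_count < 0:
        raise ValueError("attempt_count must be non-negative")
    # At attempt_count >= 3 the row moves to the Condemned layer; from there,
    # reprocessing requires the explicit approval workflow, never an auto-retry.
    return "retry" if attempt_count < MAX_ATTEMPTS else "condemn"
```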

Related Documentation:


Scenario 5: Failure Recovery and Rollback

Description: Deployment failure or ETL run failure triggers automated rollback. System reverts to previous known-good state, notifies platform team, and preserves failure context for debugging.

Trigger: Deployment smoke test failure or ETL run failure

Sequence Flow:

Key Points:

  • Automated Rollback: CD pipeline automatically reverts to previous known-good state on smoke test failure
  • Staging Pointer Pattern: _STAGING.json and _DEPLOYED.json enable precise rollback to exact previous build
  • Failure Notification: Step Functions failures trigger EventBridge → SNS → Platform Team alerts with full context
  • Preserved Context: Execution ARN, run_id, and error details preserved for debugging
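The staging pointer pattern above can be sketched as a pointer-file transformation. The keys (`build`, `previous_build`, `rolled_back_from`) are assumptions for illustration; the document specifies only that `_STAGING.json` and `_DEPLOYED.json` make the exact previous build addressable and that failure context is preserved.

```python
def rollback_pointer(deployed: dict) -> dict:
    """Produce the rolled-back _DEPLOYED.json content from the current one.

    Raises if no previous known-good build is recorded, since a blind
    rollback would have nothing safe to revert to.
    """
    previous = deployed.get("previous_build")
    if not previous:
        raise RuntimeError("no previous known-good build recorded; cannot roll back")
    return {
        "build": previous,                       # revert to the exact previous build
        "previous_build": None,                  # only one step of history is kept here
        "rolled_back_from": deployed["build"],   # preserve failure context for debugging
    }
```

Because rollback is just rewriting a small pointer file rather than moving data, it is fast and atomic from a consumer's point of view.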

Related Documentation:


See also

  • Data Lake Architecture - Medallion structure, error handling layers, and operational patterns
  • ETL Flow - Detailed ETL pipeline logic and validation
  • CI/CD Workflow - Deployment automation and infrastructure provisioning
  • Testing Guide - Validation and resilience testing
  • Runbooks - Operational procedures and troubleshooting guides