© 2026 Stephen Adei. All rights reserved. All content on this site is the intellectual property of Stephen Adei. See License for terms of use and attribution.
Runtime Scenarios & Operational Flows
Overview
This document satisfies the arc42 Runtime View (Chapter 6) by synthesizing distributed scenario documentation into one hub page. It shows how the OLAP analytics pipeline behaves under various operational conditions: normal daily ingestion, backfills with late-arriving data, schema evolution deployments, quarantine retry workflows, and failure recovery scenarios. Source systems are upstream of this pipeline and out of scope here (see Scope & Assumptions).
Each scenario includes a sequence diagram showing the interaction between actors, systems, and services, along with links to detailed documentation for implementation specifics.
Extended context: ETL Flow, Data Lake Architecture.
Scenario 1: Normal Daily Ingestion
Description: Standard daily ETL run triggered by scheduled EventBridge rule or S3 object creation event. Data flows from Bronze → Silver → Gold with promotion gate validation.
Trigger: EventBridge scheduled rule (daily 2 AM UTC) or S3 ObjectCreated event
Sequence Flow:
Key Points:
- Run Identity: `run_id` derived from Step Functions execution name, propagated to all services
- Promotion Gate: Lambda `read_run_summary` validates quarantine rate <5% before promoting to `current/` prefix
- Safe Publishing: Write-then-publish pattern with `_SUCCESS` markers ensures atomicity
- Traceability: Execution ARN stored in `_SUCCESS` and `_LATEST.json` for end-to-end correlation
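The promotion gate and write-then-publish steps above can be sketched as follows. This is a minimal illustration, not the pipeline's actual Lambda code: the run-summary field names (`rows_total`, `rows_quarantined`) and marker payloads are assumptions.

```python
import json

# Hypothetical promotion-gate check, modeled on the `read_run_summary`
# Lambda described above. Field names in the run summary are assumptions.
QUARANTINE_THRESHOLD = 0.05  # promote only if quarantine rate < 5%

def should_promote(run_summary: dict) -> bool:
    """Return True if the run's quarantine rate is below the gate threshold."""
    total = run_summary["rows_total"]
    if total == 0:
        return False  # nothing to promote
    return run_summary["rows_quarantined"] / total < QUARANTINE_THRESHOLD

def publish_markers(run_summary: dict) -> tuple[str, str]:
    """Build the `_SUCCESS` and `_LATEST.json` payloads (write-then-publish).

    The execution ARN is stored in both markers so any consumer can trace
    a published dataset back to the Step Functions run that produced it.
    """
    success = json.dumps({
        "run_id": run_summary["run_id"],
        "execution_arn": run_summary["execution_arn"],
    })
    latest = json.dumps({
        "current_run_id": run_summary["run_id"],
        "execution_arn": run_summary["execution_arn"],
    })
    return success, latest
```

Writing the data files first and the markers last is what makes the publish step atomic: readers only trust a prefix once `_SUCCESS` exists.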
Related Documentation:
- ETL Flow - Normal Processing - Detailed ETL logic
- CI/CD Workflow - Step Functions - Orchestration details
- Traceability Design - Run Identity - Run identity propagation
- Data Lake Architecture - Safe Publishing - Promotion gate pattern
Scenario 2: Backfill with Late-Arriving Data
Description: Data arrives after its expected ingestion window (e.g., late CSV files for a previous month). System reprocesses affected partitions using `run_id` isolation to prevent overwriting existing data.
Trigger: Manual backfill request or automated detection of late-arriving data
Sequence Flow:
Key Points:
- Run Isolation: Each backfill writes to unique `run_id` path (e.g., `run_id=20260128_BACKFILL/`), preserving historical runs
- Partition Reprocessing: Only affected partitions are reprocessed (e.g., `year=2024/month=03`), not entire dataset
- No Data Loss: Existing `run_id` paths remain immutable; promotion updates `_LATEST.json` pointer only
- Audit Trail: All backfill runs preserved for compliance and debugging
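The run-isolation pattern above can be sketched as path construction plus a pointer swap. This is illustrative only: the layer/partition layout matches the examples in the key points, but the helper names are not from the pipeline's codebase.

```python
# Sketch of run_id-isolated backfill paths; helper names are illustrative.
def partition_path(layer: str, run_id: str, year: int, month: int) -> str:
    """Build an immutable, run-scoped partition path.

    Each run (normal or backfill) writes under its own run_id prefix, so a
    backfill for year=2024/month=03 never overwrites an earlier run's data.
    """
    return f"{layer}/run_id={run_id}/year={year}/month={month:02d}/"

def repoint_latest(latest: dict, run_id: str) -> dict:
    """Promotion only moves the `_LATEST.json` pointer; old paths stay put."""
    return {**latest, "current_run_id": run_id}
```

Because promotion is just a pointer update, rolling a backfill back is equally cheap: repoint `_LATEST.json` at the previous `run_id` and the old data is live again.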
Related Documentation:
- Data Lake Architecture - Backfills - Backfill process details
- Governance Diagrams - Backfill Approval - Approval workflow
- ETL Flow - Run Identity - How run_id enables safe backfills
- Traceability Design - Run Isolation - Run identity patterns
Scenario 3: Schema Evolution Deployment
Description: Schema change requires new version (e.g., adding nullable column). System deploys `schema_v=v2` alongside `v1`, enables dual-read compatibility, then gradually migrates consumers before deprecating `v1`.
Trigger: Schema change request (additive-only changes)
Sequence Flow:
Key Points:
- Additive-Only Changes: New columns must be nullable to maintain backward compatibility
- Versioned Paths: Schema versions stored in partition paths (`schema_v=v1/`, `schema_v=v2/`)
- Dual-Read Period: Both v1 and v2 data queryable during migration window
- Gradual Migration: Consumers update at their own pace; no forced downtime
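The additive-only rule and dual-read path layout above can be expressed as two small checks. A minimal sketch, assuming a simple column spec of the form `{"type": ..., "nullable": ...}`; the real schema contract lives in the Parquet Schema Specification.

```python
# Illustrative dual-read resolver for versioned schema paths
# (schema_v=v1/, schema_v=v2/); column-spec shape is an assumption.
def schema_paths(dataset_root: str, versions: list[str]) -> list[str]:
    """List the partition prefixes a consumer scans during the
    dual-read migration window."""
    return [f"{dataset_root}/schema_v={v}/" for v in versions]

def is_additive(old_cols: dict, new_cols: dict) -> bool:
    """Enforce additive-only evolution: every existing column keeps its
    spec, and every newly added column must be nullable."""
    for name, spec in old_cols.items():
        if new_cols.get(name) != spec:
            return False  # column removed or changed: breaking
    for name, spec in new_cols.items():
        if name not in old_cols and not spec.get("nullable", False):
            return False  # new non-nullable column breaks old writers
    return True
```

Gating deployments on a check like `is_additive` is what guarantees v1 consumers keep working unmodified throughout the migration window.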
Related Documentation:
- Data Lake Architecture - Schema Evolution - Schema evolution strategy
- CI/CD Workflow - Deployment - Infrastructure deployment
- Parquet Schema Specification - Schema contract details
- Data Model Design Rationale - Schema design principles
Scenario 4: Quarantine Retry Workflow
Description: Invalid rows detected during validation are routed to Quarantine layer. After human review and correction, rows are retried. If retry succeeds, data moves to Silver. If max attempts exceeded, rows move to Condemned layer (no automatic retries).
Trigger: Validation failure during ETL processing
Sequence Flow:
Key Points:
- Retry Logic: Max 3 attempts (`attempt_count` 0, 1, 2 allowed; `attempt_count` >= 3 condemned)
- Human Review: Quality Team reviews quarantine data and coordinates corrections
- Condemned Layer: Rows exceeding max attempts require explicit approval workflow for reprocessing
- Audit Trail: All retry attempts preserved with `attempt_count` and `retry_history` metadata
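The retry-budget rule above can be sketched as a single routing function. This is an illustration of the stated policy, not the pipeline's implementation; the `attempt_count` and `retry_history` fields come from the key points, the rest is assumed.

```python
MAX_ATTEMPTS = 3  # attempt_count 0, 1, 2 may retry; >= 3 is condemned

def route_after_retry(row: dict, passed_validation: bool) -> str:
    """Decide the destination layer for a quarantined row after a retry.

    Returns 'silver' on success, 'quarantine' if retries remain, and
    'condemned' once the attempt budget is exhausted (no auto-retries
    beyond that point; condemned rows need an explicit approval workflow).
    """
    if passed_validation:
        return "silver"
    attempts = row["attempt_count"] + 1
    row["attempt_count"] = attempts
    # Preserve the audit trail for every failed attempt.
    row.setdefault("retry_history", []).append(
        {"attempt": attempts, "result": "failed"}
    )
    return "condemned" if attempts >= MAX_ATTEMPTS else "quarantine"
```

Keeping the counter on the row itself means the budget survives across separate ETL runs, so a row cannot loop indefinitely through quarantine.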
Related Documentation:
- Governance Diagrams - Quarantine Review - Approval workflow
- ETL Flow - Error Handling - Validation and quarantine logic
- Data Lake Architecture - Error Handling Layers - Quarantine and Condemned layers
- Design Decisions - Quarantine Design - Rationale for error handling layers
Scenario 5: Failure Recovery and Rollback
Description: Deployment failure or ETL run failure triggers automated rollback. System reverts to previous known-good state, notifies platform team, and preserves failure context for debugging.
Trigger: Deployment smoke test failure or ETL run failure
Sequence Flow:
Key Points:
- Automated Rollback: CD pipeline automatically reverts to previous known-good state on smoke test failure
- Staging Pointer Pattern: `_STAGING.json` and `_DEPLOYED.json` enable precise rollback to exact previous build
- Failure Notification: Step Functions failures trigger EventBridge → SNS → Platform Team alerts with full context
- Preserved Context: Execution ARN, `run_id`, and error details preserved for debugging
Related Documentation:
- CI/CD Workflow - Rollback - Automated rollback procedures
- System Architecture Overview - Failure Handling - Failure notification flow
- Traceability Design - Execution History - Reconstructing execution history
- Data Lake Architecture - Failure Mode Analysis - System resilience principles
See also
- Data Lake Architecture - Medallion structure, error handling layers, and operational patterns
- ETL Flow - Detailed ETL pipeline logic and validation
- CI/CD Workflow - Deployment automation and infrastructure provisioning
- Testing Guide - Validation and resilience testing
- Runbooks - Operational procedures and troubleshooting guides