© 2026 Stephen Adei. All rights reserved. All content on this site is the intellectual property of Stephen Adei. See License for terms of use and attribution.
Runtime Scenarios & Operational Flows
Overview
This document satisfies the arc42 Runtime View (Chapter 6) by synthesizing distributed scenario documentation into one hub page. It shows how the OLAP analytics pipeline behaves under various operational conditions: normal daily ingestion, backfills with late-arriving data, schema evolution deployments, quarantine retry workflows, and failure recovery scenarios. Source systems are upstream of this pipeline and out of scope here (see Scope & Assumptions).
Each scenario includes a sequence diagram showing the interaction between actors, systems, and services, along with links to detailed documentation for implementation specifics.
Extended context: ETL Flow, Data Lake Architecture.
Scenario 1: Normal Daily Ingestion
Description: Standard daily ETL run triggered by scheduled EventBridge rule or S3 object creation event. Data flows from Bronze → Silver → Gold with promotion gate validation.
Trigger: EventBridge scheduled rule (daily 2 AM UTC) or S3 ObjectCreated event
Sequence Flow:
Key Points:
- Run Identity: `run_id` derived from Step Functions execution name, propagated to all services
- Promotion Gate: Lambda `read_run_summary` validates quarantine rate <5% before promoting to `current/` prefix
- Safe Publishing: Write-then-publish pattern with `_SUCCESS` markers ensures atomicity
- Traceability: Execution ARN stored in `_SUCCESS` and `_LATEST.json` for end-to-end correlation
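The promotion gate and write-then-publish steps above can be sketched as follows. This is a minimal illustration, not the pipeline's actual Lambda code: the run-summary field names (`rows_total`, `rows_quarantined`) and marker payloads are assumptions.

```python
import json

# Hypothetical promotion-gate check, modeled on the `read_run_summary`
# Lambda described above. Field names in the run summary are assumptions.
QUARANTINE_THRESHOLD = 0.05  # promote only if quarantine rate < 5%

def should_promote(run_summary: dict) -> bool:
    """Return True if the run's quarantine rate is below the gate threshold."""
    total = run_summary["rows_total"]
    if total == 0:
        return False  # nothing to promote
    return run_summary["rows_quarantined"] / total < QUARANTINE_THRESHOLD

def publish_markers(run_summary: dict) -> tuple[str, str]:
    """Build the `_SUCCESS` and `_LATEST.json` payloads (write-then-publish).

    The execution ARN is stored in both markers so any consumer can trace
    a published dataset back to the Step Functions run that produced it.
    """
    success = json.dumps({
        "run_id": run_summary["run_id"],
        "execution_arn": run_summary["execution_arn"],
    })
    latest = json.dumps({
        "current_run_id": run_summary["run_id"],
        "execution_arn": run_summary["execution_arn"],
    })
    return success, latest
```

Writing the data files first and the markers last is what makes the publish step atomic: readers only trust a prefix once `_SUCCESS` exists.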
Related Documentation:
- ETL Flow - Normal Processing - Detailed ETL logic
- CI/CD Workflow - Step Functions - Orchestration details
- Traceability Design - Run Identity - Run identity propagation
- Data Lake Architecture - Safe Publishing - Promotion gate pattern
Scenario 2: Backfill with Late-Arriving Data
Description: Data arrives after its expected ingestion window (e.g., late CSV files for a previous month). System reprocesses affected partitions using `run_id` isolation to prevent overwriting existing data.
Trigger: Manual backfill request or automated detection of late-arriving data
Sequence Flow:
Key Points:
- Run Isolation: Each backfill writes to unique `run_id` path (e.g., `run_id=20260128_BACKFILL/`), preserving historical runs
- Partition Reprocessing: Only affected partitions are reprocessed (e.g., `year=2024/month=03`), not entire dataset
- No Data Loss: Existing `run_id` paths remain immutable; promotion updates `_LATEST.json` pointer only
- Audit Trail: All backfill runs preserved for compliance and debugging
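The run-isolation pattern above can be sketched as path construction plus a pointer swap. This is illustrative only: the layer/partition layout matches the examples in the key points, but the helper names are not from the pipeline's codebase.

```python
# Sketch of run_id-isolated backfill paths; helper names are illustrative.
def partition_path(layer: str, run_id: str, year: int, month: int) -> str:
    """Build an immutable, run-scoped partition path.

    Each run (normal or backfill) writes under its own run_id prefix, so a
    backfill for year=2024/month=03 never overwrites an earlier run's data.
    """
    return f"{layer}/run_id={run_id}/year={year}/month={month:02d}/"

def repoint_latest(latest: dict, run_id: str) -> dict:
    """Promotion only moves the `_LATEST.json` pointer; old paths stay put."""
    return {**latest, "current_run_id": run_id}
```

Because promotion is just a pointer update, rolling a backfill back is equally cheap: repoint `_LATEST.json` at the previous `run_id` and the old data is live again.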
Related Documentation:
- Data Lake Architecture - Backfills - Backfill process details
- Governance Diagrams - Backfill Approval - Approval workflow
- ETL Flow - Run Identity - How run_id enables safe backfills
- Traceability Design - Run Isolation - Run identity patterns
Scenario 3: Schema Evolution Deployment
Description: Schema change requires new version (e.g., adding nullable column). System deploys `schema_v=v2` alongside `v1`, enables dual-read compatibility, then gradually migrates consumers before deprecating `v1`.
Trigger: Schema change request (additive-only changes)
Sequence Flow:
Key Points:
- Additive-Only Changes: New columns must be nullable to maintain backward compatibility
- Versioned Paths: Schema versions stored in partition paths (`schema_v=v1/`, `schema_v=v2/`)
- Dual-Read Period: Both v1 and v2 data queryable during migration window
- Gradual Migration: Consumers update at their own pace; no forced downtime
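The additive-only rule and dual-read path layout above can be expressed as two small checks. A minimal sketch, assuming a simple column spec of the form `{"type": ..., "nullable": ...}`; the real schema contract lives in the Parquet Schema Specification.

```python
# Illustrative dual-read resolver for versioned schema paths
# (schema_v=v1/, schema_v=v2/); column-spec shape is an assumption.
def schema_paths(dataset_root: str, versions: list[str]) -> list[str]:
    """List the partition prefixes a consumer scans during the
    dual-read migration window."""
    return [f"{dataset_root}/schema_v={v}/" for v in versions]

def is_additive(old_cols: dict, new_cols: dict) -> bool:
    """Enforce additive-only evolution: every existing column keeps its
    spec, and every newly added column must be nullable."""
    for name, spec in old_cols.items():
        if new_cols.get(name) != spec:
            return False  # column removed or changed: breaking
    for name, spec in new_cols.items():
        if name not in old_cols and not spec.get("nullable", False):
            return False  # new non-nullable column breaks old writers
    return True
```

Gating deployments on a check like `is_additive` is what guarantees v1 consumers keep working unmodified throughout the migration window.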
Related Documentation:
- Data Lake Architecture - Schema Evolution - Schema evolution strategy
- CI/CD Workflow - Deployment - Infrastructure deployment
- Parquet Schema Specification - Schema contract details
- Data Model Design Rationale - Schema design principles
Scenario 4: Quarantine Retry Workflow
Description: Invalid rows detected during validation are routed to Quarantine layer. After human review and correction, rows are retried. If retry succeeds, data moves to Silver. If max attempts exceeded, rows move to Condemned layer (no automatic retries).
Trigger: Validation failure during ETL processing
Sequence Flow:
Key Points:
- Retry Logic: Max 3 attempts (`attempt_count` 0, 1, 2 allowed; `attempt_count` >= 3 condemned)
- Human Review: Quality Team reviews quarantine data and coordinates corrections
- Condemned Layer: Rows exceeding max attempts require explicit approval workflow for reprocessing
- Audit Trail: All retry attempts preserved with `attempt_count` and `retry_history` metadata
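The retry-budget rule above can be sketched as a single routing function. This is an illustration of the stated policy, not the pipeline's implementation; the `attempt_count` and `retry_history` fields come from the key points, the rest is assumed.

```python
MAX_ATTEMPTS = 3  # attempt_count 0, 1, 2 may retry; >= 3 is condemned

def route_after_retry(row: dict, passed_validation: bool) -> str:
    """Decide the destination layer for a quarantined row after a retry.

    Returns 'silver' on success, 'quarantine' if retries remain, and
    'condemned' once the attempt budget is exhausted (no auto-retries
    beyond that point; condemned rows need an explicit approval workflow).
    """
    if passed_validation:
        return "silver"
    attempts = row["attempt_count"] + 1
    row["attempt_count"] = attempts
    # Preserve the audit trail for every failed attempt.
    row.setdefault("retry_history", []).append(
        {"attempt": attempts, "result": "failed"}
    )
    return "condemned" if attempts >= MAX_ATTEMPTS else "quarantine"
```

Keeping the counter on the row itself means the budget survives across separate ETL runs, so a row cannot loop indefinitely through quarantine.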
Related Documentation:
- Governance Diagrams - Quarantine Review - Approval workflow
- ETL Flow - Error Handling - Validation and quarantine logic
- Data Lake Architecture - Error Handling Layers - Quarantine and Condemned layers
- Design Decisions - Quarantine Design - Rationale for error handling layers
Scenario 5: Failure Recovery and Rollback
Description: Deployment failure or ETL run failure triggers automated rollback. System reverts to previous known-good state, notifies platform team, and preserves failure context for debugging.
Trigger: Deployment smoke test failure or ETL run failure
Sequence Flow:
Key Points:
- Automated Rollback: CD pipeline automatically reverts to previous known-good state on smoke test failure
- Staging Pointer Pattern: `_STAGING.json` and `_DEPLOYED.json` enable precise rollback to exact previous build
- Failure Notification: Step Functions failures trigger EventBridge → SNS → Platform Team alerts with full context
- Preserved Context: Execution ARN, `run_id`, and error details preserved for debugging
Related Documentation:
- CI/CD Workflow - Rollback - Automated rollback procedures
- System Architecture Overview - Failure Handling - Failure notification flow
- Traceability Design - Execution History - Reconstructing execution history
- Data Lake Architecture - Failure Mode Analysis - System resilience principles
See also
- Data Lake Architecture - Medallion structure, error handling layers, and operational patterns
- ETL Flow - Detailed ETL pipeline logic and validation
- CI/CD Workflow - Deployment automation and infrastructure provisioning
- Testing Guide - Validation and resilience testing
- Runbooks - Operational procedures and troubleshooting guides