End-to-End System Architecture Overview

© 2026 Stephen Adei. All rights reserved. All content on this site is the intellectual property of Stephen Adei. See License for terms of use and attribution.

Scope & Assumptions

This solution is an OLAP analytics / data lake platform that delivers the assigned scope: CSV in S3 → validation → Parquet → Athena. Ohpen's OLTP core banking systems are upstream and out of scope; they are treated as systems of record. Data correctness, ACID guarantees, and transactional integrity are assumed at the source. This platform focuses on validation, auditability, analytics performance, and governance. Scope is limited to batch CSV-in-S3; CDC and event-driven ingestion are out of scope for this case.

This document provides a comprehensive view of the case study data lake platform (OLAP analytics layer), showing all components, data flows, integrations, and operational processes. The architecture is designed with AWS-first and managed-service principles in mind.

Complete System Architecture

Component Interactions

For runtime scenarios showing these interactions, see Runtime Scenarios. For ETL component interactions, see ETL Flow - Component Interaction.

1. Data Ingestion Flow

  1. Source: CSV files containing financial transaction data
  2. Ingestion: Files uploaded to Bronze S3 bucket (immutable, append-only)
  3. Trigger: EventBridge detects new objects, or a scheduled rule fires (daily at 02:00 UTC)
  4. Orchestration: Step Functions state machine starts ETL workflow
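The trigger-to-orchestration handoff above can be sketched as a small Lambda handler. This is an illustrative assumption, not the platform's actual code: the event shape follows EventBridge's S3 "Object Created" format, and the execution-name rule (alphanumeric, `-`, `_`, max 80 characters) follows Step Functions' naming constraints. The state machine ARN and the `start_execution` call are shown only as a comment.

```python
import re

def parse_s3_event(event: dict) -> tuple[str, str]:
    """Extract (bucket, key) from an EventBridge S3 object-created event."""
    detail = event["detail"]
    return detail["bucket"]["name"], detail["object"]["key"]

def execution_name(key: str, event_id: str) -> str:
    """Derive a Step Functions-safe execution name (<= 80 chars) from the object key."""
    safe = re.sub(r"[^A-Za-z0-9_-]", "-", key)
    return f"{safe[:43]}-{event_id[:36]}"[:80]

def handler(event, context=None):
    bucket, key = parse_s3_event(event)
    name = execution_name(key, event["id"])
    # In the real Lambda, boto3 would start the ETL state machine here, e.g.:
    # boto3.client("stepfunctions").start_execution(
    #     stateMachineArn=STATE_MACHINE_ARN, name=name,
    #     input=json.dumps({"bucket": bucket, "key": key}))
    return {"bucket": bucket, "key": key, "execution": name}
```

Deriving the execution name from the object key plus the event id makes retriggers of the same file distinguishable in the Step Functions console.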

2. ETL Processing Flow

  1. Metadata Enrichment: Each row enriched with row_hash, source_file_id, attempt_count
  2. Loop Prevention: Duplicate detection, attempt limit enforcement, circuit breaker
  3. Validation: Schema validation, domain validation, business rules
  4. Transformation: Timestamp parsing, partitioning (year/month), data type conversion
  5. Storage: Valid data → Silver/Gold, Invalid data → Quarantine/Condemned
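The enrichment step (item 1) can be sketched as follows. The field names `row_hash`, `source_file_id`, and `attempt_count` come from the flow above; the canonicalisation scheme (sorted-key JSON hashed with SHA-256) is an assumption chosen so that the hash is independent of column order.

```python
import hashlib
import json

def row_hash(row: dict) -> str:
    """Deterministic hash over the row's fields, independent of key order."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def enrich(row: dict, source_file_id: str, attempt_count: int = 0) -> dict:
    """Attach the lineage metadata used for duplicate detection and retries."""
    return {**row,
            "row_hash": row_hash(row),
            "source_file_id": source_file_id,
            "attempt_count": attempt_count}
```

Because the hash is computed before enrichment fields are added, re-ingesting the same source row always yields the same `row_hash`, which is what makes duplicate detection (item 2) possible.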

3. Data Promotion Flow

  1. Silver Promotion: Lambda function updates _LATEST.json and current/ prefix
  2. Catalog Registration: Glue Data Catalog registers new partitions
  3. Query Availability: Data becomes available for Athena queries
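A minimal sketch of the promotion Lambda's pointer update, assuming `_LATEST.json` stores the run id and the partitioned prefix it points at (the exact payload shape is an assumption):

```python
import json
from datetime import datetime, timezone

def latest_pointer(run_id: str, silver_prefix: str) -> str:
    """Serialise the _LATEST.json payload written after a successful run."""
    payload = {
        "run_id": run_id,
        "prefix": silver_prefix,  # e.g. a run-scoped prefix under silver/
        "promoted_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(payload, indent=2)

# The real Lambda would then copy the run's objects under current/ and
# put this payload at the bucket root as _LATEST.json via boto3.
```

Readers that resolve `_LATEST.json` first always see a complete run, since the pointer is only rewritten after promotion finishes.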

4. Query & Consumption Flow

  1. SQL Queries: Analysts query via Athena workgroup
  2. Partition Pruning: Athena scans only the relevant partitions (≈95% cost reduction)
  3. BI Integration: BI tools consume aggregated data from Gold layer
  4. Reports: Finance team generates monthly reports
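Partition pruning (item 2) works because the `year`/`month` partition columns appear in the `WHERE` clause, so Athena skips every other partition's objects. A hedged sketch of building such a query; the table and column names are assumptions for illustration:

```python
def monthly_query(table: str, year: int, month: int) -> str:
    """Build an Athena query restricted to a single year/month partition."""
    return (
        f"SELECT account_id, SUM(amount) AS total\n"
        f"FROM {table}\n"
        f"WHERE year = {year} AND month = {month}\n"  # partition columns -> pruning
        f"GROUP BY account_id"
    )
```

Filtering on a non-partition column (e.g. a raw timestamp) would instead force a full scan, which is why the transformation step materialises `year`/`month` explicitly.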

5. Monitoring & Alerting Flow

  1. Metrics Collection: CloudWatch collects metrics from all services
  2. Alarm Evaluation: CloudWatch alarms evaluate thresholds
  3. Failure Detection: Step Functions failures trigger EventBridge rules
  4. Notification: SNS topics publish alerts, SQS queues decouple consumers
  5. Platform Team: Receives alerts via email/Slack, processes via Lambda consumers
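Because SNS fans out to SQS (items 4-5), the Lambda consumer has to unwrap two layers: the SQS record body is an SNS envelope, whose `Message` is the serialized EventBridge failure event. A sketch of that unwrapping; the payload shape assumed here follows the standard SNS-to-SQS delivery format:

```python
import json

def parse_alert(sqs_record: dict) -> dict:
    """Unwrap an SNS-over-SQS record into the Step Functions failure details."""
    envelope = json.loads(sqs_record["body"])   # SNS envelope
    event = json.loads(envelope["Message"])     # EventBridge event
    detail = event["detail"]
    return {"execution": detail["executionArn"], "status": detail["status"]}
```

Enabling SNS "raw message delivery" would remove the outer envelope, at the cost of losing the SNS metadata fields.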

Observability Design: For detailed patterns on run identity propagation, metric dimensioning, and cross-service correlation, see Traceability Design.

6. Security & Compliance Flow

  1. Encryption: SSE-S3 for standard buckets, SSE-KMS for sensitive buckets
  2. Access Control: IAM roles enforce least-privilege access
  3. Audit Logging: CloudTrail logs all API calls (management + selective data events)
  4. Key Management: KMS CMK with automatic rotation for sensitive data
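The encryption rule in item 1 can be expressed as a small helper that emits the configuration an S3 `put_bucket_encryption` call expects. This is an illustrative sketch; the KMS key alias is a placeholder, not the platform's real key:

```python
def encryption_config(sensitive: bool) -> dict:
    """Return the S3 default-encryption rule for a bucket class:
    SSE-KMS for sensitive buckets, SSE-S3 (AES256) otherwise."""
    if sensitive:
        sse = {"SSEAlgorithm": "aws:kms",
               "KMSMasterKeyID": "alias/datalake-sensitive"}  # placeholder alias
    else:
        sse = {"SSEAlgorithm": "AES256"}
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": sse}]}
```

Keeping the rule in one function makes the sensitive/standard split auditable in code review rather than scattered across bucket definitions.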

7. CI/CD Deployment Flow

  1. Code Changes: Developers commit code to GitHub
  2. CI Validation: Pull requests trigger tests, linting, Terraform plan
  3. CD Deployment: Merge to main triggers OIDC authentication, artifact build
  4. Manual Approval: GitHub environment gate requires manual approval
  5. Infrastructure Update: Terraform apply updates AWS resources
  6. Artifact Deployment: ETL scripts uploaded to S3 artifacts bucket

Data Flow Summary

For detailed ETL data flow, see ETL Flow. For SQL query flow, see SQL Breakdown.

Primary Flow (Success Path):

CSV Files → Bronze → EventBridge → Step Functions → Glue ETL → Silver → Lambda Promotion → Gold → Glue Catalog → Athena → Analysts

Error Handling Flow:

Glue ETL → Validation Errors → Quarantine (quarantine/) → (Retry) → Silver OR (Max Attempts) → Quarantine (quarantine/condemned/)
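The error-handling flow above reduces to a routing rule: clean records go to Silver, records with validation errors go to quarantine, and records that exhaust their retries are condemned. A sketch of that rule; the `MAX_ATTEMPTS` value is an assumption:

```python
MAX_ATTEMPTS = 3  # assumed retry limit, matching the attempt_count metadata

def route(validation_errors: list, attempt_count: int) -> str:
    """Decide where a record lands after an ETL attempt."""
    if not validation_errors:
        return "silver"
    if attempt_count >= MAX_ATTEMPTS:
        return "quarantine/condemned"  # no further retries; manual review only
    return "quarantine"                # eligible for automatic retry
```

The attempt limit is one leg of the loop-prevention design: even a record that fails validation identically on every run can only cycle through the retry path a bounded number of times.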

Failure Notification Flow:

Step Functions FAILED → EventBridge → SNS → SQS → Platform Team (Email/Slack/Lambda)

Monitoring Flow:

All Services → CloudWatch Metrics/Logs → Alarms → SNS → Platform Team

Key Architectural Principles

For complete architectural design, see Data Lake Architecture. For design rationale, see Design Decisions Summary.

  1. Medallion Architecture: Bronze (raw) → Silver (validated) → Gold (curated)
  2. Immutable Audit Trail: Bronze layer is append-only, all runs preserved
  3. Safe Publishing: Write-then-publish pattern with _SUCCESS markers
  4. Loop Prevention: Duplicate detection, attempt limits, circuit breaker
  5. Cost Optimization: Partition pruning, lifecycle policies, serverless model
  6. Security First: Encryption at rest, least-privilege IAM, comprehensive audit logging
  7. Observability: Comprehensive monitoring, alerting, and audit trails
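Principle 3 (safe publishing) hinges on write ordering: data objects are written first and the `_SUCCESS` marker last, so a reader that gates on the marker never observes a partially written partition. A minimal sketch of that ordering rule:

```python
def publish_plan(data_keys: list[str]) -> list[str]:
    """Order S3 writes so the _SUCCESS marker is the final object written."""
    return sorted(data_keys) + ["_SUCCESS"]
```

Consumers (including the promotion Lambda) then treat the presence of `_SUCCESS` under a prefix as the signal that the prefix is complete and safe to read.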

© 2026 Stephen Adei · CC BY 4.0