Executive Summary

© 2026 Stephen Adei. All rights reserved.
This work is protected by copyright. For permission requests or inquiries, please contact the author.
View full license at LICENSE.md. Note: This business case package contains design documentation, architecture, SQL queries, and communication templates. Full implementation code (Python ETL, Terraform, CI/CD workflows) is available in a separate code package upon request. Scripts repository (Gitea): see Repository & scripts (Gitea). Full narrative and diagrams: this site.

Scope & Assumptions

This solution is an OLAP analytics / data lake platform that delivers the assigned scope: CSV in S3 → validation → Parquet → Athena. Ohpen's OLTP core banking systems are upstream and out of scope; they are treated as systems of record. Data correctness, ACID guarantees, and transactional integrity are assumed at the source. This platform focuses on validation, auditability, analytics performance, and governance. Scope is limited to batch CSV-in-S3; CDC and event-driven ingestion are out of scope for this case. The organization name (e.g. Ohpen) is a deployment context only; the same design applies to any institution (e.g. ING Bronze/Silver/Gold).

Business Case

Build a production-ready FinTech data platform that not only meets the stated requirements (ETL, architecture, SQL, CI/CD) but also addresses the unspoken operational challenges that distinguish a working demo from a platform you'd trust with customer financial data:

  • Operational Safety: What happens when validation fails? (Quarantine/Condemned layers with perpetual audit trail)
  • Cost Predictability: What prevents runaway query costs? (Partition pruning 95% reduction, lifecycle policies, circuit breakers)
  • Security Posture: How to prevent credential leakage (OIDC keyless auth, two-role CI/CD separation)
  • Audit Compliance: How to trace a single transaction through all systems (Run identity via Step Functions ARN)
  • Production Readiness: What's the rollback plan if deployment fails? (Staging pointer, immutable artifacts, automated rollback)

Every architectural decision explicitly considers cost, risk, compliance, and operational burden — not just "does it work?"

Four Core Requirements

(Business Case requirements 1–4) Python ETL, Data Lake Architecture, SQL Analytics (100M rows), and CI/CD Automation

Solution Overview

Architecture Highlights

GenAI: Amazon Bedrock is used in four places (quarantine explanations, report narrative, catalog descriptions, SQL docs). Additional opportunities are described in GenAI in the Ohpen Case & Opportunities.

Key Deliverables

This solution delivers all business case requirements with production-grade implementations:

◆ 1. Python ETL Pipeline (1. Data Ingestion & Transformation)

  • Dual implementation strategy: Pandas (< 10M records, fast iteration) + PySpark (10M+ records, production scalability) — see ETL Flow for runtime (Lambda vs Glue)
  • Validation engine: Null checks, currency codes, amount type validation, timestamp parsing
  • Error handling: Quarantine (retriable failures) + Condemned (terminal failures, circuit breaker)
  • Production features: Loop prevention, idempotent runs, CloudWatch metrics
  • Location: ETL Flow & Scripts, ETL Code
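The validation-and-routing behaviour described above can be sketched as follows. This is a minimal, hypothetical illustration, not the package's actual implementation; the field names, currency subset, and routing rules are assumptions chosen to mirror the bullet points (null checks, currency codes, amount type validation, timestamp parsing, Quarantine vs. Condemned).

```python
from datetime import datetime

# Illustrative subset of allowed currency codes (assumption).
VALID_CURRENCIES = {"EUR", "USD", "GBP"}

def validate_row(row: dict) -> tuple:
    """Return ("ok" | "quarantine" | "condemned", reason).

    Quarantine = retriable failure (a fixed upstream file may be re-sent);
    Condemned  = terminal failure (permanently excluded, audited forever).
    """
    # Null checks: required fields must be present (terminal if missing).
    for field in ("account_id", "amount", "currency", "timestamp"):
        if row.get(field) in (None, ""):
            return "condemned", f"missing required field: {field}"
    # Currency code check: an unknown code may be a stale reference
    # table, so it is treated as retriable here.
    if row["currency"] not in VALID_CURRENCIES:
        return "quarantine", f"unknown currency: {row['currency']}"
    # Amount type validation: must parse as a number (terminal if not).
    try:
        float(row["amount"])
    except (TypeError, ValueError):
        return "condemned", f"non-numeric amount: {row['amount']!r}"
    # Timestamp parsing: ISO-8601 expected; malformed values are retriable.
    try:
        datetime.fromisoformat(row["timestamp"])
    except ValueError:
        return "quarantine", f"unparseable timestamp: {row['timestamp']!r}"
    return "ok", None
```

The same routine works under both runtimes: per-row in Pandas, or wrapped in a UDF/map for PySpark at 10M+ records.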

◆ 2. Data Lake Architecture (2. Architecture Design)

  • Medallion layers: Bronze (raw audit trail) → Silver (validated analytics) → Gold (business contracts)
  • Transformation cardinality: Bronze → Silver (1:1), Silver → Gold (1:N) — see cardinality diagram below and Data Lake Architecture
  • Error handling layers: Quarantine (retriable failures) + Condemned (terminal failures, perpetual audit)
  • Governance model: Clear ownership boundaries (Platform → Domain → Business)
  • Schema evolution: Additive-only, versioned paths, forward compatibility
  • Location: Data Lake Architecture (8 comprehensive sections)
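The additive-only schema-evolution rule above can be expressed as a simple compatibility check: a new schema version may add columns but never remove or retype existing ones. The function and schema literals below are hypothetical illustrations, not the package's code.

```python
def is_additive(old: dict, new: dict) -> bool:
    """Forward-compatible check: every column in the old schema must
    survive in the new schema with an identical type (additions are free).
    Schemas map column name -> type string."""
    return all(new.get(col) == typ for col, typ in old.items())

# Illustrative versions (assumed names/types):
v1 = {"account_id": "string", "amount": "decimal(18,2)"}
v2 = {**v1, "channel": "string"}   # adds a column  -> allowed
v3 = {"account_id": "string"}      # drops "amount" -> rejected
```

In practice each accepted version would land under its own versioned path (e.g. `schema_version=2/`), so readers pinned to an older version keep working.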

Cardinality Model

Rationale:

  • 1:1 (Bronze → Silver): Single source of truth produces one validated dataset
  • 1:N (Silver → Gold): Different business needs create multiple aggregations from same validated source
  • Benefits: Data consistency, clear lineage, scalability (add Gold views without changing Silver)
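The 1:N fan-out can be sketched in a few lines: one validated Silver dataset feeds several independent Gold views, each serving a different business contract. Dataset contents and view names below are illustrative assumptions.

```python
# One validated Silver dataset (assumed toy rows):
silver = [
    {"account_id": "a1", "currency": "EUR", "amount": 100.0},
    {"account_id": "a1", "currency": "EUR", "amount": -25.0},
    {"account_id": "a2", "currency": "USD", "amount": 40.0},
]

def gold_balance_by_account(rows):
    """Gold view 1: net balance per account."""
    out = {}
    for r in rows:
        out[r["account_id"]] = out.get(r["account_id"], 0.0) + r["amount"]
    return out

def gold_volume_by_currency(rows):
    """Gold view 2: transaction count per currency, from the SAME source."""
    out = {}
    for r in rows:
        out[r["currency"]] = out.get(r["currency"], 0) + 1
    return out
```

Adding a third Gold view is a new function over the same `silver` input; Silver itself never changes, which is exactly the scalability benefit named above.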

◆ 3. SQL Solution (3. Analytical Querying, 100M rows)

  • Query: Month-end balance history for Q1 2024 with carry-forward logic
  • Optimization: Partition pruning (95% scan reduction), window functions, columnar format
  • Scalability: 100M-row tables queried in ~30 seconds (Athena scans 5M rows after pruning)
  • Production pattern: Gold layer aggregation (1:N from Silver)
  • Location: SQL Breakdown, SQL Implementation Code
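The carry-forward logic at the heart of the Q1 2024 query can be illustrated outside SQL: when an account has no activity in a month, its last known month-end balance carries forward. This pure-Python sketch (assumed inputs, not the shipped query) shows what the window-function approach computes per account.

```python
def month_end_balances(balances: dict, months: list) -> dict:
    """Carry-forward sketch for one account.

    balances: month -> closing balance, only for months WITH activity
    months:   ordered reporting months to fill
    Months before the first activity stay None (no balance to carry)."""
    out, last = {}, None
    for m in months:
        if m in balances:
            last = balances[m]   # new activity resets the carried balance
        out[m] = last            # otherwise the previous balance carries over
    return out
```

In the actual Gold-layer SQL the same effect is achieved with a window function (e.g. a last-value-ignoring-nulls pattern over month partitions), and partition pruning keeps the scan to the Q1 2024 partitions only.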

◆ 4. CI/CD Automation (4. DevOps & Deployment)

  • Security-first: OIDC keyless auth, two-role separation (CI-stages, CD-deploys), manual approval gates
  • Staging pointer: CI builds → staged artifact → CD deploys exact build (no divergence)
  • Infrastructure-as-code: Terraform (S3, Glue, Step Functions, EventBridge, IAM, CloudWatch)
  • Operational resilience: Automated rollback, smoke tests, CloudTrail audit
  • Location: CI/CD Workflow, CI/CD Artifacts (Terraform + GitHub Actions workflows)

Promotion Gate: Quality Control Before Production

What is the promotion gate?

The promotion gate is a quality checkpoint that prevents bad data from reaching production reporting. After each ETL run completes, a Lambda function (read_run_summary) evaluates the run's quality metrics against pre-defined thresholds.

Promotion criteria (all must pass):

  1. Quarantine rate < 5% (fewer than 5% of rows flagged invalid)
  2. Condemned rows < 100 (fewer than 100 rows permanently excluded)
  3. No critical errors (circuit breaker not triggered)
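The threshold check performed by the read_run_summary Lambda reduces to a few comparisons. This is a hedged sketch under assumed summary-field names; the real Lambda's payload shape may differ.

```python
def should_promote(summary: dict) -> bool:
    """Evaluate the three promotion criteria listed above.

    summary is assumed to contain: total_rows, quarantined_rows,
    condemned_rows, circuit_breaker_triggered (illustrative names)."""
    quarantine_rate = summary["quarantined_rows"] / max(summary["total_rows"], 1)
    return (
        quarantine_rate < 0.05                        # 1. quarantine rate < 5%
        and summary["condemned_rows"] < 100           # 2. condemned rows < 100
        and not summary["circuit_breaker_triggered"]  # 3. no critical errors
    )
```

All three criteria are ANDed: a single breach keeps the run in staging.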

What happens on promotion:

  • Pass: Lambda promote_silver updates _LATEST.json and current/ prefix → data available for reporting
  • Fail: Run remains in staging (run_id-specific path) → data NOT promoted, alerts sent to platform team
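The key property of the pass path is that promotion is a pointer update, not a data copy. The sketch below uses a plain dict as a stand-in for S3 and assumed key names (`silver/_LATEST.json`, `silver/current`); it illustrates the promote_silver step rather than reproducing it.

```python
import json

def promote(store: dict, run_id: str) -> None:
    """Flip the _LATEST.json pointer and current/ prefix to a quality-
    assured run. The run's own run_id-isolated path is never rewritten,
    so every run remains available for audit."""
    pointer = {"run_id": run_id, "path": f"silver/runs/{run_id}/"}
    store["silver/_LATEST.json"] = json.dumps(pointer)
    # Finance/BI queries read only the current/ prefix, so un-promoted
    # (failed) runs are invisible to reporting by construction.
    store["silver/current"] = pointer["path"]
```

Because the pointer write is a single small object, rollback is equally cheap: re-point to the previous known-good run_id.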

Why this matters:

  • Prevents bad data in reports: Finance/BI queries use current/ prefix, which only contains promoted (quality-assured) runs
  • Auditability: All runs are preserved (run_id isolation), but only promoted runs are production-ready
  • Clear responsibility: Platform team fixes failed runs, promotes manually after validation

Location: Lambda read_run_summary (criteria evaluation), Lambda promote_silver (promotion execution) — see ETL Flow and Data Lake Architecture - Safe Publishing

Design Decisions & Architecture Trade-offs

For detailed analysis of key architectural decisions, trade-offs, risks, and AWS Well-Architected Framework mapping, see:

Design Decisions Summary - Why these choices were made, what alternatives were considered, and how they align with FinTech requirements.

Where to Find Details

Implementation Code

  • ETL Pipeline: Full Python implementations (Pandas + PySpark) available in the code package
  • Infrastructure: Complete Terraform configuration and GitHub Actions workflows available in the code package

Documentation


© 2026 Stephen Adei · CC BY 4.0