© 2026 Stephen Adei. All rights reserved. All content on this site is the intellectual property of Stephen Adei. See License for terms of use and attribution.
Data Lake Architecture (Task 2 — High-Level Overview)
This document gives the high-level data lake design. For folder structure, schema evolution, safe publishing, failure modes, ownership, and runbooks, see the Data Lake Architecture Details reference.
Design at a glance
- Medallion architecture: Bronze (raw) → Silver (validated) → Gold (business aggregates), with dedicated S3 buckets per layer.
- Error handling: Quarantine (invalid rows, retry tracking) and Condemned (max attempts exceeded, human approval for reprocessing).
- Partitioning: Bronze `ingest_date`; Silver `year/month`; Gold `as_of_month`.
- Safe publishing: `run_id` isolation, `_SUCCESS` markers, `_LATEST.json` and `current/` for stable consumption.
- Governance: Schema versioning (`schema_v`), additive-only evolution, ownership per layer (Platform, Domain, Business).
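The Quarantine and Condemned rule above can be sketched as a small routing function. This is a minimal illustration, not the project's implementation: the `QuarantinedRow` type, its field names, the `MAX_ATTEMPTS` value of 3, and the choice to condemn a row once its failure count reaches the limit are all assumptions, since this overview does not state them.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed retry budget; the overview does not give a number.
MAX_ATTEMPTS = 3


@dataclass
class QuarantinedRow:
    """Hypothetical record of an invalid row and its retry history."""
    row_id: str
    error: str
    attempts: int = 0                  # failed attempts recorded so far
    approved_by: Optional[str] = None  # set only by a human reviewer


def next_destination(row: QuarantinedRow, max_attempts: int = MAX_ATTEMPTS) -> str:
    """Record one more failed attempt and return where the row should go."""
    row.attempts += 1
    # Assumption: the budget counts as used up once attempts reach the limit.
    if row.attempts >= max_attempts:
        return "condemned"
    return "quarantine"


def can_reprocess(row: QuarantinedRow, max_attempts: int = MAX_ATTEMPTS) -> bool:
    """Rows with retries left may be retried; condemned rows need human approval."""
    if row.attempts < max_attempts:
        return True
    return row.approved_by is not None
```

Keeping the approval check separate from the routing decision means an automated retry job can call `can_reprocess` without ever being able to set `approved_by` itself.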
High-level flow
| Layer | Purpose | Partition | Format |
|---|---|---|---|
| Bronze | Raw, immutable audit trail | ingest_date | CSV.gz |
| Silver | Validated, quality-assured | year/month | Parquet |
| Gold | Business aggregates, reporting | as_of_month | Parquet |
| Quarantine | Invalid rows, retry metadata | ingest_date | Parquet |
| Condemned | No further retries; human approval | under quarantine | Parquet |
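As a rough illustration of the Partition column, the helper below builds a partition prefix for each layer from a date. The partition keys come from the table; the Hive-style `key=value` formatting, the zero-padded values, and the function name `partition_prefix` are assumptions made for this sketch. Condemned is left out because the table only says it lives under Quarantine.

```python
from datetime import date


def partition_prefix(layer: str, day: date) -> str:
    """Return an assumed key=value partition prefix for the given layer."""
    if layer in ("bronze", "quarantine"):
        # Both layers are partitioned by ingest_date in the table above.
        return f"ingest_date={day.isoformat()}"
    if layer == "silver":
        return f"year={day.year:04d}/month={day.month:02d}"
    if layer == "gold":
        return f"as_of_month={day.year:04d}-{day.month:02d}"
    # Condemned and any unknown layer are not modeled in this sketch.
    raise ValueError(f"unknown layer: {layer!r}")
```

For example, `partition_prefix("silver", date(2026, 1, 15))` yields `year=2026/month=01` under these formatting assumptions.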
Key design decisions
- 1:1 Bronze → Silver: One raw source produces one validated dataset.
- 1:N Silver → Gold: One Silver dataset feeds multiple Gold aggregations (e.g. account_balances, monthly_reports).
- Run isolation: Every run uses a unique `run_id`; no overwrites; promotion via `_LATEST.json` and `current/` after validation.
- Schema evolution: Additive-only, versioned paths (`schema_v=v1`, `v2`); Parquet-only today; Iceberg optional later.
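The run isolation decision can be sketched as follows. This is a simplified model under stated assumptions: a local directory stands in for an S3 prefix, the `run_id=<id>` folder naming and the `{"run_id": ...}` shape of `_LATEST.json` are hypothetical, and the `current/` prefix is not modeled. S3 has no rename operation, so a real implementation would promote the pointer differently.

```python
import json
import uuid
from pathlib import Path
from typing import Callable, Mapping


def publish_run(
    dataset_root: Path,
    files: Mapping[str, bytes],
    validate: Callable[[Path], bool],
) -> str:
    """Write one run into its own folder, then promote it if it validates."""
    run_id = uuid.uuid4().hex
    run_dir = dataset_root / f"run_id={run_id}"  # assumed folder naming
    # exist_ok=False: a run never overwrites an earlier run's output.
    run_dir.mkdir(parents=True, exist_ok=False)

    for name, payload in files.items():
        (run_dir / name).write_bytes(payload)

    if not validate(run_dir):
        # The pointer is untouched, so consumers keep reading the previous run.
        raise ValueError(f"run {run_id} failed validation; not promoted")

    # The marker is written last so readers can tell the run is complete.
    (run_dir / "_SUCCESS").touch()

    # Write the pointer to a temp file, then swap it in with a local rename.
    pointer = dataset_root / "_LATEST.json"
    tmp = dataset_root / "_LATEST.json.tmp"
    tmp.write_text(json.dumps({"run_id": run_id}))
    tmp.replace(pointer)
    return run_id
```

Because a failed run raises before the pointer is updated, consumers that resolve data through `_LATEST.json` only ever see runs that passed validation.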
Where to go next
- Data Lake Architecture Details — Full reference: folder structure, schema evolution, safe publishing, failure mode analysis, ownership, governance, performance, security, runbooks.
- Architecture Boundaries — Assumptions and edge cases for this design.
- Governance & Workflow Diagrams — Approval and responsibility matrices.