Skip to main content

Data Pipeline — Technical Reference

End-to-end flow (top-down, with teams)

CI/CD: GitHub Actions (ci.yml, cd.yml) → Terraform. Alerting: CloudWatch 5% / 100 rows; Step failure → SNS. Dotted lines: owns / reviews / consumes. More: https://ohpen.stephenadei.nl

Organization name in bucket examples is illustrative; patterns are reusable across institutions.

Teams: Platform Domain Finance Quality

Technical Implementation Details

Bucket names in the table below are example deployment names; the same patterns apply for any institution.

ComponentTechnical Implementation Details
Architecture OverviewFlow: S3 (Bronze: raw CSV) → EventBridge → Step Functions → Glue/Lambda → S3 (Silver/Gold/Quarantine: Parquet) → Athena.
Separate buckets: ohpen-bronze, ohpen-silver, ohpen-gold, ohpen-quarantine, ohpen-artifacts.
1. Data IngestionSource: S3 ohpen-bronze, e.g. bronze/mortgages/transactions/ (ingest_date, run_id).
Format: CSV.
Trigger: S3 object-created → EventBridge → Step Functions → ETL; or schedule (e.g. daily 02:00 UTC).
2. Transformation LogicLanguage: Python (Pandas/PyArrow) or Glue PySpark.
Validation: Null checks on required columns; Currency vs ISO-4217.
Partitioning: Parquet by year/month from TransactionTimestamp (tx_date).
3. Data Lake Structureohpen-bronze (/raw): Immutable CSVs.
ohpen-silver (/processed): Validated Parquet (year/month, schema_v, run_id).
ohpen-gold: Business aggregates (account_balances, monthly_reports).
ohpen-quarantine: Quarantine (review/retry) + Condemned (audit; human approval).
ohpen-artifacts: ETL scripts, Athena results, manifests.
4. Schema EvolutionStrategy: Schema-on-Read. Catalog: Terraform-defined tables (no Crawler).
Handling: Additive-only; versioned paths (schema_v=v1, v2); catalog via IaC/runbook.
Parquet: missing columns → null.
5. CI/CD & MonitoringPipeline: GitHub Actions (ci.yml: lint, tests, build; cd.yml: deploy, Terraform apply).
Infra: Terraform (S3, IAM, Glue, Step Functions, EventBridge, Lambda, Athena, CloudWatch, SNS).
Alerting: Quarantine rate >5%, quarantined rows >100; Step failure → SNS.
© 2026 Stephen AdeiCC BY 4.0