Security & CI/CD Strategy
Executive Summary
The Data Lake implements a Cloud-Native, Zero-Trust Deployment Architecture. The design moves beyond legacy "static access keys" to a modern OpenID Connect (OIDC) identity federation model. This ensures that the deployment pipeline (GitHub Actions) authenticates to the infrastructure (AWS) using temporary, strictly scoped credentials that exist only for the duration of a single job.
1. The Distinction: CI vs. CD
Continuous Integration (Validation) and Continuous Delivery (Deployment) are strictly separated to ensure stability and security.
Continuous Integration (CI) - "The Safety Net"
- Goal: Validate code correctness before it ever touches production.
- Trigger: Every Pull Request.
- Security Profile: Read-Only / Mocked Environment.
- Actions:
  - Unit Tests (PyTest)
  - Code Quality Checks (Ruff, SQLFluff)
  - Infrastructure Validation (`terraform plan`) - No access to production data or keys.
Continuous Delivery (CD) - "The Gatekeeper"
- Goal: Safely release validated changes to production.
- Trigger: Merge to `main` branch + Manual Approval.
- Security Profile: Elevated Privileges (Scoped via OIDC).
- Actions:
  - Authenticate via OIDC (Keyless).
  - Wait for Human Approval (Senior Engineer/Tech Lead).
  - Apply Infrastructure Changes (`terraform apply`).
  - Deploy Glue Jobs & State Machines.
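The split between the two stages can be sketched as a small decision function. This is only an illustration of the trigger rules listed above: `PipelineEvent` and `select_stage` are hypothetical names, not part of GitHub Actions or of this platform's workflows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineEvent:
    """Hypothetical, simplified view of a repository event (not a GitHub API type)."""
    kind: str               # "pull_request" or "push"
    branch: str             # branch the event targets
    approved: bool = False  # whether a human approved the deployment


def select_stage(event: PipelineEvent) -> str:
    """Return the stage allowed to run for this event."""
    if event.kind == "pull_request":
        # CI: validation only, no production credentials are involved.
        return "ci"
    if event.kind == "push" and event.branch == "main":
        # CD: elevated privileges are used only after manual approval.
        return "cd" if event.approved else "cd-awaiting-approval"
    # Anything else (for example a push to a feature branch) triggers nothing.
    return "none"
```

For example, `select_stage(PipelineEvent("push", "main"))` stays in the awaiting-approval state until `approved=True` is set, which mirrors the manual approval gate.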
2. Keyless Authentication (OIDC)
In a traditional setup, an IAM User's "Access Key" (e.g., AKIA...) is stored in GitHub Secrets. If this key leaks, an attacker keeps standing access until the key is rotated or revoked.
Solution: OIDC Federation
- Trust, Not Shared Secrets: AWS accepts a cryptographically signed token from GitHub only if it matches a specific organization/repository and branch.
- Temporary Credentials: The token is exchanged for temporary AWS credentials valid for only 60 minutes.
- Auditable: The credential exchange and every subsequent deployment action are logged in AWS CloudTrail under the federated GitHub identity, giving each deployment a traceable audit trail.
Why this matters: The approach removes long-lived static API keys, a frequently cited cause of cloud data breaches, from the deployment pipeline.
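A minimal sketch of the trust condition described in this section, assuming the IAM role's trust policy pins the GitHub token's `aud` and `sub` claims. The account ID, organization, and repository names in the usage note are placeholders, and `token_is_trusted` is a simplified local check written for illustration, not AWS's policy evaluator.

```python
GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def build_trust_policy(account_id: str, org: str, repo: str, branch: str = "main") -> dict:
    """Build an IAM trust policy that only admits tokens for one repo and branch."""
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": provider_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Audience used when GitHub Actions requests a token for AWS STS.
                    f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                    # Subject for a branch-triggered job. Jobs bound to a GitHub
                    # environment use "repo:ORG/REPO:environment:NAME" instead.
                    f"{GITHUB_OIDC_HOST}:sub": f"repo:{org}/{repo}:ref:refs/heads/{branch}",
                }
            },
        }],
    }


def token_is_trusted(claims: dict, policy: dict) -> bool:
    """Simplified check: every pinned claim must match the token exactly."""
    expected = policy["Statement"][0]["Condition"]["StringEquals"]
    return all(
        claims.get(key.split(":", 1)[1]) == value  # "host:sub" -> "sub"
        for key, value in expected.items()
    )
```

With `policy = build_trust_policy("123456789012", "example-org", "data-lake")`, a token whose `sub` is `repo:example-org/data-lake:ref:refs/heads/main` passes the check, while the same repository on a feature branch does not. Pinning `sub` exactly, rather than with a wildcard, is what keeps other branches and repositories from assuming the role.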
3. Least Privilege & Auditability
The security model extends beyond deployment into the runtime environment:
| Layer | Security Control | Business Value |
|---|---|---|
| Deployment | OIDC + GitOps | All infrastructure changes are version-controlled, code-reviewed, and deployed without static keys. |
| Data Lake | Immutable Bronze Layer | Raw data (bronze/) can be read but never overwritten or deleted, ensuring a legally defensible audit trail. |
| Runtime | Role Segregation | The ETL Job (Glue) cannot access Human Resources data; the Dashboard (QuickSight) cannot delete data. |
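One way the immutable Bronze layer could be enforced is an explicit deny in the bucket policy. The sketch below is an assumption about how such a control might look, not the platform's actual policy: the bucket name is a placeholder, and `is_delete_denied` is a hypothetical helper that only mimics the statement's effect locally. A deny on deletes does not by itself prevent overwrites; that part typically relies on S3 versioning (an overwrite then creates a new version) or Object Lock.

```python
DELETE_ACTIONS = ["s3:DeleteObject", "s3:DeleteObjectVersion"]


def bronze_immutability_statement(bucket: str, prefix: str = "bronze/") -> dict:
    """Bucket-policy statement denying deletes under the bronze prefix for everyone."""
    return {
        "Sid": "DenyBronzeDeletes",
        "Effect": "Deny",
        "Principal": "*",
        "Action": DELETE_ACTIONS,
        "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
    }


def is_delete_denied(statement: dict, action: str, object_arn: str) -> bool:
    """Simplified local check (not AWS's evaluator): does the deny cover this request?"""
    resource_prefix = statement["Resource"].rstrip("*")
    return (
        statement["Effect"] == "Deny"
        and action in statement["Action"]
        and object_arn.startswith(resource_prefix)
    )
```

For instance, with `stmt = bronze_immutability_statement("example-data-lake")`, a `s3:DeleteObject` request on `arn:aws:s3:::example-data-lake/bronze/2024/file.json` is covered by the deny, while `s3:GetObject` on the same key is not, which matches the "read but never deleted" intent in the table.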
4. Compliance & Governance
This architecture supports regulatory compliance (GDPR, SOC 2) by enforcing:
- No Human Access to Production Data: Developers interact with the pipeline, not the database.
- Four-Eyes Principle: Infrastructure changes require a Pull Request review + a Deployment Approval check.
- Full Traceability: Every change in AWS can be traced back to a specific Commit SHA and PR in GitHub.
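Traceability of this kind is often achieved by stamping deployed resources with the commit that produced them. The sketch below assumes that approach; the tag key names are hypothetical, while `GITHUB_REPOSITORY`, `GITHUB_SHA`, `GITHUB_REF`, and `GITHUB_RUN_ID` are default environment variables that GitHub Actions sets for every job.

```python
import os
from typing import Mapping


def deployment_tags(env: Mapping[str, str] = os.environ) -> dict:
    """Collect Git metadata from the CI environment as resource tags."""
    return {
        # Tag keys are illustrative; any consistent naming scheme would work.
        "git-repository": env.get("GITHUB_REPOSITORY", "unknown"),
        "git-commit-sha": env.get("GITHUB_SHA", "unknown"),
        "git-ref": env.get("GITHUB_REF", "unknown"),
        "github-run-id": env.get("GITHUB_RUN_ID", "unknown"),
    }
```

The resulting dictionary could be passed to the infrastructure tooling as default tags, so that a resource seen in AWS points back to the commit and workflow run that deployed it.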
5. Compliance & Controls
This platform is aligned to EU fintech regulatory expectations, addressing:
- GDPR – Privacy & security of processing
- DORA – Operational resilience (applicable since 17 January 2025)
- SOC 1 / ISAE 3402 – Financial reporting control assurance
- ISO/IEC 27001 – Security management system backbone
- SOC 2 Type II – Customer trust for data platforms
- EBA outsourcing guidelines – AWS/vendor governance, critical/important classification, exit plans
- BCBS 239 – Data lineage, reconciliation, accuracy/completeness, reporting under stress
A full control matrix and third-party dependency register, mapping these frameworks to architecture evidence, is available in the Compliance & Controls Framework (see below).
See also
- Security Architecture - IAM policies and trust definitions
- CI/CD Workflow - Automated deployment pipeline
- Architecture Boundaries - Design assumptions and edge case handling
- Compliance & Controls Framework - Structured control matrix and third-party register