Task 4: CI/CD Workflow & Infrastructure
⚠️ Scope Note: This design demonstrates DevOps best practices but would require refinement based on actual Ohpen operations, tooling, and team maturity.
Task Deliverables
This document addresses the two required deliverables for Task 4:
✅ Deliverable 1: CI/CD Workflow Description
Location: Section 1: Pipeline Design (GitHub Actions) below
Content:
- Complete workflow description with stages (Validation, Artifact Build, Deployment)
- Workflow diagram (Mermaid flowchart)
- Backfill safety checks
- Failure handling scenarios
- Promotion workflow
✅ Deliverable 2: List of Necessary Artifacts
Location: Section 3: Deployment Artifacts below
Content: Complete list of all files required to deploy the solution (see table in Section 3).
1. Pipeline Design (GitHub Actions)
"History-Safe" CI/CD process supporting backfills and reprocessing with versioning and safe rollouts.
Workflow Stages
- Validation (CI): PR-triggered linting (`ruff`) and unit tests (`pytest`)
- Artifact Build: Package ETL code, tag with Git SHA (e.g., `etl-v1.0.0-a1b2c3d.zip`)
- Deployment (CD): Upload to S3, Terraform plan/apply, update Glue Job (build and deploy steps sketched below)
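A minimal sketch of the Artifact Build and Deployment steps, assuming hypothetical names (an `etl-code-artifacts` bucket, a `customer-etl` Glue job) and a `boto3`-based repoint of the job; in the actual pipeline these run as GitHub Actions steps alongside `terraform plan`/`apply`:

```python
# Illustrative only: names, source layout, and the boto3 deploy step are
# assumptions, not the actual workflow. Real deploys also re-apply Terraform.
import subprocess
import zipfile
from pathlib import Path

import boto3

ARTIFACT_BUCKET = "etl-code-artifacts"   # hypothetical artifact bucket
GLUE_JOB_NAME = "customer-etl"           # hypothetical Glue job name


def build_artifact(version: str = "v1.0.0") -> Path:
    """Package the ETL code and tag the archive with the short Git SHA."""
    sha = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True
    ).strip()
    artifact = Path(f"etl-{version}-{sha}.zip")
    with zipfile.ZipFile(artifact, "w", zipfile.ZIP_DEFLATED) as zf:
        for src in Path("etl").rglob("*.py"):        # assumed source layout
            zf.write(src, src.relative_to("etl"))
    return artifact


def deploy(artifact: Path) -> None:
    """Upload the versioned artifact and point the Glue job at it."""
    key = f"releases/{artifact.name}"
    boto3.client("s3").upload_file(str(artifact), ARTIFACT_BUCKET, key)

    glue = boto3.client("glue")
    job = glue.get_job(JobName=GLUE_JOB_NAME)["Job"]
    args = {**job.get("DefaultArguments", {}),
            "--extra-py-files": f"s3://{ARTIFACT_BUCKET}/{key}"}
    # A real deploy would carry over every existing job setting; only the
    # fields needed for this example are shown.
    glue.update_job(
        JobName=GLUE_JOB_NAME,
        JobUpdate={"Role": job["Role"], "Command": job["Command"],
                   "DefaultArguments": args},
    )


if __name__ == "__main__":
    deploy(build_artifact())
```

Because the artifact name embeds the Git SHA, any historical run can be reproduced by pointing the job back at an earlier archive.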
Key Safety Features
- Determinism: Same input → same output
- Partitioning: Correct `year=YYYY/month=MM` mapping
- Quarantine: Invalid rows preserved (never dropped)
- Failure Handling: Failed runs never update `_LATEST.json` or the `current/` prefix (illustrated in the sketch below)
- Human Approval: Required before promoting Silver layer data to production
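The failure-handling rule reduces to a write ordering: all output for a run lands under an isolated, run-scoped prefix first, and the `_LATEST.json` pointer is advanced only after every write has succeeded, so a failed run leaves the previous version live. A minimal sketch, assuming a hypothetical `etl-processed` bucket and pointer layout:

```python
# Sketch of the "failed runs never update _LATEST.json / current/" guarantee.
# Bucket, prefixes, and payload format are assumptions; only the ordering
# matters: the pointer is written last.
import json
from datetime import datetime, timezone

import boto3

PROCESSED_BUCKET = "etl-processed"  # hypothetical Silver-layer bucket


def publish_run(run_id: str, partitions: dict[str, bytes]) -> None:
    """Write run output to an isolated prefix, then advance the pointer."""
    s3 = boto3.client("s3")
    run_prefix = f"runs/{run_id}"

    # 1. Write every partition (e.g. year=2024/month=05) under the run prefix.
    #    An exception here aborts the run with the old pointer untouched.
    for partition, payload in partitions.items():
        s3.put_object(
            Bucket=PROCESSED_BUCKET,
            Key=f"{run_prefix}/{partition}/data.parquet",
            Body=payload,
        )

    # 2. Only after all writes succeeded, repoint _LATEST.json at the new run.
    #    Readers resolve "current" data through this pointer.
    pointer = {
        "run_id": run_id,
        "prefix": run_prefix,
        "published_at": datetime.now(timezone.utc).isoformat(),
    }
    s3.put_object(
        Bucket=PROCESSED_BUCKET,
        Key="_LATEST.json",
        Body=json.dumps(pointer).encode(),
    )
```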
See CI/CD Complete Reference for detailed failure handling.
2. Infrastructure as Code (Terraform)
Key Resources
- S3 Buckets: `raw`, `processed`, `quarantine`, `code-artifacts` (versioning enabled, public access blocked)
- IAM Roles: Least-privilege, prefix-scoped permissions (Bronze/Silver/Gold/Quarantine)
- AWS Glue Job: Python Shell/Spark job with S3 script path
- Step Functions: Orchestrates ETL runs with automatic retry (≤3 attempts, exponential backoff; retry policy sketched below)
- EventBridge: Schedules daily ETL runs (default: 2 AM UTC, configurable cron)
- CloudWatch: Alarms for job failures and quarantine spikes
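As noted in the Step Functions item above, the retry behaviour is carried by the Amazon States Language definition that Terraform hands to the state machine. A minimal sketch of that definition, built as a Python dict purely for illustration (the job-name placeholder and timing values are assumptions):

```python
# Illustrative ASL definition for the ETL state machine; in this design the
# JSON would be rendered by Terraform (e.g. via templatefile()) rather than Python.
import json

etl_state_machine_definition = {
    "Comment": "Run the Glue ETL job with bounded, exponential-backoff retries",
    "StartAt": "RunEtlJob",
    "States": {
        "RunEtlJob": {
            "Type": "Task",
            # .sync waits for the Glue job run to finish before the state completes
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "${glue_job_name}"},  # placeholder filled by Terraform
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed", "States.Timeout"],
                    "IntervalSeconds": 60,  # first retry one minute after a failure
                    "BackoffRate": 2.0,     # then 2 and 4 minutes: exponential backoff
                    "MaxAttempts": 3,       # at most 3 retries before the execution fails
                }
            ],
            "End": True,
        }
    },
}

print(json.dumps(etl_state_machine_definition, indent=2))
```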
See CI/CD Complete Reference for detailed orchestration and permissions.
3. Deployment Artifacts
| Artifact | Description |
|---|---|
| ETL Implementation | Main ETL script and supporting modules run by the Glue job |
| Python Dependencies | Pinned Python packages the ETL needs at runtime |
| Terraform Infrastructure | Terraform configuration for the S3 buckets, IAM roles, Glue job, Step Functions, EventBridge schedule, and CloudWatch alarms |
| CI/CD Workflow | GitHub Actions workflow implementing the validation, build, and deployment stages |
| Configuration Template | Template for the job's runtime configuration |
4. Operational Monitoring
Key Metrics
- Volume: `input_rows`, `valid_rows_count`, `quarantined_rows_count`, `condemned_rows_count`
- Quality: `quarantine_rate`, `validation_failure_rate`, `error_type_distribution`
- Performance: `duration_seconds`, `rows_processed_per_run`, `missing_partitions` (per-run emission sketched below)
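A sketch of how each run could publish these metrics to CloudWatch with `put_metric_data`; the `ETL/Pipeline` namespace, `JobName` dimension, and example values are assumptions, while the metric names mirror the list above:

```python
# Hypothetical per-run metric emission; namespace, dimension, and values are
# illustrative only.
import boto3

cloudwatch = boto3.client("cloudwatch")


def emit_run_metrics(job_name: str, metrics: dict[str, float]) -> None:
    """Publish one datapoint per metric for this run."""
    cloudwatch.put_metric_data(
        Namespace="ETL/Pipeline",
        MetricData=[
            {
                "MetricName": name,
                "Dimensions": [{"Name": "JobName", "Value": job_name}],
                "Value": float(value),
            }
            for name, value in metrics.items()
        ],
    )


emit_run_metrics("customer-etl", {
    "input_rows": 120_000,
    "valid_rows_count": 119_200,
    "quarantined_rows_count": 800,
    "quarantine_rate": 800 / 120_000,
    "duration_seconds": 412.0,
})
```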
Alert Categories
- Infrastructure (P1): Job failures, circuit breaker triggers, runtime anomalies
- Data Quality (P2): Quarantine rate spikes (>1%), validation failures, high attempt counts (quarantine-rate alarm sketched below)
- Business (P3): Volume anomalies, SLA breaches
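In this design the alarms themselves are provisioned by Terraform (Section 2); the `boto3` call below is only a sketch to make the quarantine-rate threshold concrete, and the namespace, dimension, and SNS topic ARN are placeholders:

```python
# Hypothetical Data Quality (P2) alarm: quarantine_rate above 1% on the last run.
import boto3

boto3.client("cloudwatch").put_metric_alarm(
    AlarmName="etl-quarantine-rate-high",
    Namespace="ETL/Pipeline",                 # must match the emission namespace
    MetricName="quarantine_rate",
    Dimensions=[{"Name": "JobName", "Value": "customer-etl"}],
    Statistic="Average",
    Period=86400,                             # one daily run per evaluation window
    EvaluationPeriods=1,
    Threshold=0.01,                           # 1% quarantine rate
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",          # a missing run is an infra (P1) issue, not P2
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:data-quality-alerts"],  # placeholder ARN
)
```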
See CI/CD Complete Reference for complete metrics and alert ownership.
5. Ownership & Governance
Core Ownership
- Pipeline Infrastructure: Data Platform Team (CI/CD, Step Functions, EventBridge)
- AWS Infrastructure: Data Platform Team (S3, Glue, IAM, CloudWatch)
- Validation Rules: Domain Teams (Silver) / Business (Gold)
- Data Quality: Data Quality Team (quarantine review, quality metrics)
- Schema Changes: Domain Teams (Silver) / Business (Gold) approve; Platform Team implements
See CI/CD Complete Reference for complete ownership matrices, workflows, and rules.
Related Documentation
- CI/CD Complete Reference - Testing guides and workflow details (appendices A-D)
- Test Suite Summary - Test implementation details
- ETL Pipeline - What this CI/CD deploys
- Data Lake Architecture - Infrastructure this CI/CD provisions