
Task 4: CI/CD Workflow & Infrastructure

⚠️ Scope Note: This design demonstrates DevOps best practices but would require refinement based on actual Ohpen operations, tooling, and team maturity.


Task Deliverables

This document addresses the two required deliverables for Task 4:

✅ Deliverable 1: CI/CD Workflow Description

Location: Section 1, "Pipeline Design (GitHub Actions)", below

Content:

  • Complete workflow description with stages (Validation, Artifact Build, Deployment)
  • Workflow diagram (Mermaid flowchart)
  • Backfill safety checks
  • Failure handling scenarios
  • Promotion workflow

✅ Deliverable 2: List of Necessary Artifacts

Location: Section 3, "Deployment Artifacts", below

Content: Complete list of all files required to deploy the solution (see table in Section 3).


1. Pipeline Design (GitHub Actions)

A "history-safe" CI/CD process that supports backfills and reprocessing, with versioned artifacts and safe rollouts.

Workflow Stages

  1. Validation (CI): PR-triggered linting (ruff) and unit tests (pytest)
  2. Artifact Build: Package the ETL code and tag the artifact with its version and short Git SHA (e.g., etl-v1.0.0-a1b2c3d.zip)
  3. Deployment (CD): Upload the artifact to S3, run Terraform plan/apply, and update the Glue job
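The Artifact Build stage can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the pipeline's actual build script: the function names, the `*.py` file selection, and the fixed zip timestamp are hypothetical. It reproduces the etl-v1.0.0-a1b2c3d.zip naming pattern and adds files in a fixed order with fixed timestamps, so rebuilding the same commit should yield an identical archive, which keeps a SHA-tagged artifact meaningful when it is redeployed for a backfill.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path

# Hypothetical choice: one fixed timestamp for every zip entry so builds are reproducible.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def artifact_name(version: str, git_sha: str, prefix: str = "etl") -> str:
    """Return a name like etl-v1.0.0-a1b2c3d.zip (version plus 7-char short SHA)."""
    return f"{prefix}-v{version}-{git_sha[:7]}.zip"


def build_artifact(src_dir: Path, out_dir: Path, version: str, git_sha: str) -> Path:
    """Zip all .py files under src_dir into a SHA-tagged archive in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / artifact_name(version, git_sha)
    with zipfile.ZipFile(out_path, "w") as zf:
        # Sorted order + fixed timestamps: same source tree -> same archive bytes.
        for path in sorted(src_dir.rglob("*.py")):
            info = zipfile.ZipInfo(path.relative_to(src_dir).as_posix(), FIXED_DATE_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, path.read_bytes())
    return out_path


def sha256_of(path: Path) -> str:
    """Checksum the artifact, e.g. to record it alongside the S3 upload."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        src.mkdir()
        (src / "etl.py").write_text("print('etl')\n")
        sha = "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678"  # example SHA
        first = build_artifact(src, Path(tmp) / "build1", "1.0.0", sha)
        second = build_artifact(src, Path(tmp) / "build2", "1.0.0", sha)
        print(first.name, sha256_of(first) == sha256_of(second))  # etl-v1.0.0-a1b2c3d.zip True
```

In GitHub Actions the full commit SHA is exposed in the `GITHUB_SHA` environment variable, so a build step could pass `os.environ["GITHUB_SHA"]` as `git_sha`; the `src`/`build` directory layout shown here is hypothetical.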

Key Safety Features

  • Determinism: Same input → same output
  • Partitioning: Each record lands in the correct year=YYYY/month=MM partition
  • Quarantine: Invalid rows are preserved in quarantine, never dropped
  • Failure Handling: Failed runs never update _LATEST.json or the current/ prefix
  • Human Approval: Required before promoting Silver layer data to production
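The determinism, partitioning, and failure-handling rules can be illustrated together with a small local-filesystem sketch. It is a stand-in for the real S3 layout built on assumptions: the `runs/<run_id>/` staging prefix, the `data.json` file name, and the contents of `_LATEST.json` are hypothetical. The pointer is written last, so it acts as the commit marker; on S3 a copy into `current/` is not atomic, so readers would rely on `_LATEST.json` rather than on `current/` alone.

```python
import json
import shutil
import tempfile
from datetime import date
from pathlib import Path


def partition_path(d: date) -> str:
    """Map a record date to its partition, e.g. date(2024, 3, 5) -> year=2024/month=03."""
    return f"year={d.year:04d}/month={d.month:02d}"


def publish_run(root: Path, run_id: str, rows_by_date: dict, fail: bool = False) -> bool:
    """Stage a run under runs/<run_id>/ and promote it only if every write succeeds.

    `root` stands in for the processed bucket; `fail` simulates a job error.
    """
    staging = root / "runs" / run_id
    try:
        for d, rows in sorted(rows_by_date.items()):
            part = staging / partition_path(d)
            part.mkdir(parents=True, exist_ok=True)
            # Canonical ordering: the same input rows always produce the same file bytes.
            ordered = sorted(rows, key=lambda r: json.dumps(r, sort_keys=True))
            (part / "data.json").write_text(json.dumps(ordered, sort_keys=True))
        if fail:
            raise RuntimeError("simulated job failure")
    except Exception:
        # Failed run: staged output is kept for debugging, pointers are untouched.
        return False

    # Success path only: refresh current/, then write the pointer last.
    current = root / "current"
    if current.exists():
        shutil.rmtree(current)
    shutil.copytree(staging, current)
    (root / "_LATEST.json").write_text(json.dumps({"run_id": run_id}))
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        rows = {date(2024, 3, 5): [{"id": 2}, {"id": 1}]}
        print(publish_run(root, "run-001", rows))             # True
        print(publish_run(root, "run-002", rows, fail=True))  # False
        print((root / "_LATEST.json").read_text())            # {"run_id": "run-001"}
```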

See CI/CD Complete Reference for detailed failure handling.


2. Infrastructure as Code (Terraform)

Key Resources

  • S3 Buckets: raw, processed, quarantine, code-artifacts (versioning enabled, public access blocked)
  • IAM Roles: Least-privilege, prefix-scoped permissions (Bronze/Silver/Gold/Quarantine)
  • AWS Glue Job: Python Shell or Spark job whose script is loaded from an S3 path
  • Step Functions: Orchestrates ETL runs with automatic retry (≤3 attempts, exponential backoff)
  • EventBridge: Schedules daily ETL runs (default: 2 AM UTC, configurable cron)
  • CloudWatch: Alarms for job failures and quarantine spikes
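The retry and scheduling settings map onto Amazon States Language and EventBridge syntax roughly as sketched below. The values are assumptions: the state name `RunGlueJob`, the job name, the 30-second initial interval, and the backoff rate of 2.0 are hypothetical. In Amazon States Language, `MaxAttempts` counts retries after the first try, so if "≤3 attempts" means three runs in total, the value would be 2 rather than 3.

```python
import json


def retry_policy(max_attempts: int = 3, interval_seconds: int = 30, backoff_rate: float = 2.0) -> dict:
    """A Retry entry for a Task state (Amazon States Language field names)."""
    return {
        "ErrorEquals": ["States.ALL"],
        "IntervalSeconds": interval_seconds,
        "MaxAttempts": max_attempts,
        "BackoffRate": backoff_rate,
    }


def retry_delays(policy: dict) -> list:
    """Wait before retry n (0-based): IntervalSeconds * BackoffRate ** n."""
    return [
        policy["IntervalSeconds"] * policy["BackoffRate"] ** n
        for n in range(policy["MaxAttempts"])
    ]


def state_machine(glue_job_name: str) -> dict:
    """Single-state machine that runs a Glue job and waits for it (.sync integration)."""
    return {
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Retry": [retry_policy()],
                "End": True,
            }
        },
    }


def daily_schedule(hour_utc: int = 2, minute: int = 0) -> str:
    """EventBridge cron: minutes hours day-of-month month day-of-week year (UTC)."""
    return f"cron({minute} {hour_utc} * * ? *)"


if __name__ == "__main__":
    print(retry_delays(retry_policy()))  # [30.0, 60.0, 120.0]
    print(daily_schedule())              # cron(0 2 * * ? *)
    print(json.dumps(state_machine("etl-job"), indent=2))
```

In a Terraform setup these dictionaries would correspond to the state machine definition and the EventBridge rule's schedule expression; generating them in Python here only makes the backoff arithmetic easy to check.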

See CI/CD Complete Reference for detailed orchestration and permissions.


3. Deployment Artifacts

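| Artifact | Description |
|---|---|
| ETL Implementation | Main ETL logic |
| Python Dependencies | Python packages required by the ETL code |
| Terraform Infrastructure | Infrastructure definition (S3, IAM, Glue, Step Functions, EventBridge, CloudWatch) |
| CI/CD Workflow | GitHub Actions pipeline definition |
| Configuration Template | Runtime configuration template |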

4. Operational Monitoring

Key Metrics

  • Volume: input_rows, valid_rows_count, quarantined_rows_count, condemned_rows_count
  • Quality: quarantine_rate, validation_failure_rate, error_type_distribution
  • Performance: duration_seconds, rows_processed_per_run, missing_partitions
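The derived metrics can be computed from each run's raw counts. This is a minimal sketch resting on two assumptions: that valid, quarantined, and condemned rows partition the input (so the three counts sum to `input_rows`), and that `missing_partitions` means expected partition paths with no output. The `ETL/Pipeline` namespace is hypothetical. The `metric_data` list follows the `MetricData` shape accepted by boto3's CloudWatch `put_metric_data`; the call itself is left out so the snippet runs without AWS credentials.

```python
from dataclasses import dataclass

NAMESPACE = "ETL/Pipeline"  # hypothetical CloudWatch namespace


@dataclass
class RunMetrics:
    input_rows: int
    valid_rows_count: int
    quarantined_rows_count: int
    condemned_rows_count: int
    duration_seconds: float

    @property
    def quarantine_rate(self) -> float:
        """Share of input rows sent to quarantine (0.0 for an empty run)."""
        return self.quarantined_rows_count / self.input_rows if self.input_rows else 0.0

    def reconciles(self) -> bool:
        """Assumed invariant: every input row is valid, quarantined, or condemned."""
        total = self.valid_rows_count + self.quarantined_rows_count + self.condemned_rows_count
        return total == self.input_rows


def missing_partitions(expected: list, present: list) -> list:
    """Expected partition paths (e.g. year=2024/month=02) with no output."""
    return sorted(set(expected) - set(present))


def metric_data(m: RunMetrics) -> list:
    """Build a MetricData list in the shape boto3's put_metric_data expects."""
    return [
        {"MetricName": "input_rows", "Value": m.input_rows, "Unit": "Count"},
        {"MetricName": "valid_rows_count", "Value": m.valid_rows_count, "Unit": "Count"},
        {"MetricName": "quarantined_rows_count", "Value": m.quarantined_rows_count, "Unit": "Count"},
        {"MetricName": "condemned_rows_count", "Value": m.condemned_rows_count, "Unit": "Count"},
        {"MetricName": "quarantine_rate", "Value": m.quarantine_rate, "Unit": "None"},
        {"MetricName": "duration_seconds", "Value": m.duration_seconds, "Unit": "Seconds"},
    ]


if __name__ == "__main__":
    run = RunMetrics(1000, 985, 10, 5, 120.0)
    print(run.reconciles(), run.quarantine_rate)  # True 0.01
```

With credentials available, publishing would be `boto3.client("cloudwatch").put_metric_data(Namespace=NAMESPACE, MetricData=metric_data(run))`. A run that fails `reconciles()` is a useful signal in its own right, since it means rows were lost somewhere.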

Alert Categories

  • Infrastructure (P1): Job failures, circuit breaker triggers, runtime anomalies
  • Data Quality (P2): Quarantine rate spikes (>1%), validation failures, high attempt counts
  • Business (P3): Volume anomalies, SLA breaches
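The routing of signals to the three alert tiers can be expressed as a small lookup. The signal names below are hypothetical labels for the conditions listed above; only the >1% quarantine-rate threshold comes from the text. Unknown signals raise an error instead of being silently ignored, so a newly added alarm cannot go unrouted.

```python
# Hypothetical signal names for the conditions in each alert category.
P1_SIGNALS = {"job_failed", "circuit_breaker_triggered", "runtime_anomaly"}
P2_SIGNALS = {"validation_failure", "high_attempt_count"}
P3_SIGNALS = {"volume_anomaly", "sla_breach"}
QUARANTINE_RATE_THRESHOLD = 0.01  # ">1%" quarantine-rate spike -> P2


def classify(signals: set, quarantine_rate: float = 0.0) -> list:
    """Return (priority, signal) pairs, most severe first."""
    alerts = []
    for signal in signals:
        if signal in P1_SIGNALS:
            alerts.append(("P1", signal))
        elif signal in P2_SIGNALS:
            alerts.append(("P2", signal))
        elif signal in P3_SIGNALS:
            alerts.append(("P3", signal))
        else:
            raise ValueError(f"unrouted signal: {signal}")
    if quarantine_rate > QUARANTINE_RATE_THRESHOLD:
        alerts.append(("P2", "quarantine_rate_spike"))
    # "P1" < "P2" < "P3" as strings, so sorting puts the most severe first.
    return sorted(alerts)


if __name__ == "__main__":
    print(classify({"sla_breach", "job_failed"}, quarantine_rate=0.02))
    # [('P1', 'job_failed'), ('P2', 'quarantine_rate_spike'), ('P3', 'sla_breach')]
```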

See CI/CD Complete Reference for complete metrics and alert ownership.


5. Ownership & Governance

Core Ownership

  • Pipeline Infrastructure: Data Platform Team (CI/CD, Step Functions, EventBridge)
  • AWS Infrastructure: Data Platform Team (S3, Glue, IAM, CloudWatch)
  • Validation Rules: Domain Teams (Silver) / Business (Gold)
  • Data Quality: Data Quality Team (quarantine review, quality metrics)
  • Schema Changes: Domain Teams (Silver) / Business (Gold) approve; Platform Team implements

See CI/CD Complete Reference for complete ownership matrices, workflows, and rules.


© 2026 Stephen Adei · CC BY 4.0