
Lambda Promotion Implementation

Overview

This change adds Lambda-based automation that promotes Silver layer data after an ETL run completes successfully. It narrows the gap between the documented architecture (Gold layer promotion patterns) and what is actually implemented.

What Was Implemented

1. Lambda Function (promote_silver/lambda_function.py)

  • Purpose: Automatically promotes validated Silver data by:

    • Finding the most recent completed ETL run (via _SUCCESS marker)
    • Copying Parquet files to current/ prefix for stable Athena queries
    • Updating _LATEST.json with promotion metadata
  • Key Features:

    • When run_key is passed from Step Functions (from $$.Execution.Name), promotes that specific run; otherwise auto-discovers the most recent completed run
    • Handles schema versioning (schema_v=v1, etc.)
    • Preserves partition structure (year=YYYY/month=MM)
    • Error handling with detailed logging
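The discovery and copy-mapping steps described above can be sketched as plain functions over an S3 listing. This is a minimal illustration, not the deployed handler. The key layout (`<prefix>/run_id=<run_key>/year=YYYY/month=MM/...`), the `run_id=` segment, the position of `current/` relative to the schema version, and both function names are assumptions; only the `_SUCCESS` marker, the `current/` prefix, and the partition structure come from the description.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple

SUCCESS_MARKER = "_SUCCESS"


def find_latest_completed_run(
    objects: Iterable[Tuple[str, datetime]], base_prefix: str
) -> Optional[str]:
    """Return the run prefix whose _SUCCESS marker was written most recently.

    `objects` holds (key, last_modified) pairs, e.g. collected from an S3 listing.
    Assumed (hypothetical) layout: <base_prefix>/run_id=<run_key>/_SUCCESS
    """
    base = base_prefix.rstrip("/") + "/"
    completed = []
    for key, modified in objects:
        if key.startswith(base) and key.endswith("/" + SUCCESS_MARKER):
            run_prefix = key[: -len(SUCCESS_MARKER)]  # keeps the trailing slash
            completed.append((modified, run_prefix))
    if not completed:
        return None  # no run has finished yet
    return max(completed)[1]


def current_key(source_key: str, run_prefix: str, current_prefix: str) -> str:
    """Map a run's object key into current/, keeping year=/month= partitions."""
    if not source_key.startswith(run_prefix):
        raise ValueError(f"{source_key!r} is not under {run_prefix!r}")
    relative = source_key[len(run_prefix):]
    return current_prefix.rstrip("/") + "/" + relative


# Example listing: runs "a" and "b" completed, run "c" has no marker yet.
listing = [
    ("silver/t/schema_v=v1/run_id=a/_SUCCESS", datetime(2026, 1, 1, tzinfo=timezone.utc)),
    ("silver/t/schema_v=v1/run_id=b/_SUCCESS", datetime(2026, 1, 2, tzinfo=timezone.utc)),
    ("silver/t/schema_v=v1/run_id=c/year=2026/month=01/part-0.parquet",
     datetime(2026, 1, 3, tzinfo=timezone.utc)),
]
latest = find_latest_completed_run(listing, "silver/t/schema_v=v1")
```

Keeping these steps free of S3 calls makes them easy to unit test; in a real handler, each mapped key would presumably feed an S3 copy call.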

2. Terraform Infrastructure

  • Lambda Function: aws_lambda_function.promote_silver

    • Runtime: Python 3.11
    • Timeout: 5 minutes
    • Memory: 512 MB
    • Auto-packaged from lambda/promote_silver/ directory
  • IAM Roles & Policies:

    • lambda_promote_silver_role: Lambda execution role with S3 access
    • step_functions_lambda: Policy allowing Step Functions to invoke Lambda
  • CloudWatch Logs: Automatic log group creation

3. Step Functions Integration

State machine flow:

CaptureRunContext → PrepareGlueInput (Choice: S3 trigger?) → WithS3Input / NoS3Input → RunETL (Glue, --run-key, --execution-arn, --input-bucket, --input-key) → ValidateOutput → PromoteSilver (Lambda) → Success

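First state: CaptureRunContext captures the execution ARN ($$.Execution.Id) and the run key ($$.Execution.Name) from the Step Functions context object. Because both values come from the execution itself, no separate run table is maintained.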

PrepareGlueInput: When the execution is triggered by S3 (EventBridge), bucket and key are passed to Glue; otherwise a default bronze prefix is used.
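The branching that PrepareGlueInput performs can be illustrated in Python. The state machine itself expresses this as a Choice state, so the function below is only a sketch of the logic. The function name, the `DEFAULT_BRONZE_BUCKET` and `DEFAULT_BRONZE_PREFIX` values, and the idea that the default prefix is passed through `--input-key` are assumptions; the four Glue argument names come from the flow above.

```python
from typing import Dict, Optional

# Hypothetical defaults: the real bucket and prefix are not stated in this document.
DEFAULT_BRONZE_BUCKET = "example-bronze-bucket"
DEFAULT_BRONZE_PREFIX = "bronze/"


def build_glue_arguments(
    run_key: str,
    execution_arn: str,
    s3_bucket: Optional[str] = None,
    s3_key: Optional[str] = None,
) -> Dict[str, str]:
    """Build Glue job arguments for either branch of PrepareGlueInput."""
    args = {"--run-key": run_key, "--execution-arn": execution_arn}
    if s3_bucket and s3_key:
        # WithS3Input: the execution was triggered by an S3 event via EventBridge.
        args["--input-bucket"] = s3_bucket
        args["--input-key"] = s3_key
    else:
        # NoS3Input: fall back to the default bronze location.
        args["--input-bucket"] = DEFAULT_BRONZE_BUCKET
        args["--input-key"] = DEFAULT_BRONZE_PREFIX
    return args


# The ARN below is a made-up placeholder.
triggered = build_glue_arguments(
    "run-001", "arn:aws:states:example", s3_bucket="raw-bucket", s3_key="bronze/file.csv"
)
manual = build_glue_arguments("run-002", "arn:aws:states:example")
```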

PromoteSilver receives from ValidateOutput: bucket, prefix, schema version, run_key, glue_job_run_id, and execution_arn. The Lambda logs these for traceability and stores glue_job_run_id and execution_arn in _LATEST.json when present.
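As a rough sketch of the "when present" behaviour, the function below builds a `_LATEST.json` body and adds the two traceability fields only if the event carries them. The manifest field names (`promoted_at`, `promoted_files`), the function name, and the event key names for the optional fields are assumptions; the actual file written by the Lambda may look different.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional

OPTIONAL_TRACE_FIELDS = ("glue_job_run_id", "execution_arn")


def build_latest_manifest(
    event: Dict[str, Any],
    run_key: str,
    promoted_files: int,
    promoted_at: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build the (hypothetical) _LATEST.json payload for a promoted run."""
    promoted_at = promoted_at or datetime.now(timezone.utc)
    manifest: Dict[str, Any] = {
        "run_key": run_key,  # passed in, or resolved by auto-discovery
        "schema_version": event.get("schema_version"),
        "promoted_at": promoted_at.isoformat(),
        "promoted_files": promoted_files,
    }
    # Copy the traceability fields only when Step Functions supplied them.
    for field in OPTIONAL_TRACE_FIELDS:
        if event.get(field):
            manifest[field] = event[field]
    return manifest


event = {"schema_version": "v1", "run_key": "run-001", "glue_job_run_id": "jr_123"}
manifest = build_latest_manifest(
    event, run_key="run-001", promoted_files=4,
    promoted_at=datetime(2026, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
body = json.dumps(manifest, indent=2)  # what would be uploaded as _LATEST.json
```

A manual invocation that omits both fields produces a manifest without them, rather than one holding null placeholders.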

Architecture Flow

EventBridge → Step Functions → Glue Job (ETL) → Lambda (Promotion) → Update _LATEST.json + current/

Files Created

  1. infra/lambda/promote_silver/lambda_function.py - Lambda handler
  2. infra/lambda/promote_silver/requirements.txt - Dependencies (boto3)
  3. infra/lambda/promote_silver/README.md - Documentation
  4. infra/terraform/main.tf - Updated with Lambda resources

Deployment

Lambda is automatically deployed via Terraform:

  • Code packaged as ZIP during terraform apply
  • Function created/updated automatically
  • No manual deployment steps needed

Testing

Local Testing

cd tasks/devops_cicd/infra/lambda/promote_silver
python -m pytest # (if tests are added)

AWS Testing

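# AWS CLI v2 needs --cli-binary-format raw-in-base64-out to accept a raw JSON payload; omit the flag on CLI v1.
aws lambda invoke \
--function-name ohpen-promote-silver \
--cli-binary-format raw-in-base64-out \
--payload '{"silver_bucket":"ohpen-silver","silver_prefix":"silver/mortgages/transactions","schema_version":"v1"}' \
response.json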

Integration Testing

Trigger Step Functions execution and verify:

  1. Lambda is invoked after Glue job completes
  2. Files are copied to current/ prefix
  3. _LATEST.json is updated
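Checks 2 and 3 can be automated once the `current/` listing and the parsed `_LATEST.json` have been fetched. The helper below is a sketch under assumptions: the function name is hypothetical, and it assumes `_LATEST.json` stores the promoted run under a `run_key` field, which this document does not confirm.

```python
from typing import Any, Dict, Iterable, List


def check_promotion(
    current_keys: Iterable[str],
    latest_manifest: Dict[str, Any],
    expected_run_key: str,
) -> List[str]:
    """Return a list of problems; an empty list means the promotion looks fine."""
    problems: List[str] = []
    parquet_keys = [key for key in current_keys if key.endswith(".parquet")]
    if not parquet_keys:
        problems.append("no Parquet files found under current/")
    # Assumed field name: the manifest is expected to record the promoted run.
    actual = latest_manifest.get("run_key")
    if actual != expected_run_key:
        problems.append(
            f"_LATEST.json run_key is {actual!r}, expected {expected_run_key!r}"
        )
    return problems


ok = check_promotion(
    ["silver/t/current/year=2026/month=01/part-0.parquet"],
    {"run_key": "run-001"},
    expected_run_key="run-001",
)
stale = check_promotion([], {"run_key": "run-000"}, expected_run_key="run-001")
```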

Benefits

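  • Automated Promotion: No manual steps are required after the ETL run
  • Stable Queries: The current/ prefix gives Athena a fixed path that does not change between runs
  • Audit Trail: _LATEST.json records metadata for the most recent promotion (including glue_job_run_id and execution_arn when present)
  • Error Handling: Promotion failures surface through Step Functions error handling
  • Cost-Effective: The Lambda runs only after a successful ETL (pay-per-use)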

Next Steps (Optional Enhancements)

  1. Gold Layer Promotion: Similar Lambda for Gold layer promotion
  2. Backfill Support: Lambda to handle backfill promotion workflows
  3. Validation: Add pre-promotion validation (row counts, schema checks)
  4. Notifications: SNS notifications on promotion success/failure
  5. Rollback: Lambda to rollback promotion if issues detected
© 2026 Stephen Adei. CC BY 4.0