© 2026 Stephen Adei. All rights reserved. All content on this site is the intellectual property of Stephen Adei. See License for terms of use and attribution.

CloudTrail, SNS & SQS — Considerations

CloudTrail

  • Management events: Enabled for both read and write operations, recording who changed Glue, S3, IAM, and Step Functions resources. This is sufficient for most audit and compliance needs at low cost.
  • Data events: Enabled selectively for the high-risk buckets (ohpen-gold and ohpen-quarantine) to track S3 object-level reads and writes. This gives object-level auditability for sensitive financial data while keeping cost down by limiting data events to those two buckets.
  • Log delivery: Logs go to a dedicated S3 bucket (ohpen-cloudtrail-logs), with lifecycle and retention set per policy. Log file validation is enabled so that modified or deleted log files can be detected.

CloudTrail Selective Data Events Configuration:
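
As a minimal sketch of what a selective data-event configuration could look like, the snippet below builds a CloudTrail event selector in the shape the PutEventSelectors API accepts. The bucket names come from the text above; the helper name build_event_selectors is hypothetical, and nothing here should be read as the project's actual configuration.

```python
import json

# Buckets named in the text as high-risk; everything else gets management events only.
HIGH_RISK_BUCKETS = ["ohpen-gold", "ohpen-quarantine"]


def build_event_selectors(buckets):
    """Hypothetical helper: one selector with all management events
    plus S3 object-level data events for the given buckets only."""
    return [
        {
            "ReadWriteType": "All",
            "IncludeManagementEvents": True,
            "DataResources": [
                {
                    "Type": "AWS::S3::Object",
                    # A bucket ARN with a trailing slash covers every object in that bucket.
                    "Values": [f"arn:aws:s3:::{name}/" for name in buckets],
                }
            ],
        }
    ]


selectors = build_event_selectors(HIGH_RISK_BUCKETS)
print(json.dumps(selectors, indent=2))
```

With boto3, a list like this could be passed as the EventSelectors argument of cloudtrail.put_event_selectors; in Terraform the same fields map onto an event_selector block of aws_cloudtrail.main.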

Who gets alerts (analysts vs platform)

  • Analysts consume data (Silver/Gold) via Athena and dashboards. They do not need run-level alerts (no email/Slack on every success or failure).
  • The Platform and Data Quality teams need alerts:
    • ETL failures → SNS topic ohpen-etl-failures → Platform team. Add an email or Slack subscription in the Console or Terraform, and/or an SQS subscription for a Lambda consumer (ticketing, logging).
    • Quarantine spikes → SNS topic ohpen-quarantine-alerts → Data Quality team. Wire this up once a post-ETL check or Lambda publishes to the topic.
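
The quarantine-spike path above could be driven by a small post-ETL check. The sketch below is one possible shape: it decides whether a run counts as a spike and, if so, builds the keyword arguments for an SNS publish call. The account ID, region, 5% threshold, and function name are all hypothetical; only the topic name comes from the text.

```python
import json

# Hypothetical values: the text names the topic but not the account, region, or threshold.
QUARANTINE_TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:ohpen-quarantine-alerts"
SPIKE_THRESHOLD = 0.05  # assumed: alert when more than 5% of rows are quarantined


def quarantine_alert(run_id, total_rows, quarantined_rows, threshold=SPIKE_THRESHOLD):
    """Return keyword arguments for an SNS publish call if the run's
    quarantine rate exceeds the threshold, otherwise None."""
    if total_rows <= 0:
        return None  # nothing processed, so there is no rate to evaluate
    rate = quarantined_rows / total_rows
    if rate <= threshold:
        return None
    return {
        "TopicArn": QUARANTINE_TOPIC_ARN,
        "Subject": f"Quarantine spike in run {run_id}",
        "Message": json.dumps(
            {
                "run_id": run_id,
                "total_rows": total_rows,
                "quarantined_rows": quarantined_rows,
                "quarantine_rate": round(rate, 4),
            }
        ),
    }
```

A caller with boto3 available could forward a non-None result as sns.publish(**kwargs); runs at or below the threshold produce no message, which keeps the topic quiet for the Data Quality team.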

SNS/SQS usage

  • SNS: One topic per signal (failures, quarantine). Add email or Slack subscriptions for humans; SQS subscription for automated consumers (Lambda).
  • SQS: Used for decoupling and retries. The ohpen-etl-failures queue receives failure events from SNS; a Lambda can consume them to create tickets or post to Slack. The DLQ (ohpen-etl-failures-dlq) holds poison messages for inspection once they have been received 3 times without being deleted.
  • Success: No SNS for successful runs; analysts use Athena/dashboards. Optional: future Lambda could write run metadata to DynamoDB without alerting analysts.
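
To make the SNS → SQS → Lambda path and the 3-receive DLQ rule concrete, here is a minimal sketch. It shows the redrive policy attribute that SQS expects on the source queue and a Lambda-style handler that unwraps SNS envelopes from an SQS batch. The ARN is hypothetical, and the handler assumes raw message delivery is disabled on the subscription, so each SQS body is the SNS notification JSON.

```python
import json

# Hypothetical ARN; only the queue name comes from the text.
DLQ_ARN = "arn:aws:sqs:eu-west-1:123456789012:ohpen-etl-failures-dlq"

# SQS takes RedrivePolicy as a JSON string attribute on the source queue.
QUEUE_ATTRIBUTES = {
    "RedrivePolicy": json.dumps(
        {"deadLetterTargetArn": DLQ_ARN, "maxReceiveCount": 3}
    )
}


def handler(event, context=None):
    """Lambda-style entry point for an SQS batch whose bodies are SNS envelopes.
    Returns the decoded failure payloads; ticketing or Slack posting would go here."""
    failures = []
    for record in event.get("Records", []):
        envelope = json.loads(record["body"])  # SNS notification wrapper
        message = envelope["Message"]          # payload published to the topic
        try:
            failures.append(json.loads(message))
        except json.JSONDecodeError:
            # Plain-text publishers are kept as-is rather than rejected.
            failures.append({"raw": message})
    return failures
```

If the handler raises, the messages in the batch become visible again after the visibility timeout; a message that keeps failing is moved to the DLQ once its receive count passes maxReceiveCount.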

Terraform

  • CloudTrail: aws_cloudtrail.main, aws_s3_bucket.cloudtrail_logs (see CI/CD Workflow infrastructure section).
  • SNS: aws_sns_topic.etl_failures, aws_sns_topic.quarantine_alerts (see Tooling & Controls).
  • SQS: aws_sqs_queue.etl_failures, aws_sqs_queue.etl_failures_dlq; subscription from SNS to SQS (see System Architecture Overview for event flow).
  • EventBridge: Rule ohpen-step-functions-failed routes Step Functions executions with status FAILED to SNS; the state machine itself needs no changes (see CI/CD Workflow for orchestration details).
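
For illustration, the event pattern behind a rule like ohpen-step-functions-failed could look like the sketch below. The pattern uses the source and detail-type values that Step Functions emits for execution status changes; the matches function is a deliberately simplified, hypothetical stand-in for EventBridge matching (exact values and nested objects only) so the pattern can be checked locally.

```python
import json

# Pattern for Step Functions executions that end in FAILED.
EVENT_PATTERN = {
    "source": ["aws.states"],
    "detail-type": ["Step Functions Execution Status Change"],
    "detail": {"status": ["FAILED"]},
}


def matches(pattern, event):
    """Simplified local matcher: each pattern key must exist in the event,
    lists mean 'any of these exact values', dicts are matched recursively."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# The JSON string form is what an EventBridge rule definition takes.
print(json.dumps(EVENT_PATTERN))
```

Because the rule reacts to events that Step Functions already emits, the alerting can be added or removed without touching the state machine definition.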

© 2026 Stephen Adei · CC BY 4.0