
Generative AI & Platform Opportunities

© 2026 Stephen Adei. All rights reserved.

🤖 GenAI callout — Amazon Bedrock is already used in four places (quarantine explanations, report narrative, catalog descriptions, SQL docs). Below: where it is used, plus 50 spots across the platform where AWS GenAI could add value — from validation hints to runbook drafts. All optional; the pipeline runs fine without it.

Scope: This page applies to the case-study OLAP analytics platform; the Ohpen core banking system (OLTP) is upstream and out of scope (see Scope & Assumptions).


1. Is GenAI used in this business case?

Yes. The Ohpen case uses Amazon Bedrock (Claude) in four places:

| Use | Purpose | Where |
| --- | --- | --- |
| Quarantine explanations | Plain-language explanation + suggested fix for quarantined/condemned rows | tasks/data_ingestion_transformation/src/etl/bedrock_quarantine.py |
| Report narrative | 2–4 sentence stakeholder paragraph from run metrics | tasks/communication_documentation/scripts/bedrock_report_narrative.py |
| Catalog/quality descriptions | Human-readable table description + quality summary for Glue Catalog | tasks/data_ingestion_transformation/scripts/bedrock_quality_descriptions.py |
| SQL/pipeline docs | Markdown explanation of SQL or ETL flow | tasks/communication_documentation/scripts/bedrock_sql_docs.py |
  • Shared client: tasks/data_ingestion_transformation/src/etl/bedrock_client.py (Claude 3 Haiku via Bedrock Converse API).
  • IAM: Glue role has bedrock:InvokeModel / bedrock:InvokeModelWithResponseStream on arn:aws:bedrock:*::foundation-model/anthropic.claude-* (Terraform: tasks/devops_cicd/infra/terraform/main.tf).
  • All four uses degrade gracefully when Bedrock is unavailable (fallback text or clear error), so the pipeline does not depend on GenAI.

See BEDROCK_IMPLEMENTATIONS for details.


2. Places where AWS GenAI creates possibilities (50 ideas)

Scoped to this data lake platform: ingestion, validation, quarantine, Silver/Gold, SQL, CI/CD, documentation, operations, and compliance.

Data quality & validation

  1. Auto-suggest validation rules — From sample Bronze data + domain glossary, suggest new validation rules (e.g. currency codes, date ranges) for validator.py.
  2. Anomaly explanations — When statistical checks flag outliers, generate short explanations (e.g. “Unusual spike in amount; possible bulk transfer”) for stewards.
  3. Schema drift summaries — Compare incoming schema to Glue table; generate a one-paragraph “what changed” for release notes.
  4. Duplicate/entity resolution — Use embeddings (Bedrock Titan) or LLM to propose match rules or explain why two records might be the same entity.
  5. Data quality narrative from metrics — Extend run metrics with a short “data health” narrative (e.g. completeness, freshness) for stakeholder reports.

Quarantine & error handling

  1. Quarantine root-cause grouping — Cluster quarantine rows by error type and generate a one-line “root cause” per cluster for triage.
  2. Retry prioritization text — For quarantine batches, generate “recommended order to fix” (e.g. fix currency first, then dates).
  3. Condemned summary report — Monthly summary of condemned rows with plain-language explanation of why they were excluded.
  4. Error-code-to-SLA mapping — From error codes and SLAs, generate “expected resolution time” text for ops dashboards.
  5. Quarantine notification body — Enrich SNS/Slack alerts with a 1–2 sentence summary of the failure (already adjacent to report narrative).
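Idea 1 above (root-cause grouping) mostly comes down to clustering before prompting: one LLM call per error cluster instead of per row. A hedged sketch, with invented field names and prompt wording, of the pre-LLM grouping step:

```python
from collections import defaultdict


def cluster_quarantine_rows(rows):
    """Group quarantined rows by error code so one root-cause line can be
    generated per cluster instead of per row (cheaper, easier to triage)."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row["error_code"]].append(row)
    return dict(clusters)


def root_cause_prompt(error_code, cluster_rows, max_samples=3):
    """Build the prompt that would be sent to Bedrock for one cluster.
    (The Bedrock call itself is omitted here.)"""
    samples = cluster_rows[:max_samples]
    return (
        f"{len(cluster_rows)} rows failed with error code {error_code}. "
        f"Samples: {samples}. State the likely root cause in one line."
    )


rows = [
    {"id": 1, "error_code": "BAD_CURRENCY", "value": "EU"},
    {"id": 2, "error_code": "BAD_DATE", "value": "2024-13-01"},
    {"id": 3, "error_code": "BAD_CURRENCY", "value": "EURO"},
]
for code, members in cluster_quarantine_rows(rows).items():
    print(code, "->", root_cause_prompt(code, members))
```

Capping the samples per cluster keeps the prompt within a small token budget even when thousands of rows share one error code.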

Catalog, lineage & governance

  1. Column-level descriptions — Generate Glue column descriptions from schema + sample values (extend bedrock_quality_descriptions.py).
  2. Lineage narrative — From Glue Crawler / Lake Formation lineage, generate “how this table was built” in 2–3 sentences.
  3. PII/classification labels — Suggest PII/sensitivity labels for columns from names and sample values; human review before applying.
  4. Data contract summaries — From schema + validation rules, generate a one-page “contract” (expected format, constraints) for producers.
  5. Glue table change log — When partition keys or columns change, generate a short “what changed and why” for governance docs.

SQL & analytics

  1. Query explanation in natural language — For any Athena query (e.g. balance_history_2024_q1.sql), generate “what this query does” (extend bedrock_sql_docs.py).
  2. NL-to-SQL for Gold layer — “Show me month-end balances for Q1 2024” → generated SQL over Gold tables with guardrails (e.g. partition filters).
  3. Query optimization hints — From query + table stats, suggest “add partition filter” or “avoid SELECT *” with short explanation.
  4. Report commentary — For a given report (e.g. balance history), generate 1–2 sentences of commentary on trends or anomalies.
  5. Ad-hoc query summaries — Log ad-hoc Athena queries; periodically generate “what analysts are asking” summary for product/ops.
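For idea 2 (NL-to-SQL with guardrails), the guardrails matter more than the generation: LLM-produced SQL should be rejected unless it filters on the table's partition keys and avoids full scans. A minimal sketch; the table name, partition column, and rule set are illustrative, not taken from the case repo:

```python
import re

# Hypothetical partition columns per Gold table (illustrative only).
PARTITION_COLUMNS = {"gold.balance_history": ["snapshot_date"]}


def guardrail_violations(sql, table):
    """Return a list of guardrail violations for LLM-generated SQL.
    An empty list means the query may be submitted to Athena."""
    violations = []
    if re.search(r"select\s+\*", sql, re.IGNORECASE):
        violations.append("SELECT * is not allowed")
    for col in PARTITION_COLUMNS.get(table, []):
        if col not in sql.lower():
            violations.append(f"missing partition filter on {col}")
    return violations


good = (
    "SELECT account_id, balance FROM gold.balance_history "
    "WHERE snapshot_date BETWEEN DATE '2024-01-01' AND DATE '2024-03-31'"
)
bad = "SELECT * FROM gold.balance_history"
print(guardrail_violations(good, "gold.balance_history"))
print(guardrail_violations(bad, "gold.balance_history"))
```

A production version would parse the SQL properly rather than pattern-match, but the shape is the same: validate, then execute, never execute blindly.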

ETL & pipeline

  1. ETL step documentation — Per Glue job / Step Functions state, generate one sentence “what this step does” for runbooks.
  2. Backfill playbook text — From run_id and partition list, generate “backfill scope and steps” for ops.
  3. Failure post-mortem draft — When a run fails, from logs + state, generate a one-page post-mortem draft (timeline, cause, next steps).
  4. Idempotency / rerun explanation — Generate “what happens if this run_id is rerun” from orchestration and job design.
  5. Cost attribution narrative — From Glue/Athena cost tags, generate “this month’s spend by job/query” in plain language.

CI/CD & infrastructure

  1. Terraform change summary — On Terraform plan, generate “what will change” in 2–3 sentences for PR review.
  2. Security finding explanation — For IAM/security scanner findings, generate “why this matters and how to fix” for developers.
  3. Rollback impact summary — Before rollback, generate “which tables/jobs are affected” from state and dependencies.
  4. Pipeline dependency diagram captions — For Mermaid/architecture diagrams, generate short captions (e.g. “Glue depends on S3 and Catalog”).
  5. Release notes draft — From git diff + task list, generate a release-notes paragraph for the docs site.
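Idea 1 (Terraform change summary) works best if the plan is compacted before prompting. `terraform show -json tfplan` emits a `resource_changes` array with `address` and `change.actions` per resource; a sketch of reducing that to a prompt-sized digest (resource names here are invented):

```python
import json


def summarize_plan_changes(plan_json):
    """Count resource actions in `terraform show -json` output, giving
    the LLM a compact view of what will change instead of the raw plan."""
    plan = json.loads(plan_json)
    counts = {"create": 0, "update": 0, "delete": 0}
    changed = []
    for rc in plan.get("resource_changes", []):
        for action in rc["change"]["actions"]:
            if action in counts:
                counts[action] += 1
                changed.append(f"{action}: {rc['address']}")
    return {"counts": counts, "changed": changed}


sample = json.dumps({"resource_changes": [
    {"address": "aws_glue_job.etl", "change": {"actions": ["update"]}},
    {"address": "aws_s3_bucket.quarantine", "change": {"actions": ["create"]}},
    {"address": "aws_iam_role.old", "change": {"actions": ["delete", "create"]}},
]})
summary = summarize_plan_changes(sample)
print(summary["counts"])
```

Note that a replace shows up as `["delete", "create"]`, which is why the third resource counts toward both totals; the digest plus a one-line instruction ("summarize for a PR reviewer") is all the prompt needs.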

Documentation & communication

  1. Runbook step expansion — For each runbook step, generate “why this step is done” and “what to check” for new joiners.
  2. Stakeholder email draft — From run metrics + narrative, generate a short email body (extend report narrative).
  3. Glossary definitions — From term list + codebase, suggest or refine definitions for docs/GLOSSARY.md.
  4. ADR summaries — One-paragraph summary of each ADR for an “ADR index” page.
  5. API/script usage text — From argparse and docstrings, generate “how to use this script” for README or docs.

Operations & observability

  1. Alarm runbook suggestion — When CloudWatch alarm fires, generate “likely causes and first checks” from alarm name and metrics.
  2. Log summarization — For a failed Glue run, summarize last N log lines into 2–3 sentences for tickets.
  3. SLA breach explanation — When a run misses SLA, generate “why the SLA was missed” from latency breakdown and errors.
  4. Capacity planning narrative — From DPU/scan trends, generate “recommendation” (e.g. “consider more workers for backfill”).
  5. Incident timeline narrative — From timestamps and events, generate a short incident timeline for post-mortems.
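Idea 2 (log summarization) needs a filtering step first, since a failed Glue run can emit far more log than a prompt should carry. A sketch, with invented prompt wording, that keeps only the tail and prefers error lines:

```python
def log_tail_prompt(log_text, n=50):
    """Keep only the tail of a failed run's log (error lines first, if
    any) so the summarization prompt stays within a small token budget."""
    tail = log_text.splitlines()[-n:]
    errors = [line for line in tail if "ERROR" in line]
    context = errors if errors else tail
    return (
        "Summarize this Glue job failure in 2-3 sentences for a ticket:\n"
        + "\n".join(context)
    )


log = "\n".join(
    [f"INFO step {i}" for i in range(100)]
    + ["ERROR: py4j.protocol.Py4JJavaError: job aborted"]
)
print(log_tail_prompt(log, n=20))
```

Fetching the lines themselves would go through CloudWatch Logs (e.g. boto3 `get_log_events`); that part is omitted here.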

Compliance & audit

  1. Access review summary — From IAM/Lake Formation changes, generate “who got access to what” summary for compliance.
  2. Data retention explanation — Per bucket/layer, generate “why this retention period” aligned to policy (e.g. 7 years for Gold).
  3. Audit finding response draft — From finding text, generate a first-draft response (acknowledgment, root cause, remediation).
  4. Cross-border transfer summary — From bucket region and table list, generate “data stored where” for DPA/DPIA.
  5. Quarantine retention justification — Short “why quarantine/condemned are retained” for auditors (link to policy).

Discovery & self-service

  1. “What can be queried?” answer — From Glue Catalog, generate a short list of “available tables and what they contain” for new analysts.
  2. Sample query suggestions — Per Gold table, suggest 1–2 example queries with partition filters.
  3. Onboarding checklist narrative — From docs and scripts, generate “day-one checklist” for new data engineers.
  4. Cost driver FAQ — From cost tags and usage, generate “why did this job cost X?” Q&A for FinOps.
  5. Platform health one-liner — From key metrics (run success, quarantine rate, latency), generate a single-sentence “platform health” for dashboards.

Summary

  • GenAI in the case: Yes — Bedrock (Claude) is used for quarantine explanations, report narrative, catalog/quality descriptions, and SQL/pipeline docs; all with fallbacks.
  • Opportunities: The list above gives 50 concrete places where AWS GenAI (Bedrock, and where relevant SageMaker/other AWS ML) can add value across data quality, quarantine, catalog, SQL, ETL, CI/CD, docs, operations, and compliance, without changing the core architecture.