Case study guide — where to find each deliverable
This page is for reviewers and readers who have the Ohpen case study (Financial Data Pipeline and Data Lake Optimization). It shows where this documentation addresses each required deliverable.
The case has five tasks plus an appendix. Below, each task is listed with pointers to the relevant docs.
1. Data Ingestion and Transformation
Case asks for: Python script (read from S3, validate, write Parquet partitioned by year/month), plus edge cases and assumptions.
| Deliverable | Where to find it |
|---|---|
| ETL flow and implementation | Data ingestion & ETL — ETL flow |
| Edge cases and assumptions | Data ingestion — Assumptions and edge cases |
| Reference ETL code and diagrams | Reference — ETL diagrams, ETL pseudocode |
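
For orientation, the validate-then-partition step the case asks for can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline's actual code: the field names in `REQUIRED_FIELDS`, the helper names `validate` and `partition_rows`, and the `year=YYYY/month=MM` key layout are all assumptions made for this sketch. It also leaves out the S3 read and the Parquet write, which the linked ETL docs cover.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical column names; the real schema is defined in the case's Appendix A.
REQUIRED_FIELDS = ("transaction_id", "account_id", "amount", "timestamp")


def validate(row):
    """Return an error message for a bad row, or None if the row is usable."""
    for field in REQUIRED_FIELDS:
        if row.get(field) in (None, ""):
            return f"missing {field}"
    try:
        float(row["amount"])
    except (TypeError, ValueError):
        return "amount is not numeric"
    try:
        datetime.fromisoformat(row["timestamp"])
    except (TypeError, ValueError):
        return "timestamp is not ISO 8601"
    return None


def partition_rows(rows):
    """Group valid rows by a year/month partition key; collect rejected rows."""
    partitions = defaultdict(list)
    rejected = []
    for row in rows:
        error = validate(row)
        if error is not None:
            # Keep the reason next to the row so it can be reported or quarantined.
            rejected.append((row, error))
            continue
        ts = datetime.fromisoformat(row["timestamp"])
        # Assumed Hive-style key; each key would map to one Parquet output prefix.
        partitions[f"year={ts.year}/month={ts.month:02d}"].append(row)
    return dict(partitions), rejected
```

Keeping rejected rows together with their reason, rather than dropping them, is one way to produce the error counts that the stakeholder email in task 5 reports.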
2. Data Lake Architecture Design
Case asks for: Folder structure for raw, processed, and aggregated data; strategy for schema evolution.
| Deliverable | Where to find it |
|---|---|
| Folder structure and architecture | Data lake architecture — Architecture |
| Schema evolution and assumptions | Data lake architecture — Assumptions and edge cases |
| Reference architecture and governance | Reference — Data lake architecture, Governance diagrams |
| ADRs (format, partitioning, etc.) | ADR 001 (Parquet format); the remaining ADRs are in the sidebar under Architecture decisions (ADR) |
3. SQL
Case asks for: SQL query for account balance history at end of each month, per account, for the first three months of 2024 (example output given in the case). Appendix A defines the transactions table.
| Deliverable | Where to find it |
|---|---|
| SQL breakdown and implementation | Tasks — SQL — SQL breakdown, SQL implementation code |
| Assumptions and testing | Tasks — SQL — Assumptions and edge cases, Isolated testing |
| Reference SQL (complete and examples) | Reference — SQL, SQL examples |
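
To make the shape of the expected result concrete, here is a small, self-contained sketch of a month-end balance query, run against an in-memory SQLite database from Python. It is not the reference SQL from this documentation. The table layout (`account_id`, `amount`, `tx_date`), the function name `month_end_balances`, and the rule that a balance is the running sum of signed amounts are assumptions made for illustration; Appendix A of the case defines the real table.

```python
import sqlite3

# Assumed rule: balance = sum of all signed amounts up to and including month end.
# tx_date is assumed to be a date-only ISO string (YYYY-MM-DD), so string
# comparison orders dates correctly.
QUERY = """
WITH months(month_end) AS (
    VALUES ('2024-01-31'), ('2024-02-29'), ('2024-03-31')
),
accounts AS (
    SELECT DISTINCT account_id FROM transactions
)
SELECT a.account_id,
       m.month_end,
       COALESCE((SELECT SUM(t.amount)
                 FROM transactions t
                 WHERE t.account_id = a.account_id
                   AND t.tx_date <= m.month_end), 0) AS balance
FROM accounts a
CROSS JOIN months m
ORDER BY a.account_id, m.month_end
"""


def month_end_balances(conn):
    """Return (account_id, month_end, balance) rows for January to March 2024."""
    return conn.execute(QUERY).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample rows, not the case's sample data.
conn.execute("CREATE TABLE transactions (account_id TEXT, amount INTEGER, tx_date TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("A", 100, "2024-01-10"), ("A", -30, "2024-02-05"), ("B", 50, "2024-03-20")],
)
result = month_end_balances(conn)
```

The cross join of accounts and month ends yields one row per account per month even when an account has no transactions in that month, which is the usual edge case in this kind of report.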
4. DevOps Integration (CI/CD for Data Pipelines)
Case asks for: CI/CD workflow for the ETL pipeline (automated testing, automated infrastructure); list of artifacts.
| Deliverable | Where to find it |
|---|---|
| CI/CD workflow (main deliverable) | CI/CD Workflow — design, failure scenarios, orchestration, governance |
| Deployment and infrastructure | Tasks — DevOps & CI/CD — Deployment summary, Lambda implementation |
| Security and artifacts | Business case security, Terraform backend, CI/CD complete (artifacts) |
5. Communication and Documentation
Case asks for: Short email to non-technical stakeholders (metrics: records processed, errors); one-page technical document for the team.
| Deliverable | Where to find it |
|---|---|
| Example stakeholder email | Tasks — Communication — Stakeholder email, Stakeholder update (business) |
| Technical summary for the team | Tasks — Communication — Technical reference |
| High-level overview for reviewers | Executive summary (sidebar) |
Appendix A — Transaction table and sample data
The case defines a transactions table and sample rows used for the SQL task (e.g. balance history). The schema and logic are reflected in:
- Tasks — SQL (table semantics and query design)
- Reference — SQL code (implementation)
Evaluation criteria (case study)
The case states that it evaluates four criteria: technical skills (Python, SQL, cloud), problem-solving (edge cases, scalability, performance), clarity (explanations, documentation), and practicality (implementable solutions). The docs under Tasks, Reference, ADR, and Runbooks provide the evidence for these criteria.