© 2026 Stephen Adei. All rights reserved. All content on this site is the intellectual property of Stephen Adei. See License for terms of use and attribution.
ADR-001: Parquet-only Format (not Iceberg/Delta)
Status
Accepted
Context
The case study requirements specify storing validated data in Parquet format for analytical queries. The following options were considered:
- Plain Parquet (chosen)
- Apache Iceberg (table format with ACID transactions, time travel, partition evolution)
- Delta Lake (similar to Iceberg, with ACID transactions and time travel)
The workload is batch OLAP (analytical processing), not real-time streaming. Month-end reporting queries can tolerate 30-second execution times.
Decision
Use Parquet-only format without Iceberg or Delta Lake table formats.
Rationale
- Case requirements met: Parquet format satisfies all case study requirements
- Simplicity: No additional table format layer reduces operational complexity
- Batch workload: OLAP workload does not require ACID transactions or time travel
- Schema evolution: Additive-only schema changes, versioned via schema_v, provide sufficient backward compatibility
- Cost: No additional compute or storage overhead from table format metadata
Consequences
Positive:
- Simpler architecture (no table format layer)
- Lower operational complexity (no Iceberg/Delta metadata management)
- Meets case study requirements without over-engineering
- Faster development (no table format learning curve)
Negative:
- No ACID transactions (not needed for batch OLAP)
- No time travel queries (not required for case study)
- No automatic partition evolution (manual schema_v versioning required)
- No automatic small file compaction (manual optimization required)
Alternatives Considered
Apache Iceberg
- Why rejected: Adds complexity without immediate benefit for batch OLAP workload. ACID transactions and time travel not required for month-end reporting.
Delta Lake
- Why rejected: Similar to Iceberg, adds complexity without clear benefit. Requires Spark runtime, adds metadata overhead.
Plain Parquet with Schema Registry
- Why rejected: Schema evolution via schema_v versioning in paths is sufficient. No need for separate schema registry infrastructure.
Related Decisions
- Design Decisions Summary - Complete trade-off analysis for this decision
- ADR-002: Year/Month Partitioning - Partition strategy works with Parquet
- ADR-004: Quarantine + Condemned Layers - Error handling layers use Parquet format
Implementation Evidence
- Code: All ETL writes use Parquet format (write_parquet_to_s3 functions)
- Documentation: Parquet Schema Specification - Schema contract
- Architecture: Data Lake Architecture - Storage Format - Parquet format rationale