© 2026 Stephen Adei. All rights reserved. All content on this site is the intellectual property of Stephen Adei. See License for terms of use and attribution.

ADR-001: Parquet-only Format (not Iceberg/Delta)

Status

Accepted

Context

The case study requirements specify storing validated data in Parquet format for analytical queries. The following options were considered:

  1. Plain Parquet (chosen)
  2. Apache Iceberg (table format with ACID transactions, time travel, partition evolution)
  3. Delta Lake (similar to Iceberg, with ACID transactions and time travel)

The workload is batch OLAP (online analytical processing), not real-time streaming. Month-end reporting queries can tolerate execution times of up to 30 seconds.

Decision

Store validated data as plain Parquet files, without an Iceberg or Delta Lake table format layer.

Rationale

  1. Case requirements met: Parquet format satisfies all case study requirements
  2. Simplicity: Omitting a table format layer reduces operational complexity
  3. Batch workload: The batch OLAP workload does not require ACID transactions or time travel
  4. Schema evolution: Additive-only schema changes, combined with schema_v versioning, provide sufficient backward compatibility
  5. Cost: No additional compute or storage overhead from table format metadata

Consequences

Positive:

  • Simpler architecture (no table format layer)
  • Lower operational complexity (no Iceberg/Delta metadata management)
  • Meets case study requirements without over-engineering
  • Faster development (no table format learning curve)

Negative:

  • No ACID transactions (not needed for batch OLAP)
  • No time travel queries (not required for the case study)
  • No automatic partition evolution (manual schema_v versioning required)
  • No automatic small file compaction (manual optimization required)

Alternatives Considered

Apache Iceberg

  • Why rejected: Adds complexity without immediate benefit for the batch OLAP workload. ACID transactions and time travel are not required for month-end reporting.

Delta Lake

  • Why rejected: Like Iceberg, it adds complexity without clear benefit. Its most mature tooling is Spark-based, and its transaction log adds metadata overhead.

Plain Parquet with Schema Registry

  • Why rejected: Schema evolution via schema_v versioning in storage paths is sufficient, so separate schema registry infrastructure is unnecessary.

Implementation Evidence

© 2026 Stephen Adei. CC BY 4.0