ADR-002: Year/Month Partitioning Strategy
© 2026 Stephen Adei. All rights reserved. All content on this site is the intellectual property of Stephen Adei. See License for terms of use and attribution.
Status
Accepted
Context
The system needs to support efficient month-end reporting queries on large datasets (100M+ rows). Query patterns focus on specific time ranges (e.g., Q1 2024: January, February, March).
The following options were considered:
- Year/month partitioning (chosen)
- Year/month/day partitioning (more granular)
- Hash partitioning (even distribution)
- No partitioning (full table scans)
Decision
Partition Silver layer data by year and month (year=YYYY/month=MM) based on transaction timestamp (tx_date).
Rationale
- Query alignment: Month-end reporting queries filter by year and month, enabling 95% scan reduction
- Partition pruning: WHERE clauses like
year=2024 AND month IN ('01', '02', '03')scan only 3 partitions out of 60 (5-year retention) - Query performance: <30 seconds for 100M row queries with partition pruning
- Cost optimization: 95% reduction in Athena scan costs for Q1 queries
- Balance: Granular enough for query optimization, not too granular to cause small file problems
Consequences
Positive:
- 95% scan reduction for Q1 queries (3 partitions out of 60 scanned)
- Query performance: <30 seconds for 100M row queries
- Cost optimization: <$1 per query vs $20+ for full table scan
- Aligned with business reporting patterns (month-end reports)
Negative:
- Write amplification: Small files if low volume per month
- Manual partition management: Must ensure partition columns are extracted correctly
- Not optimal for day-level queries: Would require scanning entire month partition
Alternatives Considered
Year/Month/Day Partitioning
- Why rejected: Too granular for month-end reporting. Small files problem if low daily volume. Query patterns do not require day-level filtering.
Hash Partitioning
- Why rejected: Doesn't align with query patterns (time-based queries). No partition pruning benefit.
No Partitioning
- Why rejected: Full table scans too expensive (100M+ rows). Query performance unacceptable (>5 minutes).
Related Decisions
- Design Decisions Summary - Complete trade-off analysis for this decision
- ADR-001: Parquet-only Format - Parquet format enables partition pruning
- ADR-003: Serverless Architecture - Athena benefits from partition pruning
Implementation Evidence
- Code: ETL adds
yearandmonthpartition columns from transaction timestamp - Documentation: SQL Breakdown - Partition Pruning - Query optimization patterns
- Architecture: Data Lake Architecture - Partition Strategy - Partition design rationale