© 2026 Stephen Adei. All rights reserved. All content on this site is the intellectual property of Stephen Adei. See License for terms of use and attribution.
SQL Complete Reference
This section contains all SQL-related reference materials: code, examples, and diagrams. For a high-level overview, see SQL Breakdown & Flow.
Table of Contents
Part 1: Analytical Query Implementation
This section contains the complete SQL implementation for generating month-end balance history.
File: balance_history_2024_q1.sql
/*
Month-End Balance History Aggregation
Goal: Show account balance history at the end of each month for Jan-Mar 2024.
Logic: Month-end balance is the last `new_balance` observed within a month (ordered by `tx_date`, tie-broken by `id`).
Carry-Forward Pattern: If an account has no transactions in a month, the last known balance is carried forward from the previous month (forward-fill).
Implementation: Uses `LAST_VALUE(...) IGNORE NULLS` to fill gaps in the account×month spine.
*/
-- 1. Generate the spine of all (Account, Month) combinations to ensure gaps are visible
WITH months AS (
SELECT CAST(date_column AS DATE) AS month_end
FROM (
VALUES
('2024-01-31'),
('2024-02-29'),
('2024-03-31')
) AS t (date_column)
),
unique_accounts AS (
SELECT DISTINCT account_id FROM transactions
-- WHERE tx_date >= '2024-01-01' -- Optimization: filter partition scan if possible
),
account_months AS (
SELECT
a.account_id,
m.month_end
FROM unique_accounts AS a
CROSS JOIN months AS m
),
-- 2. Find the last transaction for each account in each month
-- Efficient strategy: Window function partitioning by account/month
monthly_last_tx AS (
SELECT
account_id,
new_balance,
DATE_TRUNC('month', tx_date) AS tx_month_start,
ROW_NUMBER() OVER (
PARTITION BY account_id, DATE_TRUNC('month', tx_date)
ORDER BY tx_date DESC, id DESC -- id tie-breaker ensures deterministic result
) AS rn
FROM transactions
WHERE tx_date >= '2024-01-01' AND tx_date < '2024-04-01'
),
-- 3. Join spine with data and fill gaps (Carry-Forward)
raw_balances AS (
SELECT
am.account_id,
am.month_end,
tx.new_balance
FROM account_months AS am
LEFT JOIN monthly_last_tx AS tx
ON am.account_id = tx.account_id
AND DATE_TRUNC('month', am.month_end) = tx.tx_month_start
AND tx.rn = 1
)
-- 4. Apply forward-fill logic to carry over the last known balance
SELECT
account_id AS acct,
-- LAST_VALUE(IGNORE NULLS) fills NULLs with the most recent non-NULL value
LAST_VALUE(new_balance IGNORE NULLS) OVER (
PARTITION BY account_id
ORDER BY month_end
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS balance,
TO_CHAR(month_end, 'YYYY-MM') AS month
FROM raw_balances
ORDER BY account_id, month_end;
/*
Performance Considerations for 100M rows:
1. Partitioning: The 'transactions' table should be partitioned by tx_date (month/day).
This allows the query engine to prune partitions outside Jan-Mar 2024 immediately.
2. Parquet Optimization (Athena): Parquet column statistics (min/max) for account_id enable efficient filtering. Athena uses partition metadata and Parquet statistics to optimize operations without requiring explicit clustering or indexing.
This makes the 'DISTINCT account_id' and window function operations much faster (avoids heavy shuffle).
*/
/*
Query Explanation:
This query produces month-end balance history for Q1 2024 (January, February, March).
1. Month-end balance definition:
- The balance at the end of a month is defined as the last transaction's `new_balance`
observed within that month (ordered by `tx_date` descending, tie-broken by `id`).
2. Account×month spine with Carry-Forward:
- The query generates a complete spine of all (account, month) combinations using
CROSS JOIN to ensure all reporting months are present.
- If an account has no transactions in a month, the `LAST_VALUE(... IGNORE NULLS)`
function carries forward the balance from the most recent previous month.
- A `NULL` is only returned if an account has no transactions *up to and including*
that reporting month.
3. Window function approach:
- Uses ROW_NUMBER() partitioned by account and month to identify the last transaction.
- Efficient for large datasets (100M rows) as it avoids multiple scans.
4. Partition pruning:
- The WHERE clause filters to Jan-Mar 2024, enabling partition pruning on `tx_date`.
- This minimizes data scanned and improves query performance.
5. Engine portability:
- Uses standard ANSI/Presto SQL syntax.
- Compatible with Athena, BigQuery, Snowflake, and other modern SQL engines.
- Avoids engine-specific features for maximum portability.
Ownership & Governance (Task 3):
| Aspect | Owner | Responsibility |
| -------------------------------------------------------------------------------------- | ------------------ | ---------------------------------------------------- |
| **Gold Tables** | Business (Finance) | Analysts query Gold tables only for reporting |
| **Silver Tables** | Domain Teams | Technical users may query Silver for ad-hoc analysis |
| **Bronze Layer** Data Platform Team Not queried for analytics (raw audit trail only) |
Rules:
- Analysts query Gold tables only (business contracts, stable schemas).
- Silver tables used by technical users for ad-hoc analysis and debugging.
- Bronze is not queried for analytics (immutable raw data, platform team access only).
*/
See Also
Task 3 Documentation
- SQL Breakdown - Detailed query structure explanation with high-level algorithm
- SQL Breakdown & Flow - Visual representation and breakdown of query flow
- SQL Code Examples - SQL code snippets referenced in breakdown
- SQL Query - The actual SQL implementation and file reference
- Isolated Testing Guide - How to test the SQL query
Related Tasks
- Data Lake Architecture - Architecture this query runs on
- ETL Pipeline - What creates the data this query reads
Technical Documentation
- Testing Guide - Comprehensive testing documentation
- Unified Testing Convention - Testing standards
Part 2: SQL Code Examples
Breakdown: inner to outer
This section contains all SQL code snippets referenced in the SQL Breakdown document. These examples illustrate the query structure from inner to outer CTEs.
Base Table Filter
The base table filter is the innermost data source:
FROM transactions
WHERE tx_date >= '2024-01-01' AND tx_date < '2024-04-01'
Reference: See Base Table: transactions in SQL Breakdown.
CTE 1: months
Creates a hardcoded list of month-end dates for Q1 2024:
WITH months AS (
SELECT CAST(date_column AS DATE) AS month_end
FROM (
VALUES
('2024-01-31'),
('2024-02-29'),
('2024-03-31')
) AS t (date_column)
)
Reference: See CTE 1: months in SQL Breakdown.
CTE 2: unique_accounts
Extracts all unique account IDs from the transactions table:
unique_accounts AS (
SELECT DISTINCT account_id FROM transactions
-- WHERE tx_date >= '2024-01-01' -- Optimization: filter partition scan if possible
)
Reference: See CTE 2: unique_accounts in SQL Breakdown.
CTE 3: account_months
Creates a complete spine of all account×month combinations using CROSS JOIN:
account_months AS (
SELECT
a.account_id,
m.month_end
FROM unique_accounts AS a
CROSS JOIN months AS m
)
Reference: See CTE 3: account_months in SQL Breakdown.
CTE 4: monthly_last_tx
Identifies the last transaction for each account in each month using a window function:
monthly_last_tx AS (
SELECT
account_id,
new_balance,
DATE_TRUNC('month', tx_date) AS tx_month_start,
ROW_NUMBER() OVER (
PARTITION BY account_id, DATE_TRUNC('month', tx_date)
ORDER BY tx_date DESC, id DESC
) AS rn
FROM transactions
WHERE tx_date >= '2024-01-01' AND tx_date < '2024-04-01'
)
Reference: See CTE 4: monthly_last_tx in SQL Breakdown.
Final SELECT
The outermost SELECT statement that joins the spine with the last transactions:
-- Join spine with data and fill gaps (Carry-Forward)
WITH raw_balances AS (
SELECT
am.account_id,
am.month_end,
tx.new_balance
FROM account_months AS am
LEFT JOIN monthly_last_tx AS tx
ON am.account_id = tx.account_id
AND DATE_TRUNC('month', am.month_end) = tx.tx_month_start
AND tx.rn = 1
)
SELECT
account_id AS acct,
LAST_VALUE(new_balance IGNORE NULLS) OVER (
PARTITION BY account_id
ORDER BY month_end
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS balance,
TO_CHAR(month_end, 'YYYY-MM') AS month
FROM raw_balances
ORDER BY account_id, month_end;
Reference: See Final SELECT in SQL Breakdown.
Alternative Approaches (Not Used)
Alternative 1: Simple GROUP BY
This approach is incorrect because it does not show months with no transactions:
SELECT account_id, MAX(tx_date), new_balance
FROM transactions
GROUP BY account_id, MONTH(tx_date)
Reference: See Alternative 1: Simple GROUP BY (Wrong) in SQL Breakdown.
Alternative 2: SUM(amount)
This approach is incorrect per specification because it uses SUM instead of the last transaction's new_balance:
SELECT account_id, SUM(amount) as balance
FROM transactions
GROUP BY account_id, MONTH(tx_date)
Reference: See Alternative 2: SUM(amount) (Wrong per spec) in SQL Breakdown.
Gold Layer Storage Example
Example of how to store query results in the Gold layer:
-- Example: Store query results in Gold layer
CREATE TABLE gold.monthly_balances AS
SELECT
am.account_id AS acct,
tx.new_balance AS balance,
TO_CHAR(am.month_end, 'YYYY-MM') AS month,
CURRENT_TIMESTAMP AS as_of_timestamp
FROM account_months AS am
LEFT JOIN monthly_last_tx AS tx
ON am.account_id = tx.account_id
AND DATE_TRUNC('month', am.month_end) = tx.tx_month_start
AND tx.rn = 1
ORDER BY am.account_id, am.month_end;
Reference: See Storing Results in Gold Layer in SQL Breakdown.
See Also
- SQL Breakdown - Main documentation with detailed explanations
- SQL Implementation Code - Complete SQL query implementation
- SQL Breakdown & Flow - Visual representation and breakdown of query structure
Part 3: SQL Query Diagrams
This section contains all detailed diagrams for the SQL query breakdown. For a high-level overview, see SQL Breakdown & Flow.
High-Level Query Flow
Window Function Detail
Data Transformation Pipeline
Join Logic Detail
Example Data Flow
Performance Optimization Strategy
Output Structure
Error Handling & Edge Cases
CTE 4: Window Function Detailed Flow
Execution Order (Detailed)
See also
- SQL Breakdown & Flow - High-level overview (start here)
- Data Lake Architecture - Where this query reads data
- SQL Code - Complete SQL implementation
- ETL Pipeline - What creates the data this query reads
- Runtime Scenarios - Query execution in operational scenarios