Analytical Query Implementation
© 2026 Stephen Adei. All rights reserved. All content on this site is the intellectual property of Stephen Adei. See License for terms of use and attribution.
This page displays the SQL query for account balance history.
Balance History Query
The main SQL query that shows account balance history at the end of each month for Q1 2024:
/*
Month-End Balance History (Silver → Gold Aggregation Pattern)
Goal: Show account balance history at the end of each month for Jan-Mar 2024.
History Logic: Month-end balance is the last `new_balance` observed within a month
(ordered by `tx_date`, tie-broken by `id`).
Carry-forward: months with no transactions show the last known balance; NULL only when the account has no prior history.
*/
-- 1. Generate the spine of all (Account, Month) combinations to ensure gaps are visible
WITH months AS (
SELECT CAST(date_column AS DATE) AS month_end
FROM (
VALUES
('2024-01-31'),
('2024-02-29'),
('2024-03-31')
) AS t (date_column)
),
unique_accounts AS (
SELECT DISTINCT account_id FROM transactions
-- WHERE tx_date >= '2024-01-01' -- Optimization: filter partition scan if possible
),
account_months AS (
SELECT
a.account_id,
m.month_end
FROM unique_accounts AS a
CROSS JOIN months AS m
),
-- 2. Find the last transaction for each account in each month
-- Efficient strategy: Window function partitioning by account/month
monthly_last_tx AS (
SELECT
account_id,
new_balance,
DATE_TRUNC('month', tx_date) AS tx_month_start,
ROW_NUMBER() OVER (
PARTITION BY account_id, DATE_TRUNC('month', tx_date)
ORDER BY tx_date DESC, id DESC -- id tie-breaker ensures deterministic result
) AS rn
FROM transactions
WHERE tx_date >= '2024-01-01' AND tx_date < '2024-04-01'
)
-- 3. Join spine with data to produce final report
SELECT
am.account_id AS acct,
tx.new_balance AS balance,
TO_CHAR(am.month_end, 'YYYY-MM') AS month
FROM account_months AS am
LEFT JOIN monthly_last_tx AS tx
ON
am.account_id = tx.account_id
AND DATE_TRUNC('month', am.month_end) = tx.tx_month_start
AND tx.rn = 1
ORDER BY am.account_id, am.month_end;
Note: The production file balance_history_2024_q1.sql adds a raw_balances CTE and forward-fill via LAST_VALUE(am.new_balance IGNORE NULLS) for carry-forward balances; see SQL Breakdown for the full query and rationale.
Schema Definition
The schema used by the query:
-- Schema for transactions table
-- This schema is used by the balance history query
-- Mapping: TransactionID → id, CustomerID → account_id, TransactionTimestamp → tx_date
CREATE TABLE transactions (
id VARCHAR(255),
account_id VARCHAR(255),
amount DECIMAL(16,2),
new_balance DECIMAL(16,2),
tx_date TIMESTAMP
)
PARTITIONED BY (
year INT,
month INT
)
STORED AS PARQUET;
Query Explanation
This query produces month-end balance history for Q1 2024 (January, February, March):
-
Month-end balance definition: The balance at the end of a month is defined as the last transaction's
new_balanceobserved within that month (ordered bytx_datedescending, tie-broken byid). -
Account×month spine: The query generates a complete spine of all (account, month) combinations using CROSS JOIN to ensure gaps are visible. Accounts with no activity in a month return NULL for balance (explicit gaps).
-
Window function approach: Uses ROW_NUMBER() partitioned by account and month to identify the last transaction. Efficient for large datasets (100M rows) as it avoids multiple scans.
-
Partition pruning: The WHERE clause filters to Jan-Mar 2024, enabling partition pruning on
tx_date. This minimizes data scanned and improves query performance. -
Engine portability: Uses standard ANSI/Presto SQL syntax. Compatible with Athena, BigQuery, Snowflake, and other modern SQL engines.
Performance Considerations
For 100M rows (Athena):
-
Partitioning: The 'transactions' table should be partitioned by
tx_date(month/day). This allows Athena to prune partitions outside Jan-Mar 2024 immediately, reducing data scanned. -
Parquet Optimization: Parquet column statistics (min/max) for
account_idandtx_dateenable efficient filtering. Athena uses these statistics along with partition metadata to optimize DISTINCT and window function operations without requiring explicit clustering or indexing.
See also
- SQL Query Breakdown - Detailed query explanation and design rationale (parent hub)
- SQL Assumptions & Edge Cases - Design boundaries and query limitations
- Data Lake Architecture - Architecture design including Gold layer structure
- Parquet Schema Specification - Silver layer schema this query reads from
- SQL Complete Reference - Extended SQL documentation