Testing & Validation Guide

© 2026 Stephen Adei. All rights reserved. All content on this site is the intellectual property of Stephen Adei. See License for terms of use and attribution.

This document describes the comprehensive test suite for validating both Python and SQL code in the Ohpen case study implementation.

Overview

Python Tests (Task 1 - ETL)

Python Test Location

  • tasks/data_ingestion_transformation/tests/

Python Test Files

  1. test_etl.py - Unit tests for validation logic

    • Valid data handling
    • Invalid currency detection
    • Null value handling
    • Malformed timestamp detection
    • Partition column generation
    • Run ID in quarantine
    • Empty DataFrame edge case
    • All rows quarantined edge case
    • Deterministic behavior
  2. test_integration.py - End-to-end integration tests

    • Full ETL workflow with temp files
    • Parquet partition structure validation
    • Success marker validation
    • Quarantine data validation
    • Deterministic output verification
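As an illustration of the unit-test style listed above, here is a minimal sketch. The `validate_rows` helper and its column names are hypothetical stand-ins; the real validation logic lives in `src/` and may differ.

```python
# Illustrative sketch only: validate_rows is a hypothetical stand-in for the
# real validation helpers in src/; actual column names and rules may differ.
VALID_CURRENCIES = {"EUR", "USD", "GBP"}

def validate_rows(rows):
    """Split rows into (valid, quarantined) based on simple checks."""
    valid, quarantined = [], []
    for row in rows:
        if row.get("amount") is None:                      # null value handling
            quarantined.append({**row, "reason": "null_amount"})
        elif row.get("currency") not in VALID_CURRENCIES:  # invalid currency
            quarantined.append({**row, "reason": "invalid_currency"})
        else:
            valid.append(row)
    return valid, quarantined

def test_invalid_currency_is_quarantined():
    valid, quarantined = validate_rows(
        [{"amount": 10.0, "currency": "XXX"}, {"amount": 5.0, "currency": "EUR"}]
    )
    assert len(valid) == 1
    assert quarantined[0]["reason"] == "invalid_currency"

def test_empty_input_is_a_safe_edge_case():
    # Edge case from the list above: empty input must not raise
    assert validate_rows([]) == ([], [])
```

The real tests follow the same shape: build a small input, run one piece of logic, and assert on both the valid and quarantined outputs.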

Test status (Task 1 ETL)

All Task 1 tests pass except one, which is environment-specific:

  • test_s3_large_file_handling — Validates Parquet compression effectiveness for large files (100K+ rows). In MinIO / Docker CI the written size can exceed the test’s bound, so the test fails there; it is intended for (and would pass in) a real AWS S3 environment. This single failure in CI is expected and does not indicate a product defect. See Task 1 tests README: tasks/data_ingestion_transformation/tests/README_TESTING.md.

Running Python Tests

cd /home/stephen/projects/ohpen-case-2026/tasks/data_ingestion_transformation

# Install dependencies

pip install -r requirements.txt
pip install -r requirements-dev.txt

# Run linting

ruff check src/ tests/

# Run all tests

export PYTHONPATH=$PYTHONPATH:$(pwd)
pytest tests/ -v

# Run specific test file

pytest tests/test_etl.py -v
pytest tests/test_integration.py -v

Script Tests (Workspace & Task Scripts)

Script Test Location

  • The root-level make test-scripts target runs only tasks/communication_documentation/tests/ (Task 5 deliverable/compile tests).
  • Optional: tests/scripts/ at the repo root contains tests for workspace-level scripts (e.g. submission bundles); run them manually with pytest tests/scripts/ -v if needed.

Script Test Files

  1. tests/scripts/test_create_submission_bundles.py - Tests for submission bundle creation script

    • Script existence and permissions
    • Bash syntax validation
    • Directory structure creation
    • File copying logic
    • ZIP file generation
    • Error handling
  2. tasks/communication_documentation/tests/test_compile_deliverables.py - Tests for PDF compilation script

    • Script syntax validation
    • File reference checks
    • Pandoc integration
    • Error handling
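The "bash syntax validation" check above can be sketched with `bash -n`, which parses a script without executing it. This is a self-contained illustration (the real tests point at the actual scripts in the repo):

```python
# Sketch of bash syntax validation: parse a script with `bash -n` (no execution).
import subprocess
import tempfile

def bash_syntax_ok(script_text: str) -> bool:
    """Return True if bash can parse the script without errors."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script_text)
        path = f.name
    # `bash -n` exits 0 when the script parses cleanly, non-zero on syntax errors
    return subprocess.run(["bash", "-n", path], capture_output=True).returncode == 0

def test_valid_script_passes():
    assert bash_syntax_ok("#!/usr/bin/env bash\necho ok\n")

def test_broken_script_fails():
    assert not bash_syntax_ok("if [ missing then\n")  # unterminated `if`
```

The same pattern extends to the other checks in the list: existence and permissions via `os.access`, output validation by inspecting the files a script produces.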

Running Script Tests

# From project root
cd /home/stephen/projects/ohpen-case-2026

# Run script tests (Task 5; same as make test-scripts)
pytest tasks/communication_documentation/tests/ -v

# Or use Makefile from repo root
make test-scripts

# Run specific test
pytest tests/scripts/test_create_submission_bundles.py -v

SQL Tests (Task 3 - Balance Query)

SQL Test Location

  • tasks/sql/tests/

SQL Test Files

  1. test_balance_query.py - DuckDB-based SQL tests

    • SQL syntax validation
    • Expected results verification
    • Missing months handling (NULL values)
    • Last transaction of month selection
  2. test_data.sql - Sample data from Appendix A

  3. expected_output.csv - Expected query results
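The pattern behind these tests — load sample data, run the query, compare against expected rows — can be sketched as below. Stdlib `sqlite3` is used here purely so the sketch is self-contained; the real tests use DuckDB, and the table, columns, and data are invented for illustration.

```python
import csv
import io
import sqlite3

# Invented stand-ins for test_data.sql and expected_output.csv
SETUP_SQL = """
CREATE TABLE transactions (account_id TEXT, amount REAL);
INSERT INTO transactions VALUES ('A', 10.0), ('A', 5.0), ('B', 7.5);
"""
QUERY = ("SELECT account_id, SUM(amount) AS total "
         "FROM transactions GROUP BY account_id ORDER BY account_id;")
EXPECTED_CSV = "account_id,total\nA,15.0\nB,7.5\n"

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP_SQL)  # a syntax error here would raise immediately
actual = conn.execute(QUERY).fetchall()

# Parse the expected CSV into comparable (account_id, total) tuples
expected = [(r["account_id"], float(r["total"]))
            for r in csv.DictReader(io.StringIO(EXPECTED_CSV))]
assert actual == expected
```

Executing the query at all validates the syntax; comparing `fetchall()` against the parsed CSV validates the results.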

Running SQL Tests

cd /home/stephen/projects/ohpen-case-2026/tasks/sql

# Install dependencies

pip install -r requirements.txt

# Run SQL linting

sqlfluff lint balance_history_2024_q1.sql --dialect postgres

# Run SQL unit tests

pytest tests/ -v

CI/CD Integration

The GitHub Actions workflow (.github/workflows/ci.yml) automatically runs all tests:

Jobs

  1. python-validation

    • Ruff linting on source and test files
    • All pytest unit and integration tests
  2. sql-validation

    • SQLFluff linting on SQL queries
    • DuckDB-based SQL unit tests

Triggering CI

git add .
git commit -m "Add comprehensive test suite"
git push origin master

Test Coverage

Python ETL Coverage

  • Validation logic (nulls, currency, timestamp)
  • Partition generation (year/month extraction)
  • Quarantine mechanism
  • Success marker creation
  • Edge cases (empty, all invalid)
  • Deterministic behavior (backfill safety)
  • End-to-end workflow
  • Metadata enrichment (row_hash, attempt_count)
  • Loop prevention (duplicate detection, attempt limits)
  • Circuit breaker logic
  • Condemned layer handling
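The determinism and metadata-enrichment points above hinge on a stable row hash: the same input row must always produce the same hash, so reruns and backfills can detect duplicates. A minimal sketch (field names are hypothetical; the real implementation may hash different columns):

```python
import hashlib
import json

def row_hash(row: dict) -> str:
    """Stable hash of a row's fields: same input gives the same hash,
    regardless of dict key order (sort_keys) or when the run happens."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = row_hash({"account_id": "A1", "amount": 10.0})
b = row_hash({"amount": 10.0, "account_id": "A1"})  # different key order
assert a == b                                       # deterministic: safe reruns
assert a != row_hash({"account_id": "A1", "amount": 11.0})  # content-sensitive
```

On top of such a hash, duplicate detection and attempt counting (for loop prevention) reduce to dictionary lookups keyed by `row_hash`.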

Script Coverage

  • Bash script syntax validation
  • Script existence and permissions
  • Directory creation logic
  • File copying operations
  • Error handling
  • Output validation

SQL Query Coverage

  • Syntax validation (PostgreSQL/ANSI dialect)
  • Expected output verification
  • NULL handling for missing months
  • Last transaction selection logic
  • Cross join (account × month spine)
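The account × month spine and last-transaction selection can be sketched in portable SQL. The sketch below runs via stdlib `sqlite3` rather than DuckDB or PostgreSQL, and the table and column names are illustrative, but the shape (cross join a spine, rank transactions per month, left join rank 1) mirrors what the tests cover.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tx (account_id TEXT, tx_date TEXT, balance REAL);
INSERT INTO tx VALUES
  ('A', '2024-01-10', 100.0),
  ('A', '2024-01-25', 120.0),  -- last January transaction wins
  ('A', '2024-03-05', 150.0);  -- no February row: expect NULL that month
""")
rows = conn.execute("""
WITH months(m) AS (VALUES ('2024-01'), ('2024-02'), ('2024-03')),
     accounts AS (SELECT DISTINCT account_id FROM tx),
     ranked AS (
       SELECT account_id, strftime('%Y-%m', tx_date) AS m, balance,
              ROW_NUMBER() OVER (
                PARTITION BY account_id, strftime('%Y-%m', tx_date)
                ORDER BY tx_date DESC) AS rn
       FROM tx)
SELECT a.account_id, months.m, r.balance
FROM accounts a
CROSS JOIN months                           -- account x month spine
LEFT JOIN ranked r
  ON r.account_id = a.account_id AND r.m = months.m AND r.rn = 1
ORDER BY a.account_id, months.m;
""").fetchall()

assert rows == [("A", "2024-01", 120.0),
                ("A", "2024-02", None),     # NULL for the missing month
                ("A", "2024-03", 150.0)]
```

The cross join guarantees every account gets a row for every month; the rank-1 left join picks the last transaction within each month and leaves NULL where none exists.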

Testing Strategy Decisions

Why DuckDB instead of real PostgreSQL/Aurora

  • Speed: DuckDB runs in-memory, tests complete in milliseconds
  • Portability: No external dependencies, works in any CI environment
  • Sufficient: Validates SQL syntax and logic (the case study requirement)
  • Pragmatic: Avoids over-engineering for a case study submission

Why Temp Files instead of Testcontainers for Python

For this case study, the test suite uses temporary directories and mocking instead of Testcontainers (Docker-managed containers providing an S3-compatible service).

Rationale

  • Fast: Tests run in <2 seconds vs 10+ seconds with container startup
  • Simple: Fewer dependencies, easier to debug
  • Validates core logic: Partition generation, validation, quarantine handling
  • Interview scope: Implements testing best practices without over-engineering
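A minimal sketch of the temp-directory approach, assuming a toy `run_etl` that stands in for the real pipeline (which writes partitioned Parquet, not CSV):

```python
import csv
import tempfile
from pathlib import Path

def run_etl(src: Path, out_dir: Path) -> Path:
    """Toy stand-in for the ETL: copy rows through, then write a success marker."""
    out = out_dir / "output.csv"
    with src.open() as f, out.open("w", newline="") as g:
        writer = csv.writer(g)
        for row in csv.reader(f):
            writer.writerow(row)
    (out_dir / "_SUCCESS").touch()  # marker written only after the data, as in
    return out                      # the real pipeline

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    src = tmp / "input.csv"
    src.write_text("account_id,amount\nA,10\n")
    out = run_etl(src, tmp)
    assert out.exists()
    assert (tmp / "_SUCCESS").exists()
```

Everything lives in a throwaway directory, so tests are isolated, need no Docker daemon, and clean up after themselves.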

When to use Testcontainers (production context)

  • Testing S3-specific features (multipart uploads, versioning, lifecycle policies)
  • Integration tests requiring exact AWS SDK behavior
  • End-to-end workflows with Glue, Lambda, Step Functions
  • Testing against multiple database engines simultaneously

Trade-off acknowledged: The tests validate transformation logic but not S3 API edge cases. For production deployment at Ohpen, Testcontainers-based integration tests would be added as a second layer.

Benefits for Interview

This test suite includes:

  1. Production Readiness: Comprehensive validation before deployment
  2. Data Quality: Automated checks prevent data corruption
  3. Backfill Safety: Determinism tests ensure reproducible outputs
  4. Best Practices: Linting, unit tests, integration tests, CI/CD
  5. Documentation: Tests serve as executable examples
  6. Pragmatic Engineering: Chose the right tool for the scope (DuckDB, temp files) while knowing when to upgrade (Testcontainers for production)

Quick Validation

Option 1: Docker (Recommended)

Run all tests from the repository root:

cd /home/stephen/projects/ohpen-case-2026

# Run all task tests (Task 1 + 3 + 4) in Docker, then combined report
make test

# Or run specific task tests
make test-task1 # Task 1 (ETL) only
make test-task3 # Task 3 (SQL) only
make test-task4 # Task 4 (CI/CD) only
make test-scripts # Task 5 script tests only (runs locally, not in Docker)

For the full list of commands and report paths, see TESTING_MANUAL.md.

Option 2: Local Environment

If you prefer to run tests locally:

cd /home/stephen/projects/ohpen-case-2026

# Python tests

cd tasks/data_ingestion_transformation
pip install -r requirements.txt -r requirements-dev.txt
ruff check src/ tests/
pytest tests/ -v

# SQL tests

cd ../sql
pip install -r requirements.txt
sqlfluff lint balance_history_2024_q1.sql --dialect postgres
pytest tests/ -v

# Script tests

cd ../..
pip install pytest
pytest tests/scripts/ tasks/communication_documentation/tests/test_compile_deliverables.py -v

Expected result: all tests pass.

© 2026 Stephen Adei · CC BY 4.0