# Dockerized Testing Guide
**Context:** This guide describes the Task 1 (ETL) Docker test setup. From the repo root, use `make test-task1` or `make test`; see TESTING_MANUAL.md for root commands and combined report locations.
## Overview

All tests are fully dockerized and run in isolated containers with all dependencies (MinIO, Spark, Java) pre-configured.
## Quick Start

### Run All Tests

```bash
make test-docker
```

### Run Individual Test Files

```bash
# Using the test runner script
./scripts/run_test_solo.sh tests/test_s3_integration.py -v

# Using Make
make test-solo FILE=tests/test_s3_integration.py
```
### Run Specific Test Suites

```bash
make test-s3-integration     # S3 integration tests
make test-spark-integration  # Spark integration tests
make test-idempotency        # Idempotency tests
make test-failure-modes      # Failure mode tests
make test-performance        # Performance tests
make test-resilience         # Resilience tests
make test-scenarios          # Bronze→Silver→Promotion scenario tests (MinIO)
make test-contract           # StoragePort + IngestStrategy contract tests
make test-invariants         # Row-conservation and no-duplicate-ID invariants (MinIO)
make test-golden             # Golden-file comparison for a1/a2 (MinIO)
```
**No venv required:** all of the above run inside the `etl-tests` container; use Docker as the primary way to run tests.
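The invariant checks named above can be illustrated with a minimal sketch. The function names, the `id` field, and the quarantine accounting are assumptions for illustration, not the suite's actual API:

```python
# Illustrative sketch of the invariants the suite checks; function names
# and the "id" field are assumptions, not the suite's actual API.

def assert_rows_conserved(bronze_count, silver_count, quarantined_count):
    # Row conservation: every Bronze row ends up in Silver or quarantine.
    assert bronze_count == silver_count + quarantined_count, (
        f"lost rows: {bronze_count} != {silver_count} + {quarantined_count}"
    )

def assert_no_duplicate_ids(rows, id_field="id"):
    # No-duplicate-ID invariant: IDs must be unique after deduplication.
    ids = [row[id_field] for row in rows]
    assert len(ids) == len(set(ids)), "duplicate IDs found in output"
```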
## Docker Architecture

### Services

- **MinIO** (`minio`): S3-compatible storage for integration tests
  - Automatically initialized with test buckets
  - Health checks ensure readiness
- **MinIO Init** (`minio-init`): one-time bucket initialization
  - Creates: `test-bronze`, `test-silver`, `test-quarantine`, `test-condemned`
- **ETL Test Runner** (`etl-tests`): Python 3.11 with all dependencies
  - Java 17 for PySpark
  - All test dependencies pre-installed
  - Connected to MinIO via the Docker network
### Network Configuration

All services run on the `test-network` bridge network:

- `minio:9000`: MinIO API endpoint
- Services communicate via service names
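As an illustrative fragment (service and network names taken from this guide; other keys are placeholders, the real `docker-compose.test.yml` has more), the wiring looks roughly like:

```yaml
# Illustrative fragment only; the real docker-compose.test.yml has more keys.
networks:
  test-network:
    driver: bridge

services:
  minio:
    networks:
      - test-network
    ports:
      - "9000:9000"
  etl-tests:
    networks:
      - test-network
    depends_on:
      - minio
```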
### Environment Variables

The test runner automatically sets:

```bash
S3_ENDPOINT_URL=http://minio:9000
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin
AWS_DEFAULT_REGION=us-east-1
TEST_BRONZE_BUCKET=test-bronze
TEST_SILVER_BUCKET=test-silver
TEST_QUARANTINE_BUCKET=test-quarantine
TEST_CONDEMNED_BUCKET=test-condemned
JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
PYTHONPATH=/app
```
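Test code can pick these up with plain `os.environ` lookups; a minimal sketch, where the fallback values simply mirror the defaults listed above:

```python
import os

# Read the endpoint and bucket names injected by the test container.
# The fallbacks mirror the defaults listed above, so the same code also
# works outside Docker against a locally exposed MinIO.
S3_ENDPOINT_URL = os.environ.get("S3_ENDPOINT_URL", "http://minio:9000")
BRONZE_BUCKET = os.environ.get("TEST_BRONZE_BUCKET", "test-bronze")
SILVER_BUCKET = os.environ.get("TEST_SILVER_BUCKET", "test-silver")
QUARANTINE_BUCKET = os.environ.get("TEST_QUARANTINE_BUCKET", "test-quarantine")
```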
## Test File Organization

### New Test Files (All Dockerized)

- `test_spark_integration.py`: real Spark execution tests
  - Requires: Java 17, PySpark
  - Marker: `@pytest.mark.real_spark`
- `test_idempotency.py`: deduplication verification
  - Requires: MinIO
  - Marker: `@pytest.mark.integration`
- `test_failure_modes.py`: failure scenario tests
  - Requires: MinIO
  - Marker: `@pytest.mark.integration`
- `test_performance_metrics.py`: performance benchmarks
  - Requires: none (local execution)
  - Marker: `@pytest.mark.performance`
- `test_resilience.py`: recovery and resilience tests
  - Requires: MinIO
  - Marker: `@pytest.mark.integration`
- `test_s3_integration.py`: enhanced S3 integration tests
  - Requires: MinIO
  - Markers: `@pytest.mark.real_s3`, `@pytest.mark.integration`
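As a sketch, a marker-gated test might look like the following; the test name and body are illustrative, not taken from the suite:

```python
import os

import pytest

@pytest.mark.integration
def test_bronze_bucket_is_configured():
    # Illustrative only: checks that the bucket name injected by the
    # test container (or its documented default) is the expected one.
    bucket = os.environ.get("TEST_BRONZE_BUCKET", "test-bronze")
    assert bucket == "test-bronze"
```

Running `pytest -m "integration"` then selects this test, while unmarked unit tests are skipped by the marker expression.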
### Enhanced Existing Files

- `test_performance.py`: enhanced with 6 new tests
- `test_data_quality.py`: enhanced with 6 new tests
## Running Tests Individually

### Method 1: Using the Test Runner Script

```bash
# Run a single test file
./scripts/run_test_solo.sh tests/test_s3_integration.py -v

# Run with specific pytest markers
./scripts/run_test_solo.sh tests/test_idempotency.py -v -m "integration"

# Run a specific test function
./scripts/run_test_solo.sh tests/test_s3_integration.py::test_s3_partition_structure_validation -v
```
### Method 2: Using Docker Compose Directly

```bash
# Start services
docker-compose -f docker-compose.test.yml up -d minio minio-init

# Run a specific test file
docker-compose -f docker-compose.test.yml run --rm etl-tests \
  pytest tests/test_s3_integration.py -v

# Run with markers
docker-compose -f docker-compose.test.yml run --rm etl-tests \
  pytest tests/ -v -m "integration"
```
### Method 3: Using Make

```bash
# Run a specific test suite
make test-s3-integration

# Run an individual test file
make test-solo FILE=tests/test_idempotency.py ARGS="-v -k test_reprocess"
```
## Test Execution Flow

1. **Service startup**
   - MinIO starts and waits for its health check
   - `minio-init` creates the buckets
   - The test runner waits for its dependencies
2. **Test execution**
   - Tests run in an isolated container
   - All environment variables are set automatically
   - Network connectivity to MinIO via the service name
3. **Cleanup**
   - Container removed after the test (`--rm` flag)
   - MinIO data persists in a volume
   - Reports saved to `./reports/`
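The flow maps to a handful of `docker-compose` invocations. A hypothetical helper that only builds the argument lists (the command layout follows the examples in this guide; `down` for cleanup is standard Compose usage, not quoted from the Makefile):

```python
# Hypothetical helper: builds the docker-compose command lines for the
# startup / execution / cleanup flow. It only constructs argument lists;
# it does not run anything.

COMPOSE_FILE = "docker-compose.test.yml"

def compose_cmd(*args, compose_file=COMPOSE_FILE):
    return ["docker-compose", "-f", compose_file, *args]

# 1. Service startup
startup = compose_cmd("up", "-d", "minio", "minio-init")
# 2. Test execution (container removed afterwards via --rm)
run_tests = compose_cmd("run", "--rm", "etl-tests", "pytest", "tests/", "-v")
# 3. Cleanup of the remaining services
teardown = compose_cmd("down")
```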
## Troubleshooting

### MinIO Port Already in Use

If you see "port is already allocated", either:

- Use the existing MinIO container (the script detects it automatically)
- Stop the existing MinIO: `docker stop ohpen-etl-test-minio`
- Use a different port in `docker-compose.test.yml`
### Tests Cannot Connect to MinIO

- Check MinIO is running: `docker ps | grep minio`
- Check the network: `docker network inspect test-network`
- Verify the environment variables in the test container
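A quick connectivity probe can narrow this down; a sketch using MinIO's documented liveness endpoint (`/minio/health/live`), with the endpoint default taken from this guide:

```python
import urllib.error
import urllib.request

def minio_reachable(endpoint="http://minio:9000", timeout=2.0):
    """Return True if MinIO's liveness endpoint answers within the timeout."""
    try:
        with urllib.request.urlopen(endpoint + "/minio/health/live", timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False
```

Run it inside the test container to check service-name resolution, and on the host against `http://localhost:9000` to check the port mapping.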
### Java/PySpark Issues

- Verify Java is installed: `docker-compose -f docker-compose.test.yml run --rm etl-tests java -version`
- Check `JAVA_HOME` (single-quote the command so the variable expands inside the container rather than on the host): `docker-compose -f docker-compose.test.yml run --rm etl-tests sh -c 'echo $JAVA_HOME'`
### Build Cache Issues

```bash
# Rebuild without cache
docker-compose -f docker-compose.test.yml build --no-cache etl-tests
```
## CI/CD Integration

The same Docker setup is used in CI/CD:

```yaml
# .github/workflows/ci.yml
integration-tests:
  services:
    minio:
      image: minio/minio:latest
      # ... same configuration
```
## Best Practices

- **Always use Docker for integration tests**: ensures a consistent environment
- **Run tests individually first**: catches issues early
- **Use markers for test selection**: `-m "integration"` for integration tests
- **Check service health**: the scripts wait for MinIO readiness
- **Clean up between runs**: use the `--rm` flag or `make clean`
## File Structure

```
tasks/data_ingestion_transformation/
├── Dockerfile.test                 # Test container definition
├── docker-compose.test.yml         # Test services orchestration
├── scripts/
│   └── run_test_solo.sh            # Individual test runner
├── tests/
│   ├── test_s3_integration.py      # S3 tests (dockerized)
│   ├── test_spark_integration.py   # Spark tests (dockerized)
│   ├── test_idempotency.py         # Idempotency tests (dockerized)
│   ├── test_failure_modes.py       # Failure tests (dockerized)
│   ├── test_resilience.py          # Resilience tests (dockerized)
│   └── ...
└── Makefile                        # Docker test commands
```
## Summary

- ✅ All tests are dockerized
- ✅ MinIO automatically configured
- ✅ Environment variables set automatically
- ✅ Network isolation via Docker networks
- ✅ Easy individual test execution
- ✅ CI/CD ready
## Related Documentation

### Technical Documentation

- Testing Quick Start: quick start guide for running tests
- Testing Guide: comprehensive testing documentation
- Test Runner Guide: efficient test execution
- Test Metrics Guide: understanding test metrics
- Unified Testing Convention: testing standards
- Test Results Overview: current test execution results

### Task-Specific Documentation

- Task 1 Test Directory (source only): detailed test documentation