
Efficient Test Runner Guide

Context: The commands below are for Task 1 (ETL) and are meant to be run from tasks/data_ingestion_transformation/. From the repo root, use make test-task1 or make test instead; see TESTING_MANUAL.md for the full list of root-level test commands and report locations.

Quick Start - Run All Tests with Metrics

The easiest and most efficient way to run all tests with metrics:

make test-with-metrics

This single command will:

  1. ✅ Run all tests in Docker (Pandas + PySpark)
  2. ✅ Automatically collect metrics (memory, CPU, time, Spark)
  3. ✅ Generate comprehensive metrics reports
  4. ✅ Display summary of results

Alternative Methods

Method 1: Using Make Commands

# Run all tests with metrics (one command)
make test-with-metrics

# Or run tests and metrics separately
make test               # Run tests only
make test-metrics       # Generate metrics from existing report

# Run specific test suites with metrics
make test-pandas && make test-metrics
make test-spark && make test-metrics

Method 2: Using the Test Script

# Run all tests with metrics
./scripts/run_tests_with_metrics.sh

# Run specific test file
./scripts/run_tests_with_metrics.sh tests/test_etl.py

# Run with additional pytest options
./scripts/run_tests_with_metrics.sh tests/test_etl.py -k "test_validation"

Method 3: Using Docker Compose Directly

# Build and run tests
docker-compose -f docker-compose.test.yml build
docker-compose -f docker-compose.test.yml run --rm etl-tests pytest tests/ -v

# Then generate metrics
docker-compose -f docker-compose.test.yml run --rm etl-tests python scripts/generate_test_metrics.py

Available Test Commands

Standard Test Commands

make test               # All tests (Pandas + PySpark)
make test-pandas        # Pandas tests only
make test-spark         # PySpark tests only
make test-unit          # Unit tests only
make test-integration   # Integration tests only
make test-load          # Load tests (fast, excludes slow)
make test-load-full     # All load tests (includes slow)
make test-edge-cases    # Edge case tests

Metrics Commands

make test-with-metrics  # Run tests + generate metrics (RECOMMENDED)
make test-metrics       # Generate metrics from existing report

Utility Commands

make build              # Build Docker test image
make clean              # Clean Docker resources
make archive-reports    # Archive test reports
make help               # Show all available commands

What Metrics Are Collected?

Every test automatically collects:

System Metrics

  • Time: Duration, CPU time, start/end timestamps
  • Memory: RSS, peak memory, memory delta, VMS, shared memory
  • CPU: CPU time, CPU percentage, user/system time
  • System: Load average, thread count, open file descriptors
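The system metrics above can be gathered around each test body with a small context manager. The sketch below uses only the Python standard library (the real harness uses psutil, per the troubleshooting section); the function name and metric keys are illustrative, not the project's actual API.

```python
# Simplified sketch of per-test resource tracking using the standard library.
# The project's harness collects richer data via psutil; names here are
# illustrative only.
import resource
import time
from contextlib import contextmanager

@contextmanager
def collect_metrics(metrics: dict):
    """Record wall time, CPU time, and peak RSS around a test body."""
    start_wall = time.perf_counter()
    start_usage = resource.getrusage(resource.RUSAGE_SELF)
    try:
        yield metrics
    finally:
        end_usage = resource.getrusage(resource.RUSAGE_SELF)
        metrics["duration_s"] = time.perf_counter() - start_wall
        metrics["cpu_user_s"] = end_usage.ru_utime - start_usage.ru_utime
        metrics["cpu_system_s"] = end_usage.ru_stime - start_usage.ru_stime
        # ru_maxrss is KiB on Linux, bytes on macOS
        metrics["peak_rss_kb"] = end_usage.ru_maxrss

metrics = {}
with collect_metrics(metrics):
    _ = sum(i * i for i in range(100_000))  # stand-in for a test body
print(sorted(metrics))
```

In the real setup this kind of wrapper typically lives in a pytest fixture or hook so every test is measured without per-test boilerplate.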

Spark Metrics (for PySpark tests)

  • Executor memory usage
  • Job metrics (tasks, completed, failed)
  • Stage metrics
  • Active jobs count

S3 Metrics (when using instrumented_s3_client fixture)

  • Read/write/delete/list operations
  • Bytes transferred
  • Operation latency (avg, min, max, p50, p95, p99)
  • Retry counts and errors
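An "instrumented" client like the fixture above can be built by wrapping each S3 call and recording counts, bytes, and latency samples. This is a minimal sketch, not the project's fixture: FakeS3 stands in for a real boto3 client, and the class and attribute names are assumptions.

```python
# Illustrative sketch of instrumenting S3 calls to collect the metrics listed
# above. FakeS3 replaces a real boto3 client so the example is self-contained.
import time
from statistics import median

class FakeS3:
    """In-memory stand-in for an S3 client (put_object only)."""
    def __init__(self):
        self._objects = {}
    def put_object(self, Bucket, Key, Body):
        self._objects[(Bucket, Key)] = Body

class InstrumentedS3:
    def __init__(self, client):
        self._client = client
        self.ops = {}            # operation name -> call count
        self.latencies_ms = []   # per-call latency samples
        self.bytes_written = 0
    def put_object(self, Bucket, Key, Body):
        start = time.perf_counter()
        self._client.put_object(Bucket=Bucket, Key=Key, Body=Body)
        self.latencies_ms.append((time.perf_counter() - start) * 1000)
        self.ops["put_object"] = self.ops.get("put_object", 0) + 1
        self.bytes_written += len(Body)

s3 = InstrumentedS3(FakeS3())
for i in range(3):
    s3.put_object(Bucket="test", Key=f"part-{i}", Body=b"payload")
print(s3.ops, s3.bytes_written, round(median(s3.latencies_ms), 3))
```

Percentiles such as p95/p99 can then be computed from the collected latency samples after the test finishes.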

Viewing Metrics Reports

After running tests, metrics are available in:

  1. reports/test_metrics.json - Machine-readable metrics
  2. reports/TEST_METRICS.md - Human-readable summary with resource usage
  3. reports/test_report.json - Full pytest JSON report with per-test metrics
  4. reports/test_report.html - Visual HTML report
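The JSON report can also be post-processed directly. The sketch below assumes reports/test_report.json follows the pytest-json-report layout (a top-level "tests" list whose entries carry an "outcome" and a per-stage "call" duration); a small inline sample is used so the sketch runs without the real report file.

```python
# Hedged sketch: summarize a pytest JSON report. The sample below mimics the
# pytest-json-report structure; field names beyond that are not guaranteed.
import json
from collections import Counter

sample = json.loads("""{
  "tests": [
    {"nodeid": "tests/test_etl.py::test_validation", "outcome": "passed",
     "call": {"duration": 0.12, "outcome": "passed"}},
    {"nodeid": "tests/test_etl.py::test_load", "outcome": "failed",
     "call": {"duration": 2.5, "outcome": "failed"}}
  ]
}""")

# Count outcomes and total up time spent in test bodies (the "call" stage)
outcomes = Counter(t["outcome"] for t in sample["tests"])
total_s = sum(t.get("call", {}).get("duration", 0.0) for t in sample["tests"])
print(dict(outcomes), round(total_s, 2))
```

Swapping the inline sample for `json.load(open("reports/test_report.json"))` gives a quick command-line summary of a real run.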

Quick View

# View markdown summary
cat reports/TEST_METRICS.md

# View JSON metrics (requires jq)
jq . reports/test_metrics.json

# Open HTML report (Linux)
xdg-open reports/test_report.html

Performance Tips

1. Skip Slow Tests During Development

# Run tests excluding slow markers
docker-compose -f docker-compose.test.yml run --rm etl-tests pytest tests/ -v -m "not slow"

2. Run Tests in Parallel (if supported)

# Install pytest-xdist first, then:
docker-compose -f docker-compose.test.yml run --rm etl-tests pytest tests/ -n auto

3. Run Specific Test Files

# Run only one test file
make test-pandas        # Only Pandas tests
make test-spark         # Only PySpark tests

# Or use the script
./scripts/run_tests_with_metrics.sh tests/test_etl.py

4. Quick Test Without Rebuilding

# If Docker image already exists
make test-quick

Troubleshooting

Metrics Not Appearing

  1. Check if psutil is installed:

    docker-compose -f docker-compose.test.yml run --rm etl-tests pip list | grep psutil
  2. Verify test_report.json exists:

    ls -la reports/test_report.json
  3. Regenerate metrics manually:

    make test-metrics

Docker Issues

# Rebuild Docker image
make build

# Clean and rebuild
make clean
make build

Missing Dependencies

# Rebuild Docker image to install new dependencies
make build

Example Workflow

# 1. Run all tests with metrics
make test-with-metrics

# 2. Check results
cat reports/TEST_METRICS.md

# 3. If tests pass, archive reports
make archive-reports

# 4. View detailed HTML report
xdg-open reports/test_report.html

Integration with CI/CD

For CI/CD pipelines, use:

# Run tests and exit with error code on failure
make test-with-metrics || exit 1

The test-with-metrics command will:

  • Exit with code 0 if all tests pass
  • Exit with code 1 if any tests fail
  • Generate metrics regardless of test outcome
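The contract above (metrics always generated, test exit code propagated) can be sketched as a small runner. Everything here is illustrative: 'false' and 'echo' stand in for the real make targets.

```python
# Sketch of the CI contract described above: the metrics step runs whether or
# not the tests pass, and the pipeline sees the tests' exit code.
import subprocess

def run_with_metrics(test_cmd, metrics_cmd):
    status = subprocess.run(test_cmd).returncode  # 0 on pass, nonzero on fail
    subprocess.run(metrics_cmd)                   # always generate metrics
    return status

# 'false' simulates a failing test suite; metrics still run afterwards.
status = run_with_metrics(["false"], ["echo", "metrics generated"])
print(status)
```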

Next Steps

  • See the Test Metrics Guide for detailed metrics documentation
  • See the Testing Quick Start for more testing options
  • See the Test Documentation for test structure details

Technical Documentation

Task-Specific Documentation

  • Task 1 Test Directory - Detailed test documentation
  • Task 1 Test Reports - Test reporting overview
© 2026 Stephen Adei · CC BY 4.0