
Efficient Test Runner Guide

Context: The commands below are for Task 1 (ETL) and are meant to be run from tasks/data_ingestion_transformation/. From the repo root, use make test-task1 or make test instead; see TESTING_MANUAL.md for the full list of root-level test commands and report locations.

Quick Start - Run All Tests with Metrics

The easiest and most efficient way to run all tests with metrics:

make test-with-metrics

This single command will:

  1. ✅ Run all tests in Docker (Pandas + PySpark)
  2. ✅ Automatically collect metrics (memory, CPU, time, Spark)
  3. ✅ Generate comprehensive metrics reports
  4. ✅ Display summary of results

Alternative Methods

Method 1: Using Make Commands

# Run all tests with metrics (one command)
make test-with-metrics

# Or run tests and metrics separately
make test               # Run tests only
make test-metrics       # Generate metrics from existing report

# Run specific test suites with metrics
make test-pandas && make test-metrics
make test-spark && make test-metrics

Method 2: Using the Test Script

# Run all tests with metrics
./scripts/run_tests_with_metrics.sh

# Run specific test file
./scripts/run_tests_with_metrics.sh tests/test_etl.py

# Run with additional pytest options
./scripts/run_tests_with_metrics.sh tests/test_etl.py -k "test_validation"

Method 3: Using Docker Compose Directly

# Build and run tests
docker-compose -f docker-compose.test.yml build
docker-compose -f docker-compose.test.yml run --rm etl-tests pytest tests/ -v

# Then generate metrics
docker-compose -f docker-compose.test.yml run --rm etl-tests python scripts/generate_test_metrics.py

Available Test Commands

Standard Test Commands

make test               # All tests (Pandas + PySpark)
make test-pandas        # Pandas tests only
make test-spark         # PySpark tests only
make test-unit          # Unit tests only
make test-integration   # Integration tests only
make test-load          # Load tests (fast, excludes slow)
make test-load-full     # All load tests (includes slow)
make test-edge-cases    # Edge case tests

Metrics Commands

make test-with-metrics  # Run tests + generate metrics (RECOMMENDED)
make test-metrics       # Generate metrics from existing report

Utility Commands

make build              # Build Docker test image
make clean              # Clean Docker resources
make archive-reports    # Archive test reports
make help               # Show all available commands

What Metrics Are Collected?

Every test automatically collects:

System Metrics

  • Time: Duration, CPU time, start/end timestamps
  • Memory: RSS, peak memory, memory delta, VMS, shared memory
  • CPU: CPU time, CPU percentage, user/system time
  • System: Load average, thread count, open file descriptors
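The system metrics above can be gathered around each test body with a small context manager. The sketch below uses only the Python standard library (the real harness uses psutil, per the troubleshooting section); the function name and metric keys are illustrative, not the project's actual API.

```python
# Simplified sketch of per-test resource tracking using the standard library.
# The project's harness collects richer data via psutil; names here are
# illustrative only.
import resource
import time
from contextlib import contextmanager

@contextmanager
def collect_metrics(metrics: dict):
    """Record wall time, CPU time, and peak RSS around a test body."""
    start_wall = time.perf_counter()
    start_usage = resource.getrusage(resource.RUSAGE_SELF)
    try:
        yield metrics
    finally:
        end_usage = resource.getrusage(resource.RUSAGE_SELF)
        metrics["duration_s"] = time.perf_counter() - start_wall
        metrics["cpu_user_s"] = end_usage.ru_utime - start_usage.ru_utime
        metrics["cpu_system_s"] = end_usage.ru_stime - start_usage.ru_stime
        # ru_maxrss is KiB on Linux, bytes on macOS
        metrics["peak_rss_kb"] = end_usage.ru_maxrss

metrics = {}
with collect_metrics(metrics):
    _ = sum(i * i for i in range(100_000))  # stand-in for a test body
print(sorted(metrics))
```

In the real setup this kind of wrapper typically lives in a pytest fixture or hook so every test is measured without per-test boilerplate.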

Spark Metrics (for PySpark tests)

  • Executor memory usage
  • Job metrics (tasks, completed, failed)
  • Stage metrics
  • Active jobs count

S3 Metrics (when using instrumented_s3_client fixture)

  • Read/write/delete/list operations
  • Bytes transferred
  • Operation latency (avg, min, max, p50, p95, p99)
  • Retry counts and errors
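An "instrumented" client like the fixture above can be built by wrapping each S3 call and recording counts, bytes, and latency samples. This is a minimal sketch, not the project's fixture: FakeS3 stands in for a real boto3 client, and the class and attribute names are assumptions.

```python
# Illustrative sketch of instrumenting S3 calls to collect the metrics listed
# above. FakeS3 replaces a real boto3 client so the example is self-contained.
import time
from statistics import median

class FakeS3:
    """In-memory stand-in for an S3 client (put_object only)."""
    def __init__(self):
        self._objects = {}
    def put_object(self, Bucket, Key, Body):
        self._objects[(Bucket, Key)] = Body

class InstrumentedS3:
    def __init__(self, client):
        self._client = client
        self.ops = {}            # operation name -> call count
        self.latencies_ms = []   # per-call latency samples
        self.bytes_written = 0
    def put_object(self, Bucket, Key, Body):
        start = time.perf_counter()
        self._client.put_object(Bucket=Bucket, Key=Key, Body=Body)
        self.latencies_ms.append((time.perf_counter() - start) * 1000)
        self.ops["put_object"] = self.ops.get("put_object", 0) + 1
        self.bytes_written += len(Body)

s3 = InstrumentedS3(FakeS3())
for i in range(3):
    s3.put_object(Bucket="test", Key=f"part-{i}", Body=b"payload")
print(s3.ops, s3.bytes_written, round(median(s3.latencies_ms), 3))
```

Percentiles such as p95/p99 can then be computed from the collected latency samples after the test finishes.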

Viewing Metrics Reports

After running tests, metrics are available in:

  1. reports/test_metrics.json - Machine-readable metrics
  2. reports/TEST_METRICS.md - Human-readable summary with resource usage
  3. reports/test_report.json - Full pytest JSON report with per-test metrics
  4. reports/test_report.html - Visual HTML report
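The JSON report can also be post-processed directly. The sketch below assumes reports/test_report.json follows the pytest-json-report layout (a top-level "tests" list whose entries carry an "outcome" and a per-stage "call" duration); a small inline sample is used so the sketch runs without the real report file.

```python
# Hedged sketch: summarize a pytest JSON report. The sample below mimics the
# pytest-json-report structure; field names beyond that are not guaranteed.
import json
from collections import Counter

sample = json.loads("""{
  "tests": [
    {"nodeid": "tests/test_etl.py::test_validation", "outcome": "passed",
     "call": {"duration": 0.12, "outcome": "passed"}},
    {"nodeid": "tests/test_etl.py::test_load", "outcome": "failed",
     "call": {"duration": 2.5, "outcome": "failed"}}
  ]
}""")

# Count outcomes and total up time spent in test bodies (the "call" stage)
outcomes = Counter(t["outcome"] for t in sample["tests"])
total_s = sum(t.get("call", {}).get("duration", 0.0) for t in sample["tests"])
print(dict(outcomes), round(total_s, 2))
```

Swapping the inline sample for `json.load(open("reports/test_report.json"))` gives a quick command-line summary of a real run.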

Quick View

# View markdown summary
cat reports/TEST_METRICS.md

# View JSON metrics (requires jq)
jq . reports/test_metrics.json

# Open HTML report (Linux)
xdg-open reports/test_report.html

Performance Tips

1. Skip Slow Tests During Development

# Run tests excluding slow markers
docker-compose -f docker-compose.test.yml run --rm etl-tests pytest tests/ -v -m "not slow"

2. Run Tests in Parallel (if supported)

# Install pytest-xdist first, then:
docker-compose -f docker-compose.test.yml run --rm etl-tests pytest tests/ -n auto

3. Run Specific Test Files

# Run only one test file
make test-pandas        # Only Pandas tests
make test-spark         # Only PySpark tests

# Or use the script
./scripts/run_tests_with_metrics.sh tests/test_etl.py

4. Quick Test Without Rebuilding

# If Docker image already exists
make test-quick

Troubleshooting

Metrics Not Appearing

  1. Check if psutil is installed:

    docker-compose -f docker-compose.test.yml run --rm etl-tests pip list | grep psutil
  2. Verify test_report.json exists:

    ls -la reports/test_report.json
  3. Regenerate metrics manually:

    make test-metrics

Docker Issues

# Rebuild Docker image
make build

# Clean and rebuild
make clean
make build

Missing Dependencies

# Rebuild Docker image to install new dependencies
make build

Example Workflow

# 1. Run all tests with metrics
make test-with-metrics

# 2. Check results
cat reports/TEST_METRICS.md

# 3. If tests pass, archive reports
make archive-reports

# 4. View detailed HTML report
xdg-open reports/test_report.html

Integration with CI/CD

For CI/CD pipelines, use:

# Run tests and exit with error code on failure
make test-with-metrics || exit 1

The test-with-metrics command will:

  • Exit with code 0 if all tests pass
  • Exit with code 1 if any tests fail
  • Generate metrics regardless of test outcome
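The contract above (metrics always generated, test exit code propagated) can be sketched as a small runner. Everything here is illustrative: 'false' and 'echo' stand in for the real make targets.

```python
# Sketch of the CI contract described above: the metrics step runs whether or
# not the tests pass, and the pipeline sees the tests' exit code.
import subprocess

def run_with_metrics(test_cmd, metrics_cmd):
    status = subprocess.run(test_cmd).returncode  # 0 on pass, nonzero on fail
    subprocess.run(metrics_cmd)                   # always generate metrics
    return status

# 'false' simulates a failing test suite; metrics still run afterwards.
status = run_with_metrics(["false"], ["echo", "metrics generated"])
print(status)
```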

Next Steps

  • See the Test Metrics Guide for detailed metrics documentation
  • See the Testing Quick Start for more testing options
  • See the Test Documentation for test structure details

Technical Documentation

Task-Specific Documentation

  • Task 1 Test Directory - Detailed test documentation
  • Task 1 Test Reports - Test reporting overview
© 2026 Stephen Adei · CC BY 4.0