ADR-004: Quarantine + Condemned Layers
© 2026 Stephen Adei. All rights reserved. All content on this site is the intellectual property of Stephen Adei. See License for terms of use and attribution.
Status
Accepted
Context
FinTech systems require comprehensive audit trails for invalid data. The design must handle validation failures while maintaining compliance and preventing infinite retry loops.
The following options were considered:
- Quarantine + Condemned layers (chosen)
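- Single error layer (rejected - cannot distinguish retryable from permanent failures)
- Drop invalid rows (rejected - violates audit requirements)
- DynamoDB for quarantine state (rejected - adds operational complexity)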
Decision
Implement two error handling layers:
- Quarantine Layer: Rows that fail validation but may be retried (attempt_count < 3)
- Condemned Layer: Rows that exceed max retry attempts (attempt_count >= 3) or are exact duplicates
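The routing rule above can be sketched as a small function. This is a minimal illustration, not the actual contents of validator.py: the names `Layer`, `MAX_ATTEMPTS`, and `route_failed_row` are hypothetical, and it assumes attempt_count counts the failed validation attempts recorded so far for the row.

```python
from enum import Enum

# Threshold taken from the decision text (attempt_count < 3 vs >= 3).
MAX_ATTEMPTS = 3


class Layer(Enum):
    """Hypothetical names for the two error layers."""
    QUARANTINE = "quarantine"
    CONDEMNED = "condemned"


def route_failed_row(attempt_count: int, is_exact_duplicate: bool = False) -> Layer:
    """Pick the error layer for a row that has failed validation.

    Assumption: attempt_count is the number of failed attempts so far.
    """
    if attempt_count < 0:
        raise ValueError("attempt_count must be non-negative")
    # Exact duplicates and rows that exhausted their retries are never retried.
    if is_exact_duplicate or attempt_count >= MAX_ATTEMPTS:
        return Layer.CONDEMNED
    # Everything else stays eligible for review and retry.
    return Layer.QUARANTINE
```

Keeping the threshold in a single constant means the quarantine and condemned conditions cannot drift apart.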
Rationale
- FinTech audit requirements: Perpetual retention of invalid data for compliance
- Retry tracking: Attempt count prevents infinite retry loops
- Human review: Quarantine enables human review before retry
- Circuit breaker: Condemned layer prevents retry storms
- Audit trail: Complete history of all validation attempts preserved
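To make the circuit-breaker and audit-trail points concrete, the sketch below simulates retries of a single row in memory. It is an illustration under stated assumptions rather than the project's implementation: `FailedRow`, `retry`, the `history` list, and the `run_id` strings are hypothetical, and the real design persists this state in S3, not in a Python object.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_ATTEMPTS = 3  # threshold from the decision text


@dataclass
class FailedRow:
    """Hypothetical in-memory stand-in for a row held in an error layer."""
    payload: dict
    attempt_count: int = 0
    layer: str = "quarantine"
    history: List[str] = field(default_factory=list)  # append-only attempt log


def retry(row: FailedRow, validate: Callable[[dict], bool], run_id: str) -> FailedRow:
    """Re-validate a row once, recording the outcome in its history."""
    if row.layer == "condemned":
        # Circuit breaker: condemned rows are never re-validated.
        row.history.append(f"{run_id}: skipped (condemned)")
        return row
    row.attempt_count += 1
    if validate(row.payload):
        row.layer = "valid"
        row.history.append(f"{run_id}: attempt {row.attempt_count} passed")
    else:
        # Move to condemned once the retry budget is exhausted.
        row.layer = "condemned" if row.attempt_count >= MAX_ATTEMPTS else "quarantine"
        row.history.append(f"{run_id}: attempt {row.attempt_count} failed")
    return row
```

In this sketch a row that keeps failing is validated at most three times. Later runs only append a "skipped" entry, so the retry loop is bounded while the history of every attempt is preserved.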
Consequences
Positive:
- Compliance: Perpetual retention satisfies FinTech audit requirements
- Safety: Circuit breaker prevents infinite retry loops
- Auditability: Complete history of validation attempts
- Human oversight: Quarantine enables review before retry
Negative:
- Operational complexity: Two additional layers to manage
- Storage cost: Invalid data retained indefinitely
- Manual review: Human intervention required for quarantine review
Alternatives Considered
Single Error Layer
- Why rejected: Doesn't distinguish between retryable and permanent failures. No circuit breaker mechanism.
Drop Invalid Rows
- Why rejected: Violates FinTech audit requirements. No audit trail for invalid data.
DynamoDB for Quarantine State
- Why rejected: Adds operational complexity. Traceability and auditability are the top priority, and the current design already keeps quarantine state in S3 (immutable, auditable).
Related Decisions
- Design Decisions Summary - Complete trade-off analysis for this decision
- ADR-006: run_id Isolation - Run isolation enables safe retries
- ADR-003: Serverless Architecture - Error layers use S3 storage
Implementation Evidence
- Code: validator.py routes rows to quarantine or condemned based on attempt_count
- Documentation: Data Lake Architecture - Error Handling Layers - Layer design
- ETL Flow: ETL Flow - Error Handling - Validation and routing logic