6. New Pipeline Version
Regression Testing
Non-Functional Testing
• Same Sources & Targets
• Same Transformation Rules
• Previous fully tested version of
ETL available
7. Regression via Reference Data Schema
• Exclude
• Tracking fields
• New functionality Data
• Clean up Test Schema
• Run Smoke suite first SOURCE
TESTED
TARGET
REFERENCE
TARGET
NEW ETL
VERSION
PROD ETL
VERSION
Regression Testing
8. FitNesse for ETL Regression
• Config files
• Connections
• Tab parameters
• Fixtures
• Non-empty tab
• No duplicates
• Counts match
• Content match
Regression Testing
12. Set the Limits!
• “Partial” run & Extract re-using
• Limit compared data
• Set timeout in tests
• Model missing data
Extract Transform Load
Regression Testing: Challenges
13. Take Care about Production Support Group
or
Non-functional ETL Testing
17. Reliability Testing Challenges
Hidden Risks Underestimation of severity
Dependency on 3d party services Underestimation of probability
Communication gaps
Non-Functional Testing: Challenges
18. Be Informed!
• Monitor Services Logs
• Organize Recovery Training
• Be specific with to-do’s
Non-Functional Testing: Challenges
22. Data Warehouse Testing
Extract Transform Load
SOURCE
TARGET
Test Underlying Data
Test Data Model
Balancing Tests
Data Quality Tests
Smoke Tests
Balancing Tests
Balancing Tests
23. Test Underlying Data
1. Gather info – bridge gaps!
2. Break rules that can be broken
3. Draft a Troubleshooting doc
Source Area Testing
24. Test Target Data Model
1. Naming convention
2. Optimal base for Visualization
3. Testability checks
Data Mart Structure Testing
25. Functional ETL Testing
• Smoke Tests
• Target Data Quality tests:
• Type
• Constraint
• Data Plausibility
• Logical Constraints
! Create similar / relevant tests where applicable for Source to help with further debugging
Functional ETL Testing
26. Functional ETL Testing
• Balancing Tests:
• Study/ Create Specification
• Test Minus Queries Assertions
via mutated data
• Do both-sides comparison
Functional ETL Testing
28. Most Common bugs
• Count Mismatch (incl. Duplicates)
• Anomalies issues: Null or Length relevant
• Date relevant calculations
Functional ETL Testing
29. ETL Testing Challenges
• Tests Complexity
• Unpredictable slow work of AWS Athena
• Impossible to check each single record
Functional ETL Testing
30. Visualization in Data QA
• Source Data Analysis
• Target Quality
Dashboard
• Dedicated resources
& Test Results
visualization
Functional ETL Testing
31. Ongoing Support
• Data Integrity Project
• Ongoing Logs Analysis
• Monitoring Rules &
Alarms
Testing in Production
Data Pipeline
32. Key Takeaways
• ETL verification is not that bad
• Know your data
• Be ready to meet Monsters
• Long ETL duration
• Big Data Volume
• Difference of Test Data from Prod