Pipeline Testing Story
IRINA PASHKOVA
QA Lead, GreenM
Agenda
1. Regression ETL testing
2. Non-functional ETL testing
3. Functional ETL Testing
Puppy to Play with
Daily Runs
Full Refresh Mode
300 Customers
~500 million records per table
~5 h ETL run time
Better & Faster – ETL Evolution, or Regression ETL Testing
ETL: Extract, Transform, Load
Operations Storages, or DATA SOURCES
Reporting Oriented Data Marts, or TARGETS
New Pipeline Version
Regression Testing
Non-Functional Testing
• Same Sources & Targets
• Same Transformation Rules
• Previous fully tested version of ETL available
Regression via Reference Data Schema
• Exclude
  • Tracking fields
  • New functionality Data
• Clean up Test Schema
• Run Smoke suite first
[Diagram labels: SOURCE, NEW ETL VERSION, TESTED TARGET, PROD ETL VERSION, REFERENCE TARGET]
Regression Testing
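The reference-schema comparison on this slide can be sketched as follows. This is a minimal illustration, not the speaker's implementation: it uses an in-memory SQLite database, two tables standing in for the reference and tested schemas, and hypothetical tracking columns (`loaded_at`, `etl_run_id`) that are excluded from the comparison.

```python
import sqlite3

# Hypothetical tracking columns that legitimately differ between ETL runs.
TRACKING_COLUMNS = {"loaded_at", "etl_run_id"}


def comparable_columns(conn, table):
    """Return the table's column names minus the tracking columns."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [r[1] for r in rows if r[1] not in TRACKING_COLUMNS]


def diff_tables(conn, tested, reference):
    """Rows found in only one of the two tables, ignoring tracking columns."""
    cols = ", ".join(comparable_columns(conn, reference))
    only_ref = conn.execute(
        f"SELECT {cols} FROM {reference} EXCEPT SELECT {cols} FROM {tested}"
    ).fetchall()
    only_tested = conn.execute(
        f"SELECT {cols} FROM {tested} EXCEPT SELECT {cols} FROM {reference}"
    ).fetchall()
    return only_ref, only_tested


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reference_visits (id INTEGER, customer TEXT, total REAL, loaded_at TEXT);
    CREATE TABLE tested_visits    (id INTEGER, customer TEXT, total REAL, loaded_at TEXT);
    INSERT INTO reference_visits VALUES (1, 'acme', 10.0, '2024-01-01'), (2, 'bolt', 5.5, '2024-01-01');
    INSERT INTO tested_visits    VALUES (1, 'acme', 10.0, '2024-01-02'), (2, 'bolt', 7.0, '2024-01-02');
""")

only_ref, only_tested = diff_tables(conn, "tested_visits", "reference_visits")
print(only_ref)     # rows the new ETL version lost or changed
print(only_tested)  # rows the new ETL version added or changed
```

Note that `EXCEPT` is set-based, so it will not report duplicated rows; a separate count or duplicate check is still needed.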
FitNesse for ETL Regression
• Config files
  • Connections
  • Table parameters
• Fixtures
  • Non-empty table
  • No duplicates
  • Counts match
  • Content match
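The four fixture checks listed here could look roughly like the sketch below. These are plain Python stand-ins written against SQLite, not actual FitNesse fixtures; the function names, table names, and sample data are all hypothetical.

```python
import sqlite3
from collections import Counter


def row_count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def is_non_empty(conn, table):
    return row_count(conn, table) > 0


def duplicate_keys(conn, table, key):
    """Key values that occur more than once in the table."""
    sql = f"SELECT {key} FROM {table} GROUP BY {key} HAVING COUNT(*) > 1 ORDER BY {key}"
    return [row[0] for row in conn.execute(sql)]


def counts_match(conn, tested, reference):
    return row_count(conn, tested) == row_count(conn, reference)


def content_matches(conn, tested, reference):
    # Multiset comparison in memory: fine for a demo, too costly for very large tables.
    tested_rows = Counter(conn.execute(f"SELECT * FROM {tested}").fetchall())
    reference_rows = Counter(conn.execute(f"SELECT * FROM {reference}").fetchall())
    return tested_rows == reference_rows


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ref_orders (id INTEGER, amount REAL);
    CREATE TABLE new_orders (id INTEGER, amount REAL);
    INSERT INTO ref_orders VALUES (1, 9.5), (2, 3.0);
    INSERT INTO new_orders VALUES (1, 9.5), (2, 3.0), (2, 3.0);
""")

print(is_non_empty(conn, "new_orders"))
print(duplicate_keys(conn, "new_orders", "id"))
print(counts_match(conn, "new_orders", "ref_orders"))
print(content_matches(conn, "new_orders", "ref_orders"))
```

Running the cheap checks (non-empty, duplicates, counts) before the content comparison gives a fast failure when a load is obviously wrong.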
Regression Challenges
Long ETL run time
Big Data volume
Time wasted waiting for a fix / change
Hanging tests
Manual Inspections
• Configurations:
  • Connections
  • Run mode
  • Pipeline Steps order & dependencies
  • Source & Target Tables
• ETL code queries
Regression Testing: Challenges
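Part of the "steps order & dependencies" inspection can be automated. The sketch below assumes the pipeline configuration can be read into an ordered list of step names plus a dependency map; both structures and all step names are hypothetical.

```python
def order_violations(step_order, depends_on):
    """Return (step, dependency) pairs where the dependency does not run earlier."""
    position = {step: index for index, step in enumerate(step_order)}
    violations = []
    for step, dependencies in depends_on.items():
        for dependency in dependencies:
            missing = step not in position or dependency not in position
            if missing or position[dependency] >= position[step]:
                violations.append((step, dependency))
    return violations


# Hypothetical configuration: the load step is scheduled before the transform it needs.
step_order = ["extract_orders", "load_mart", "transform_orders"]
depends_on = {
    "transform_orders": ["extract_orders"],
    "load_mart": ["transform_orders"],
}

print(order_violations(step_order, depends_on))
```

An empty result means every configured dependency runs before the step that needs it.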
Set the Limits!
• “Partial” run & Extract re-using
• Limit compared data
• Set timeout in tests
• Model missing data
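Two of these limits, comparing only part of the data and setting a timeout, can be sketched as below. This is an illustration under assumptions: SQLite stands in for the real warehouse, the sample is taken by key modulo, and the table names and time budgets are made up. The timeout relies on `sqlite3.Connection.set_progress_handler`, which aborts a query when the handler returns a true value.

```python
import sqlite3
import time


def run_with_deadline(conn, sql, seconds):
    """Run a query and abort it once the time budget is used up."""
    deadline = time.monotonic() + seconds
    # SQLite calls the handler every 1000 VM instructions; a true return aborts the query.
    conn.set_progress_handler(lambda: time.monotonic() > deadline, 1000)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.set_progress_handler(None, 0)


def sampled_diff(conn, tested, reference, key, every_nth):
    """Compare only every n-th key: faster, but blind to the rows it skips."""
    where = f"WHERE {key} % {every_nth} = 0"
    sql = f"SELECT * FROM {reference} {where} EXCEPT SELECT * FROM {tested} {where}"
    return run_with_deadline(conn, sql, seconds=5)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ref_t (id INTEGER, val INTEGER)")
conn.execute("CREATE TABLE new_t (id INTEGER, val INTEGER)")
conn.executemany("INSERT INTO ref_t VALUES (?, ?)", [(i, i * 2) for i in range(1, 1001)])
conn.executemany(
    "INSERT INTO new_t VALUES (?, ?)",
    [(i, 0 if i in (200, 201) else i * 2) for i in range(1, 1001)],
)

# Ids 200 and 201 are both wrong, but the 1-in-100 sample only sees id 200.
sampled = sampled_diff(conn, "new_t", "ref_t", key="id", every_nth=100)
print(sampled)

slow_sql = (
    "WITH RECURSIVE c(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM c WHERE x < 50000000) "
    "SELECT COUNT(*) FROM c"
)
try:
    run_with_deadline(conn, slow_sql, seconds=0.05)
    timed_out = False
except sqlite3.OperationalError:
    timed_out = True
print(timed_out)
```

The sample makes the trade-off visible: the comparison finishes sooner, and the defect at id 201 goes unnoticed.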
Take Care of the Production Support Group, or Non-functional ETL Testing
Non-functional Pipeline Testing
• Performance
• Security
• Load/ Stress
• Scalability
• Usability
• Reliability
Non-Functional Testing
Usability Testing
• Easy to
  • identify current state
  • find/read Error info
  • re-configure
• Flexible Start
• Documentation
Reliability Testing
• Risk assessment
• Failure simulation
• Volume simulation
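Failure simulation can be as simple as swapping an external dependency for a test double that fails on purpose. The sketch below is hypothetical throughout: `FlakyService` and `extract_with_retry` are illustrative names, and the retry loop omits any backoff delay so it runs instantly.

```python
class FlakyService:
    """Test double for an external dependency that fails a set number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated outage")
        return ["row-1", "row-2"]


def extract_with_retry(service, attempts=3):
    """Hypothetical extract step: retry on connection errors, then fail loudly."""
    last_error = None
    for _ in range(attempts):
        try:
            return service.fetch()
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError(f"extract failed after {attempts} attempts") from last_error


# Short outage: the step should recover on the third attempt.
recovered = extract_with_retry(FlakyService(failures=2))
print(recovered)

# Long outage: the step should give up with a readable error.
try:
    extract_with_retry(FlakyService(failures=5))
    gave_up = False
except RuntimeError as exc:
    gave_up = True
    print(exc)
```

The second case matters as much as the first: it checks that a lasting outage produces a clear error rather than a silent partial load.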
Reliability Testing Challenges
Hidden Risks
Dependency on 3rd-party services
Communication gaps
Underestimation of severity
Underestimation of probability
Non-Functional Testing: Challenges
Be Informed!
• Monitor Services Logs
• Organize Recovery Training
• Be specific with to-do’s
We’re done! Aren’t we?
Add Analytics for a New Business Module… please
New Data Module Creation, or Functional ETL Testing
Data Warehouse Testing
[Diagram: Extract, Transform, Load between SOURCE and TARGET, annotated with Test Underlying Data, Test Data Model, Balancing Tests, Data Quality Tests, Smoke Tests]
Test Underlying Data
1. Gather info – bridge gaps!
2. Break rules that can be broken
3. Draft a Troubleshooting doc
Source Area Testing
Test Target Data Model
1. Naming convention
2. Optimal base for Visualization
3. Testability checks
Data Mart Structure Testing
Functional ETL Testing
• Smoke Tests
• Target Data Quality tests:
  • Type
  • Constraint
  • Data Plausibility
  • Logical Constraints
! Create similar / relevant tests where applicable for Source to help with further debugging
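The four kinds of target data quality tests can each be written as a query that counts violating rows. The sketch below is illustrative only: the `tgt_orders` table, its columns, and the plausible quantity range of 1 to 1000 are assumptions, and SQLite stands in for the real target.

```python
import sqlite3

# Each check counts rows that violate one rule; zero means the check passes.
QUALITY_CHECKS = {
    "type": "SELECT COUNT(*) FROM tgt_orders WHERE typeof(quantity) != 'integer'",
    "constraint": "SELECT COUNT(*) FROM tgt_orders WHERE id IS NULL",
    "plausibility": (
        "SELECT COUNT(*) FROM tgt_orders "
        "WHERE typeof(quantity) = 'integer' AND quantity NOT BETWEEN 1 AND 1000"
    ),
    "logical": "SELECT COUNT(*) FROM tgt_orders WHERE ship_date < order_date",
}


def run_quality_checks(conn):
    return {name: conn.execute(sql).fetchone()[0] for name, sql in QUALITY_CHECKS.items()}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tgt_orders (id INTEGER, quantity INTEGER, order_date TEXT, ship_date TEXT)"
)
conn.executemany(
    "INSERT INTO tgt_orders VALUES (?, ?, ?, ?)",
    [
        (1, 5, "2024-01-01", "2024-01-03"),       # clean row
        (2, "many", "2024-01-02", "2024-01-04"),  # wrong type
        (3, 0, "2024-01-03", "2024-01-05"),       # implausible quantity
        (None, 2, "2024-01-04", "2024-01-06"),    # missing key
        (5, 3, "2024-01-05", "2024-01-01"),       # shipped before ordered
    ],
)

print(run_quality_checks(conn))
```

The same queries, pointed at the source tables, can tell a defect introduced by the ETL apart from one already present in the input.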
Functional ETL Testing
• Balancing Tests:
  • Study / create the specification
  • Test MINUS-query assertions via mutated data
  • Compare in both directions
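A balancing test with minus queries, checked against mutated data and run in both directions, might look like the sketch below. Everything specific is an assumption: the transformation rule (target total equals the sum of source amounts per customer), the table names, and the use of SQLite, where the minus operator is spelled `EXCEPT`.

```python
import sqlite3

# Hypothetical specification: the target total is the sum of source amounts per customer.
EXPECTED = "SELECT customer, SUM(amount) AS total FROM src_payments GROUP BY customer"
ACTUAL = "SELECT customer, total FROM tgt_customer_totals"


def minus_both_sides(conn):
    """Rows expected but missing from the target, and rows in the target nobody expected."""
    missing = conn.execute(f"{EXPECTED} EXCEPT {ACTUAL}").fetchall()
    unexpected = conn.execute(f"{ACTUAL} EXCEPT {EXPECTED}").fetchall()
    return missing, unexpected


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_payments (customer TEXT, amount INTEGER);
    CREATE TABLE tgt_customer_totals (customer TEXT, total INTEGER);
    INSERT INTO src_payments VALUES ('acme', 10), ('acme', 5), ('bolt', 7);
    INSERT INTO tgt_customer_totals VALUES ('acme', 15), ('bolt', 7);
""")

clean = minus_both_sides(conn)

# Mutation 1: an extra target row. Only the target-minus-expected direction sees it.
conn.execute("INSERT INTO tgt_customer_totals VALUES ('ghost', 1)")
extra_row = minus_both_sides(conn)
conn.rollback()

# Mutation 2: a wrong value. Both directions report it.
conn.execute("UPDATE tgt_customer_totals SET total = total + 1 WHERE customer = 'bolt'")
wrong_value = minus_both_sides(conn)
conn.rollback()

print(clean, extra_row, wrong_value)
```

Mutating the data on purpose shows that the assertion can fail at all; a minus query that returns nothing on broken data is not testing anything.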
Balancing Tests
• One all-data storage
• AWS Glue & Athena
Most Common bugs
• Count mismatch (incl. duplicates)
• Anomalies: NULL- or length-related
• Date-related calculations
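The slides do not say which date calculations went wrong. As one hypothetical example of this class of bug, the sketch below contrasts an age calculation that only subtracts years with one that checks whether the birthday has already passed.

```python
from datetime import date


def naive_age(born, on):
    """Buggy: ignores whether the birthday has occurred yet in the target year."""
    return on.year - born.year


def age(born, on):
    """Subtract one year if the birthday has not been reached yet."""
    before_birthday = (on.month, on.day) < (born.month, born.day)
    return on.year - born.year - before_birthday


born = date(2000, 12, 31)
on = date(2024, 1, 1)
print(naive_age(born, on), age(born, on))
```

Boundary dates such as year ends, month ends, and leap days are useful test inputs because the two versions agree on most other days.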
ETL Testing Challenges
• Test complexity
• Unpredictably slow AWS Athena queries
• Impossible to check every single record
Visualization in Data QA
• Source Data Analysis
• Target Quality Dashboard
• Dedicated resources & Test Results visualization
Ongoing Support
• Data Integrity Project
• Ongoing Logs Analysis
• Monitoring Rules & Alarms
Testing in Production
Data Pipeline
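A monitoring rule for a production pipeline can be very small. The sketch below is an assumption about what such a rule might check, not the one used in the talk: it raises an alarm when the daily row count moves more than a chosen tolerance away from the recent average. The 30% tolerance and the sample counts are made up.

```python
from statistics import mean


def row_count_alarm(history, today, tolerance=0.3):
    """Return an alarm message if today's count drifts too far from the recent mean."""
    baseline = mean(history)
    deviation = abs(today - baseline) / baseline
    if deviation > tolerance:
        return f"row count {today} deviates {deviation:.0%} from baseline {baseline:.0f}"
    return None


recent_counts = [1000, 1020, 980]  # hypothetical row counts from previous daily runs

print(row_count_alarm(recent_counts, today=990))
print(row_count_alarm(recent_counts, today=600))
```

In practice the returned message would be routed to whatever alerting channel the support group already watches.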
Key Takeaways
• ETL verification is not that bad
• Know your data
• Be ready to meet Monsters
  • Long ETL duration
  • Big Data Volume
  • Difference of Test Data from Prod
Your questions

Data Pipeline Installation Quality
