ETL Validation

The document outlines various types of ETL (Extract-Transform-Load) testing, emphasizing its importance in ensuring data accuracy, integrity, and quality for business decision-making. It details specific testing methods such as Data Accuracy Testing, Data Integrity Testing, and Data Quality Testing, along with common defects encountered during ETL processes. Additionally, it highlights the significance of ETL testing in the context of data warehousing and business intelligence.


Types of ETL Tests

Data is vital for all businesses to make crucial decisions. ETL testing plays a great role in verifying, validating, and ensuring that the business data is exact, consistent, and reliable. ETL Testing is data-oriented testing, which involves scrutinizing large volumes of data across heterogeneous data sources. This data-centric testing helps in achieving good-quality data by identifying inaccuracies and fixing them quickly and effectively.
 Data-Centric Testing centers on testing the quality of the data. The intention of this test is to make sure that valid and accurate data is in the target. It ensures that proper ETL processes are applied to the source systems and that the transformed data is loaded correctly into the target systems. It further assures that proper system migrations and upgrades are performed.
 Data Accuracy Testing ensures that the data is accurately transformed and loaded to the target as expected. This test helps in identifying errors caused by truncation of characters, incorrect mapping of columns, implementation errors in logic, etc.
 Data Integrity Testing helps in identifying the counts of ‘unspecified’ or ‘unmatched’ rows with reference to foreign keys and in checking the proportion of foreign key matches by querying the data (see the example queries after this list).
 Business Testing ensures that the data fulfills the vital business requirements. Data is evaluated against the business rules defined. This test additionally checks whether or not data has been moved, copied, or loaded fully and accurately.
 Validation Testing (Source to Target Testing) is carried out to validate whether the transformed data values are the expected data values.
 Application Upgrade Testing can be auto-generated, saving substantial test development time. This sort of testing verifies whether or not the information extracted from an older application or repository is precisely the same as the data in the new application or repository.
 Metadata Testing includes testing of the data type check, data
length check, and the index/constraint check.
 Data Transformation Testing is one of the most crucial tests done. In several cases, it can’t be achieved by writing a single SQL query and comparing the output with the target. Multiple SQL queries may have to be run for each row to verify the transformation rules.
 Data Quality Testing includes syntax and reference tests. It is done to avoid errors due to dates, order numbers, and similar fields later in the business process.
 Syntax Tests will report any dirty data, based on invalid characters, character patterns, incorrect upper or lower case, etc.
 Reference Tests verify that the data is in line with the data model, for example a Client ID. Data quality testing includes number checks, date checks, precision checks, data checks, null checks, etc.
 Incremental ETL Testing is done to check the data integrity of existing and newly added data when the latest data is loaded. Incremental testing verifies that the inserts and updates are being processed as expected in the incremental ETL process.
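
As an illustration of the data integrity check described above, the following queries count the ‘unmatched’ foreign-key rows and compute the proportion of matches. This is a minimal sketch: the tables dw.orders and dw.customer_dim and the column customer_id are hypothetical names, not part of the original text.

-- Count of child rows whose foreign key has no match in the parent dimension ("unmatched" rows).
SELECT COUNT(*) AS unmatched_rows
FROM   dw.orders o
LEFT   JOIN dw.customer_dim c ON c.customer_id = o.customer_id
WHERE  c.customer_id IS NULL;

-- Proportion of foreign-key values that do match the parent dimension.
SELECT SUM(CASE WHEN c.customer_id IS NOT NULL THEN 1 ELSE 0 END) * 100.0
       / COUNT(*) AS pct_matched
FROM   dw.orders o
LEFT   JOIN dw.customer_dim c ON c.customer_id = o.customer_id;
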
Below are a few defects we commonly encounter while doing ETL
testing.
 Missing data or data not loaded to the target database
 Incorrect data transformations while data is being loaded to the
target
 Data being lost partially or data truncation
 Data type mismatches
 Handling the NULL values while transforming from Source to
Target systems
 Data duplicates in the target
 Incorrect formatting of precision values while handling numeric data
 Records rejected due to data issues

Conclusion
As business needs constantly change and the corresponding source systems change with them, ETL testing must effectively keep pace with continuous changes to the data warehouse schema and to the data being loaded. Hence, it is very important that processes are clearly defined for both the development and testing teams.

ETL Testing or Data Warehouse Testing


Tutorial: What is ETL?
Before we learn anything about ETL Testing concepts, it is important to learn about Business Intelligence and Data Warehousing. Let’s get started with this ETL Testing tutorial:

What is BI?
Business Intelligence is the process of collecting raw data or business data and turning it into information that is useful and more meaningful. The raw data is the record of the daily transactions of an organization, such as interactions with customers, administration of finances, management of employees, and so on. This data is used for reporting, analysis, data mining, data quality and interpretation, and predictive analysis.

In this Data Warehouse Testing tutorial, you will learn:

 What is BI?
 Data Warehouse Testing
 What is ETL?
 What is ETL Testing?
 ETL Testing Process
 Types of ETL Testing
 How to Create ETL Test Case
 ETL Test Scenarios and Test Cases
 Types of ETL Bugs
 Difference between Database Testing and ETL Testing
 Responsibilities of an ETL Tester
 Performance Testing in ETL
 Automation of ETL Testing
 Best Practices for ETL Testing

Data Warehouse Testing


Data Warehouse Testing is a testing method in which the data inside a
data warehouse is tested for integrity, reliability, accuracy and consistency
in order to comply with the company’s data framework. The main purpose
of data warehouse testing is to ensure that the integrated data inside the
data warehouse is reliable enough for a company to make decisions on.

What is ETL?
ETL stands for Extract-Transform-Load, and it is the process by which data is loaded from the source system into the data warehouse. Data is extracted from an OLTP database, transformed to match the data warehouse schema, and loaded into the data warehouse database. Many data warehouses also incorporate data from non-OLTP systems such as text files, legacy systems, and spreadsheets.

Let’s see how it works.

For example, there is a retail store which has different departments like sales, marketing, logistics, etc. Each of them handles customer information independently, and the way they store that data is quite different. The sales department stores it by the customer’s name, while the marketing department stores it by customer ID.
Now, if they want to check the history of a customer and want to know which products he/she bought as a result of different marketing campaigns, it would be very tedious.

The solution is to use a data warehouse to store information from different sources in a uniform structure using ETL. ETL can transform dissimilar data sets into a unified structure. Later, BI tools can be used to derive meaningful insights and reports from this data.
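
As a minimal sketch of this idea, the query below conforms the two department views of a customer into a single warehouse structure. The table and column names (sales_customers, marketing_customers, dw.customer_dim) are illustrative assumptions, not part of the original example.

-- Unify the sales view (keyed by customer name) and the marketing view
-- (keyed by customer id) into one warehouse dimension; hypothetical names.
INSERT INTO dw.customer_dim (customer_id, customer_name)
SELECT m.customer_id,
       s.customer_name
FROM   sales_customers s
JOIN   marketing_customers m
       ON UPPER(TRIM(s.customer_name)) = UPPER(TRIM(m.customer_name));
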

The following outline in this ETL testing tutorial gives you the road map of the ETL process flow and various ETL testing concepts:

1. Extract

 Extract relevant data

2. Transform

 Transform data to DW (Data Warehouse) format


 Build keys – A key is one or more data attributes that uniquely identify an entity. Various types of keys are primary key, alternate key, foreign key, composite key, and surrogate key. The data warehouse owns these keys and never allows any other entity to assign them.
 Cleansing of data – After the data is extracted, it moves into the next phase: cleaning and conforming the data. Cleaning removes unwanted data as well as identifying and fixing errors. Conforming means resolving conflicts between incompatible data so that it can be used in an enterprise data warehouse. In addition, this step creates metadata that is used to diagnose source-system problems and improve data quality. (A sketch of both steps follows this list.)
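
A hedged sketch of the "build keys" and cleansing steps, assuming a hypothetical staging.customers source and dw.customer_dim target; identity-column and string-function syntax varies by database.

-- The warehouse owns the surrogate key and assigns it itself.
CREATE TABLE dw.customer_dim (
    customer_sk   INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate key
    customer_id   VARCHAR(20),    -- natural (business) key from the source
    customer_name VARCHAR(100)
);

INSERT INTO dw.customer_dim (customer_id, customer_name)
SELECT TRIM(customer_id),          -- cleansing: remove stray whitespace
       INITCAP(customer_name)      -- conforming: consistent name casing
FROM   staging.customers
WHERE  customer_id IS NOT NULL;    -- rows without a business key are rejected
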

3. Load

 Load data into DW (Data Warehouse)


 Build aggregates – Creating an aggregate means summarizing and storing data that is available in the fact table in order to improve the performance of end-user queries (see the example below).
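
For example, an aggregate might be built as follows. This is only a sketch: the table dw.sales_fact and its columns are assumptions, and DATE_TRUNC syntax varies by database.

-- Pre-computed monthly totals so end-user queries avoid scanning the whole fact table.
CREATE TABLE dw.sales_monthly_agg AS
SELECT product_id,
       DATE_TRUNC('month', sale_date) AS sale_month,
       SUM(sale_amount)               AS total_sales,
       COUNT(*)                       AS order_count
FROM   dw.sales_fact
GROUP  BY product_id, DATE_TRUNC('month', sale_date);
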

What is ETL Testing?


ETL testing is done to ensure that the data that has been loaded from a source to the destination after business transformation is accurate. It also involves the verification of data at the various intermediate stages between source and destination. ETL stands for Extract-Transform-Load.

ETL Testing Process


Like any other testing process, ETL testing also goes through different phases. ETL testing is performed in five stages:

1. Identifying data sources and requirements


2. Data acquisition
3. Implement business logic and dimensional modelling
4. Build and populate data
5. Build Reports

Types of ETL Testing


Types of Testing and Testing Process

Production Validation Testing: Also called “table balancing” or “production reconciliation”, this type of ETL testing is done on data as it is being moved into production systems. To support your business decisions, the data in your production systems has to be in the correct order. Informatica Data Validation Option provides ETL testing automation and management capabilities to ensure that production systems are not compromised by bad data.

Source to Target Testing (Validation Testing): This type of testing is carried out to validate whether the transformed data values are the expected data values.

Application Upgrades: This type of ETL test can be automatically generated, saving substantial test development time. It checks whether the data extracted from an older application or repository is exactly the same as the data in the new repository or application.

Metadata Testing: Includes data type checks, data length checks, and index/constraint checks.

Data Completeness Testing: Done to verify that all the expected data is loaded into the target from the source. Some of the tests that can be run are comparing and validating counts, aggregates, and actual data between the source and target for columns with simple or no transformation (example queries follow this list).

Data Accuracy Testing: Done to ensure that the data is accurately loaded and transformed as expected.

Data Transformation Testing: Testing data transformations is done because, in many cases, it cannot be achieved by writing one source SQL query and comparing the output with the target. Multiple SQL queries may need to be run for each row to verify the transformation rules.

Data Quality Testing: Includes syntax and reference tests, done in order to avoid errors due to dates, order numbers, and similar fields during the business process.
 Syntax Tests: Report dirty data based on invalid characters, character patterns, incorrect upper or lower case, etc.
 Reference Tests: Check the data against the data model, for example a Customer ID.
Data quality testing also includes number checks, date checks, precision checks, data checks, null checks, etc.

Incremental ETL Testing: Done to check the data integrity of old and new data when new data is added. Incremental testing verifies that inserts and updates are processed as expected during the incremental ETL process.

GUI/Navigation Testing: Done to check the navigation or GUI aspects of the front-end reports.
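
The data completeness checks mentioned above can be expressed as simple queries. A minimal sketch, assuming hypothetical staging.orders (source) and dw.orders (target) tables with an order_amount column:

-- 1. Row counts should match between source and target.
SELECT (SELECT COUNT(*) FROM staging.orders) AS source_count,
       (SELECT COUNT(*) FROM dw.orders)      AS target_count;

-- 2. Aggregates on a column with no transformation should also match.
SELECT 'SOURCE' AS side, SUM(order_amount) AS total_amount FROM staging.orders
UNION ALL
SELECT 'TARGET' AS side, SUM(order_amount) AS total_amount FROM dw.orders;
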

How to Create ETL Test Case


ETL testing is a concept which can be applied to different tools and databases in the information management industry. The objective of ETL testing is to assure that the data that has been loaded from a source to a destination after business transformation is accurate. It also involves the verification of data at the various intermediate stages between source and destination.

While performing ETL testing, two documents that will always be used by
an ETL tester are

1. ETL mapping sheets: An ETL mapping sheet contains all the information about the source and destination tables, including each and every column and its look-up in reference tables. ETL testers need to be comfortable with SQL queries, as ETL testing may involve writing big queries with multiple joins to validate data at any stage of ETL. ETL mapping sheets provide significant help while writing queries for data verification (an example query follows this list).
2. DB schemas of source and target: These should be kept handy to verify any detail in the mapping sheets.
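
As a sketch of the kind of query a mapping sheet drives, the following joins a source table, a reference look-up, and a target table. Every name here (staging.orders, dw.status_ref, dw.orders and their columns) is an assumption used only for illustration.

-- Rows where the loaded target value does not match the value the mapping
-- sheet says should come from the reference look-up.
SELECT s.order_id,
       s.status_code AS source_status_code,
       r.status_desc AS expected_target_status,
       t.status_desc AS actual_target_status
FROM   staging.orders s
JOIN   dw.status_ref  r ON r.status_code = s.status_code
JOIN   dw.orders      t ON t.order_id    = s.order_id
WHERE  t.status_desc <> r.status_desc;
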

ETL Test Scenarios and Test Cases


Mapping doc validation:
 Verify the mapping doc to check whether the corresponding ETL information is provided.
 A change log should be maintained in every mapping doc.

Validation:
1. Validate the source and target table structure against the corresponding mapping doc.
2. The source data type and target data type should be the same.
3. The length of data types in both source and target should be equal.
4. Verify that data field types and formats are specified.
5. The source data type length should not be less than the target data type length.
6. Validate the names of columns in the table against the mapping doc.

Constraint Validation:
 Ensure the constraints are defined for each specific table as expected.

Data consistency issues:
1. The data type and length for a particular attribute may vary across files or tables even though the semantic definition is the same.
2. Misuse of integrity constraints.

Completeness Issues:
1. Ensure that all expected data is loaded into the target table.
2. Compare record counts between source and target.
3. Check for any rejected records.
4. Check that data is not truncated in the columns of the target tables.
5. Check boundary values (boundary value analysis).
6. Compare the unique values of key fields between the data loaded into the warehouse and the source data.

Correctness Issues:
1. Data that is misspelled or inaccurately recorded.
2. Null, non-unique, or out-of-range data.

Transformation:
 Validate that the transformation rules are applied correctly between source and target.

Data Quality:
1. Number check: validate numeric fields.
2. Date check: date values must follow the defined date format, and it should be the same across all records.
3. Precision check.
4. Data check.
5. Null check.

Null Validation:
 Verify the null values where “Not Null” is specified for a specific column.

Duplicate Check:
1. Validate that the unique key, the primary key, and any other columns that should be unique as per the business requirements do not have any duplicate rows.
2. Check whether any duplicate values exist in a column that is extracted from multiple columns in the source and combined into one column.
3. As per the client requirements, ensure that there are no duplicates in combinations of multiple columns within the target.

Date Validation:
Date values are used in many areas of ETL development:
1. To know the row creation date.
2. To identify active records from the ETL development perspective.
3. To identify active records from the business requirements perspective.
4. Sometimes the updates and inserts are generated based on the date values.

Complete Data Validation:
1. To validate the complete data set between the source and target tables, a minus query is the best solution.
2. We need to run both source-minus-target and target-minus-source queries.
3. If the minus query returns any rows, those should be considered mismatching rows.
4. Match rows between source and target using an intersect statement.
5. The count returned by intersect should match the individual counts of the source and target tables.
6. If the minus query returns rows and the intersect count is less than the source count or the target count, then duplicate rows exist.
(Example minus and intersect queries follow this table.)

Data Cleanness:
 Unnecessary columns should be deleted before loading into the staging area.
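
A minimal sketch of the minus/intersect checks from the Complete Data Validation scenario, assuming hypothetical staging.customers (source) and dw.customers (target) tables with identical column lists; some databases use EXCEPT instead of MINUS.

-- Rows in the source that are missing or different in the target.
SELECT customer_id, customer_name FROM staging.customers
MINUS
SELECT customer_id, customer_name FROM dw.customers;

-- Rows in the target that do not exist in the source.
SELECT customer_id, customer_name FROM dw.customers
MINUS
SELECT customer_id, customer_name FROM staging.customers;

-- Matching rows; this count should equal the individual source and target counts.
SELECT COUNT(*) AS matched_count
FROM (
    SELECT customer_id, customer_name FROM staging.customers
    INTERSECT
    SELECT customer_id, customer_name FROM dw.customers
) matched_rows;
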
Types of ETL Bugs

User interface bugs / cosmetic bugs:
 Related to the GUI of the application
 Font style, font size, colors, alignment, spelling mistakes, navigation, and so on

Boundary Value Analysis (BVA) related bugs:
 Minimum and maximum values

Equivalence Class Partitioning (ECP) related bugs:
 Valid and invalid types

Input/Output bugs:
 Valid values not accepted
 Invalid values accepted

Calculation bugs:
 Mathematical errors
 Final output is wrong

Load Condition bugs:
 Does not allow multiple users
 Does not allow the customer-expected load

Race Condition bugs:
 System crash and hang
 System cannot run on client platforms

Version control bugs:
 No logo matching
 No version information available
 This usually occurs in Regression Testing

H/W bugs:
 Device is not responding to the application

Help Source bugs:
 Mistakes in help documents

Difference between Database Testing and ETL Testing


ETL Testing vs. Database Testing

 ETL Testing: Verifies whether data is moved as expected. Database Testing: The primary goal is to check whether the data follows the rules/standards defined in the data model.
 ETL Testing: Verifies whether the counts in the source and target match, and whether the transformed data is as per expectation. Database Testing: Verifies that there are no orphan records and that foreign key and primary key relations are maintained.
 ETL Testing: Verifies that the foreign key and primary key relations are preserved during the ETL. Database Testing: Verifies that there are no redundant tables and that the database is optimally normalized.
 ETL Testing: Verifies for duplication in the loaded data. Database Testing: Verifies whether data is missing in columns where required.
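
For illustration, the duplication check on the ETL-testing side and the missing-column check on the database-testing side can each be a single query. The table dw.customers and its columns are hypothetical names used only for this sketch.

-- ETL testing: duplicates in the loaded data.
SELECT customer_id, COUNT(*) AS occurrences
FROM   dw.customers
GROUP  BY customer_id
HAVING COUNT(*) > 1;

-- Database testing: data missing in a column where it is required.
SELECT COUNT(*) AS missing_email_rows
FROM   dw.customers
WHERE  email IS NULL;
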

Responsibilities of an ETL Tester


Key responsibilities of an ETL tester are segregated into three categories

 Stage table/ SFS or MFS


 Business transformation logic applied
 Target table loading from stage file or table after applying a
transformation.

Some of the responsibilities of an ETL tester are

 Test ETL software
 Test components of the ETL data warehouse
 Execute backend data-driven tests
 Create, design, and execute test cases, test plans, and test harnesses
 Identify problems and provide solutions for potential issues
 Approve requirements and design specifications
 Test data transfers and flat files
 Write SQL queries for various scenarios, such as count tests

Performance Testing in ETL


Performance Testing in ETL is a testing technique to ensure that an ETL system can handle the load of multiple users and transactions. The primary goal of ETL Performance Testing is to optimize and improve session performance by identifying and eliminating performance bottlenecks. Bottlenecks may exist in the source and target databases, the mappings, the sessions, and the system itself.
One of the best tools used for Performance Testing/Tuning is Informatica.

Automation of ETL Testing


The general methodology of ETL testing is to use SQL scripting or to “eyeball” the data. These approaches are time-consuming, error-prone, and seldom provide complete test coverage. To accelerate testing, improve coverage, reduce costs, and improve the defect detection ratio of ETL testing in production and development environments, automation is the need of the hour. One such tool is Informatica.

Best Practices for ETL Testing


1. Make sure data is transformed correctly.
2. Projected data should be loaded into the data warehouse without any data loss or truncation.
3. Ensure that the ETL application appropriately rejects invalid data, replaces it with default values, and reports it.
4. Ensure that data is loaded into the data warehouse within the prescribed and expected time frames to confirm scalability and performance.
5. All methods should have appropriate unit tests, regardless of visibility.
6. All unit tests should use appropriate coverage techniques to measure their effectiveness.
7. Strive for one assertion per test case.
8. Create unit tests that target exceptions.
