Types of ETL Tests
Data is vital for all businesses to make crucial decisions. ETL testing plays a great role in verifying, validating, and ensuring that business data is exact, consistent, and reliable. ETL testing is data-oriented testing, which involves scrutinizing large volumes of data across heterogeneous data sources. This data-centric testing helps achieve good-quality data by identifying inaccuracies and fixing them quickly and effectively.
• Data-Centric Testing revolves around testing the quality of the data. The intention of this test is to make sure that valid and accurate data is in the target. It ensures that proper ETL processes are applied to source systems and that data is correctly transformed and loaded into the target systems. It further assures that proper system migrations and upgrades are performed.
• Data Accuracy Testing ensures that the data is accurately transformed and loaded to the target as expected. This test helps identify errors caused by truncation of characters, incorrect mapping of columns, implementation errors in logic, etc.
• Data Integrity Testing helps identify the counts of 'unspecified' or 'unmatched' rows with reference to foreign keys, and checks the proportion of foreign key matches by querying the data.
• Business Testing ensures that the data fulfills essential business requirements. Data is evaluated against the defined business rules. This test additionally checks whether data has been moved, copied, or loaded fully and accurately.
• Validation Testing (Source to Target Testing) is carried out to validate whether the transformed data values are the expected data values.
• Application Upgrade tests can be auto-generated, saving substantial test development time. This sort of testing verifies whether the data extracted from an older application or repository is precisely the same as the data in the new application or repository.
• Metadata Testing includes the data type check, the data length check, and the index/constraint check.
• Data Transformation Testing is one of the most crucial tests performed. In several cases, it cannot be achieved by writing a single SQL query and comparing the output with the target; multiple SQL queries may have to be run for every row to verify the transformation rules (a sample query follows this list).
• Data Quality Testing includes syntax and reference tests. Data Quality testing is done to avoid errors caused by invalid dates or order numbers during the business process.
• Syntax Tests report any dirty data, based on invalid characters, character patterns, incorrect upper- or lower-case order, etc.
• Reference Tests verify that the data is in line with the data model; for example, a Client ID. Data quality testing includes number checks, date checks, precision checks, data checks, null checks, etc.
• Incremental ETL Testing is done to check the data integrity of previous and new data when the latest data is added. Incremental testing verifies that inserts and updates are processed as expected during the incremental ETL process.
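To make data transformation testing concrete, here is a small SQL sketch. The tables and the rule it checks (a target full_name built from the source's first and last names) are purely illustrative assumptions, not from any particular system; adapt the names to your own mapping sheet.

    -- Hypothetical transformation check: the target's full_name must equal
    -- the trimmed concatenation of the source's first and last names.
    -- All table and column names below are illustrative assumptions.
    SELECT s.customer_id,
           TRIM(s.first_name) || ' ' || TRIM(s.last_name) AS expected_full_name,
           t.full_name                                    AS actual_full_name
    FROM   src_customers   s
    JOIN   dw_dim_customer t ON t.customer_id = s.customer_id
    WHERE  t.full_name <> TRIM(s.first_name) || ' ' || TRIM(s.last_name);
    -- Zero rows returned means the rule holds for every matched row.
    -- Note: rows where either side is NULL are not flagged by <> and
    -- need a separate null check.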
 Below are a few defects we commonly encounter while doing ETL
 testing.
• Missing data, or data not loaded to the target database
• Incorrect data transformations while data is being loaded to the target
• Data being lost partially, or data truncation
• Data type mismatches
• Incorrect handling of NULL values while transforming from source to target systems
• Duplicate data in the target
• Incorrect formatting of precision values while handling numeric data
• Data rejected due to data issues
    Conclusion
With the constantly changing needs of businesses and the corresponding changes in source systems, the data warehouse schema and the data loaded into it change continuously, and ETL testing must keep pace. Hence, it is very important that processes are clearly defined for both the development and testing teams.
    ETL Testing or Data Warehouse Testing
    Tutorial: What is ETL?
Before we learn anything about ETL testing concepts, it's important to learn about Business Intelligence and Data Warehouses. Let's get started with this ETL Testing tutorial:
    What is BI?
Business Intelligence is the process of collecting raw data or business data and turning it into information that is useful and more meaningful. The raw data is the record of an organization's daily transactions, such as interactions with customers, administration of finances, management of employees, and so on. This data is used for reporting, analysis, data mining, data quality and interpretation, and predictive analysis.
    In this Data Warehouse Testing tutorial, you will learn:
• What is BI?
• Data Warehouse Testing
• What is ETL?
• What is ETL Testing?
• ETL Testing Process
• Types of ETL Testing
• How to Create ETL Test Case
• ETL Test Scenarios and Test Cases
• Types of ETL Bugs
• Difference between Database Testing and ETL Testing
• Responsibilities of an ETL Tester
• Performance Testing in ETL
• Automation of ETL Testing
• Best Practices for ETL Testing
Data Warehouse Testing
Data Warehouse Testing is a testing method in which the data inside a
data warehouse is tested for integrity, reliability, accuracy and consistency
in order to comply with the company’s data framework. The main purpose
of data warehouse testing is to ensure that the integrated data inside the
data warehouse is reliable enough for a company to make decisions on.
What is ETL?
ETL stands for Extract-Transform-Load, and it is the process by which data is loaded from the source system into the data warehouse. Data is extracted from an OLTP database, transformed to match the data warehouse schema, and loaded into the data warehouse database. Many data warehouses also incorporate data from non-OLTP systems such as text files, legacy systems, and spreadsheets.
Let's see how it works.
For example, consider a retail store which has different departments like sales, marketing, logistics, etc. Each of them handles customer information independently, and the way they store that data is quite different. The sales department stores it by the customer's name, while the marketing department stores it by customer ID.
Now if they want to check the history of a customer and want to know which products he/she bought as a result of different marketing campaigns, it would be very tedious.
The solution is to use a data warehouse to store information from different sources in a uniform structure using ETL. ETL can transform dissimilar data sets into a unified structure. BI tools can later be used to derive meaningful insights and reports from this data.
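For intuition, a minimal transform-and-load step can be written as a single SQL statement. The staging and warehouse tables below are hypothetical, chosen only to illustrate the idea:

    -- Minimal illustration of the Transform and Load steps: read from a
    -- hypothetical staging table, cleanse the name, load the warehouse table.
    INSERT INTO dw_dim_customer (customer_id, customer_name, source_system)
    SELECT customer_id,
           UPPER(TRIM(customer_name)),  -- simple cleansing/transformation
           'sales'                      -- tag the originating department
    FROM   stg_sales_customers;

In practice the extract step first pulls data from OLTP systems or files into such a staging table, and a dedicated ETL tool usually performs these steps rather than hand-written SQL.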
The following diagram in this ETL testing tutorial gives you the ROAD MAP
of the ETL Testing process flow and various ETL testing concepts:
1. Extract
   • Extract relevant data
2. Transform
   • Transform data to DW (Data Warehouse) format
   • Build keys: a key is one or more data attributes that uniquely identify an entity. The various types of keys are primary key, alternate key, foreign key, composite key, and surrogate key. The data warehouse owns these keys and never allows any other entity to assign them.
   • Cleansing of data: after the data is extracted, it moves into the next phase, in which it is cleaned and conformed. Cleaning omits bad data and identifies and fixes errors. Conforming means resolving conflicts between incompatible data so that it can be used in an enterprise data warehouse. In addition, this system creates metadata that is used to diagnose source system problems and to improve data quality.
3. Load
   • Load data into the DW (Data Warehouse)
   • Build aggregates: creating an aggregate means summarizing and storing data that is available in the fact table in order to improve the performance of end-user queries (a sample follows this list).
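As an illustration of building an aggregate, the sketch below summarizes a hypothetical fact table into a monthly sales summary; the table and column names, and the TO_CHAR date formatting (Oracle/PostgreSQL style), are assumptions:

    -- Build a monthly sales aggregate from a hypothetical fact table so that
    -- end-user queries can read the small summary instead of the raw facts.
    INSERT INTO agg_monthly_sales (year_month, product_id, total_qty, total_amount)
    SELECT TO_CHAR(order_date, 'YYYY-MM') AS year_month,
           product_id,
           SUM(quantity),
           SUM(amount)
    FROM   fact_sales
    GROUP BY TO_CHAR(order_date, 'YYYY-MM'), product_id;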
What is ETL Testing?
ETL testing is done to ensure that the data that has been loaded from a source to the destination after business transformation is accurate. It also involves the verification of data at various intermediate stages between the source and destination. ETL stands for Extract-Transform-Load.
ETL Testing Process
Similar to other testing processes, ETL testing also goes through different phases. The different phases of the ETL testing process are as follows.
ETL testing is performed in five stages:
   1. Identifying data sources and requirements
   2. Data acquisition
   3. Implement business logic and dimensional modelling
   4. Build and populate data
   5. Build Reports
Types of ETL Testing
• Production Validation Testing: also known as "table balancing" or "production reconciliation", this type of ETL testing is done on data as it is being moved into production systems. To support your business decisions, the data in your production systems has to be in the correct order. Informatica Data Validation Option provides the ETL testing automation and management capabilities to ensure that production systems are not compromised by the data.
• Source to Target Testing (Validation Testing): this type of testing is carried out to validate whether the transformed data values are the expected data values.
• Application Upgrades: this type of ETL testing can be automatically generated, saving substantial test development time. It checks whether the data extracted from an older application or repository is exactly the same as the data in the new application or repository.
• Metadata Testing: includes the data type check, the data length check, and the index/constraint check.
• Data Completeness Testing: done to verify that all the expected data is loaded into the target from the source. Some of the tests that can be run are comparing and validating counts, aggregates, and actual data between the source and target for columns with simple transformation or no transformation (a sample query follows this list).
• Data Accuracy Testing: done to ensure that the data is accurately loaded and transformed as expected.
• Data Transformation Testing: done because, in many cases, transformation testing cannot be achieved by writing one source SQL query and comparing the output with the target; multiple SQL queries may need to be run for each row to verify the transformation rules.
• Data Quality Testing: includes syntax and reference tests, and is done to avoid errors due to invalid dates or order numbers during the business process.
  - Syntax Tests: report dirty data, based on invalid characters, character patterns, incorrect upper- or lower-case order, etc.
  - Reference Tests: check the data according to the data model; for example, a Customer ID. Data quality testing includes number checks, date checks, precision checks, data checks, null checks, etc.
• Incremental ETL Testing: done to check the data integrity of old and new data when new data is added. Incremental testing verifies that inserts and updates are processed as expected during the incremental ETL process.
• GUI/Navigation Testing: done to check the navigation or GUI aspects of the front-end reports.
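To make the data completeness item above concrete, here is a count-and-aggregate comparison sketch; the table and column names are hypothetical:

    -- Data completeness check: compare row counts and a simple aggregate
    -- between source and target (hypothetical table names).
    -- A mismatch signals missing/extra rows or a lossy load.
    SELECT (SELECT COUNT(*)          FROM src_orders)     AS source_count,
           (SELECT COUNT(*)          FROM dw_fact_orders) AS target_count,
           (SELECT SUM(order_amount) FROM src_orders)     AS source_amount,
           (SELECT SUM(order_amount) FROM dw_fact_orders) AS target_amount;
    -- On Oracle, append FROM dual to this outer SELECT.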
How to Create ETL Test Case
ETL testing is a concept which can be applied to different tools and databases in the information management industry. The objective of ETL testing is to ensure that the data that has been loaded from a source to a destination after business transformation is accurate. It also involves the verification of data at various intermediate stages between the source and destination.
While performing ETL testing, two documents will always be used by an ETL tester:
   1. ETL mapping sheets: an ETL mapping sheet contains all the information about the source and destination tables, including each and every column and their look-ups in reference tables. An ETL tester needs to be comfortable with SQL queries, as ETL testing may involve writing big queries with multiple joins to validate data at any stage of the ETL. ETL mapping sheets provide significant help while writing queries for data verification (a sample query follows below).
   2. DB schema of source and target: it should be kept handy to verify any detail in the mapping sheets.
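To illustrate the kind of multi-join query a mapping sheet drives, here is a sketch; the mapping it encodes (orders traced to their source rows, with a country code resolved through a look-up table) is invented purely for illustration:

    -- Hypothetical mapping-sheet check: every target row must trace back to
    -- its source row, with the reference-table look-up applied correctly.
    SELECT t.order_id
    FROM   dw_fact_orders t
    JOIN   src_orders     s  ON s.order_id      = t.order_id
    JOIN   ref_country    rc ON rc.country_code = s.country_code
    WHERE  t.country_name <> rc.country_name   -- look-up applied incorrectly
       OR  t.order_amount <> s.order_amount;   -- value changed in flight
    -- Any row returned points at a mapping or look-up defect.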
ETL Test Scenarios and Test Cases
• Mapping doc validation: verify whether the corresponding ETL information is provided in the mapping doc. A change log should be maintained in every mapping doc.
• Validation:
   1. Validate the source and target table structure against the corresponding mapping doc.
   2. Source data type and target data type should be the same.
   3. Length of data types in both source and target should be equal.
   4. Verify that data field types and formats are specified.
   5. Source data type length should not be less than the target data type length.
   6. Validate the names of columns in the table against the mapping doc.
• Constraint validation: ensure the constraints are defined for each specific table as expected.
• Data consistency issues:
   1. The data type and length for a particular attribute may vary in files or tables even though the semantic definition is the same.
   2. Misuse of integrity constraints.
• Completeness issues:
   1. Ensure that all expected data is loaded into the target table.
   2. Compare record counts between source and target.
   3. Check for any rejected records.
   4. Check that data is not truncated in the columns of the target tables.
   5. Check boundary value analysis.
   6. Compare unique values of key fields between the data loaded to the warehouse and the source data.
• Correctness issues:
   1. Data that is misspelled or inaccurately recorded.
   2. Null, non-unique, or out-of-range data.
• Transformation: validate the transformation rules.
• Data quality:
   1. Number check: validate numeric fields.
   2. Date check: dates have to follow the expected date format, and it should be the same across all records.
   3. Precision check.
   4. Data check.
   5. Null check.
• Null validation: verify the null values where "Not Null" is specified for a specific column.
• Duplicate check:
   1. Validate that the unique key, the primary key, and any other columns that should be unique as per the business requirements do not have duplicate rows.
   2. Check whether any duplicate values exist in any column that is extracted from multiple columns in the source and combined into one column.
   3. As per the client requirements, ensure that there are no duplicates in combinations of multiple columns within the target only.
• Date validation: date values are used in many areas of ETL development, for example:
   1. To know the row creation date.
   2. To identify active records from the ETL development perspective.
   3. To identify active records from the business requirements perspective.
   4. Sometimes the updates and inserts are generated based on the date values.
• Complete data validation (a sample follows this list):
   1. To validate the complete data set in the source and target tables, a minus query is the best solution.
   2. Run both source-minus-target and target-minus-source.
   3. If the minus query returns any rows, those should be considered mismatching rows.
   4. Match rows between source and target using an intersect statement.
   5. The count returned by the intersect should match the individual counts of the source and target tables.
   6. If the minus queries return rows and the intersect count is less than the source or target count, then duplicate rows exist.
• Data cleanness: unnecessary columns should be deleted before loading into the staging area.
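The minus and intersect approach described above translates directly into SQL. The sketch below uses the ANSI EXCEPT operator (Oracle spells it MINUS) against hypothetical tables:

    -- Source-minus-target: rows present in the source but missing in the target.
    SELECT customer_id, customer_name FROM src_customers
    EXCEPT                            -- Oracle: use MINUS instead of EXCEPT
    SELECT customer_id, customer_name FROM dw_dim_customer;

    -- Target-minus-source: rows in the target with no source counterpart.
    SELECT customer_id, customer_name FROM dw_dim_customer
    EXCEPT
    SELECT customer_id, customer_name FROM src_customers;

    -- Intersect count: should equal both individual table counts. If it is
    -- lower while both minus queries return nothing, duplicate rows exist.
    SELECT COUNT(*) AS matched_count
    FROM (SELECT customer_id, customer_name FROM src_customers
          INTERSECT
          SELECT customer_id, customer_name FROM dw_dim_customer) matched;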
Types of ETL Bugs
• User interface bugs / cosmetic bugs: related to the GUI of the application, such as font style, font size, colors, alignment, spelling mistakes, navigation, and so on.
• Boundary Value Analysis (BVA) related bugs: minimum and maximum values.
• Equivalence Class Partitioning (ECP) related bugs: valid and invalid types.
• Input/output bugs: valid values not accepted; invalid values accepted.
• Calculation bugs: mathematical errors; the final output is wrong.
• Load condition bugs: does not allow multiple users; does not allow the customer-expected load.
• Race condition bugs: system crashes and hangs; the system cannot run on client platforms.
• Version control bugs: no logo matching; no version information available; this usually occurs in regression testing.
• H/W bugs: the device is not responding to the application.
• Help source bugs: mistakes in help documents.
Difference between Database Testing and ETL Testing
ETL Testing:
• Verifies whether data is moved as expected.
• Verifies whether counts in the source and the target match.
• Verifies whether the transformed data is as per expectation.
• Verifies that foreign-primary key relations are preserved during the ETL.
• Verifies duplication in loaded data.

Database Testing:
• The primary goal is to check whether the data follows the rules/standards defined in the data model.
• Verifies that there are no orphan records and that foreign-primary key relations are maintained (a sample query follows this list).
• Verifies that there are no redundant tables and that the database is optimally normalized.
• Verifies whether data is missing in columns where it is required.
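As a concrete example of the orphan-record check mentioned above, the query below assumes a hypothetical foreign key from an orders fact table to a customer dimension:

    -- Orphan-record check: order rows whose customer_id has no matching
    -- parent row, i.e., a broken foreign-primary key relationship.
    SELECT o.order_id, o.customer_id
    FROM   dw_fact_orders o
    LEFT JOIN dw_dim_customer c ON c.customer_id = o.customer_id
    WHERE  c.customer_id IS NULL;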
Responsibilities of an ETL Tester
Key responsibilities of an ETL tester are segregated into three categories:
• Stage table / SFS or MFS
• Business transformation logic applied
• Target table loading from the stage file or table after applying a transformation
Some of the responsibilities of an ETL tester are:
• Test ETL software
• Test components of the ETL data warehouse
• Execute backend data-driven tests
• Create, design, and execute test cases, test plans, and test harnesses
• Identify problems and provide solutions for potential issues
• Approve requirements and design specifications
• Test data transfers and flat files
• Write SQL queries for various scenarios, like count tests (a sample follows this list)
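For example, the count test mentioned in the last point, together with a simple duplicate check, might look like the following; the table names are hypothetical:

    -- Count test: source and target row counts should match after a full load.
    SELECT COUNT(*) AS source_count FROM src_orders;
    SELECT COUNT(*) AS target_count FROM dw_fact_orders;

    -- Duplicate check: key values appearing more than once in the target.
    SELECT customer_id, COUNT(*) AS occurrences
    FROM   dw_dim_customer
    GROUP BY customer_id
    HAVING COUNT(*) > 1;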
Performance Testing in ETL
Performance Testing in ETL is a testing technique to ensure that an ETL system can handle the load of multiple users and transactions. The primary goal of ETL performance testing is to optimize and improve session performance by identifying and eliminating performance bottlenecks. The source and target databases, the mappings, the sessions, and the system itself may all have performance bottlenecks.
One of the best tools used for performance testing/tuning is Informatica.
Automation of ETL Testing
The general methodology of ETL testing is to use SQL scripting or to do "eyeballing" of data. These approaches are time-consuming, error-prone, and seldom provide complete test coverage. To accelerate testing, improve coverage, reduce costs, and improve the defect detection ratio of ETL testing in production and development environments, automation is the need of the hour. One such tool is Informatica.
Best Practices for ETL Testing
  1. Make sure data is transformed correctly.
  2. Projected data should be loaded into the data warehouse without any data loss or truncation.
  3. Ensure that the ETL application appropriately rejects invalid data, replaces it with default values, and reports it.
  4. Ensure that data is loaded into the data warehouse within the prescribed and expected time frames to confirm scalability and performance.
  5. All methods should have appropriate unit tests regardless of visibility.
  6. All unit tests should use appropriate coverage techniques to measure their effectiveness.
  7. Strive for one assertion per test case.
  8. Create unit tests that target exceptions.