Data Warehouse Testing
By : Kartikey Brahmkshatriya (M.C.A)
Index ...............................................................................................................................................2 Introduction.............................................................................................................................3 About Data Warehouse...........................................................................................................3 Data Warehouse definition.................................................................................................3 Testing Process for Data warehouse:......................................................................................3 Requirements Testing :......................................................................................................3 Unit Testing : ....................................................................................................................4 Integration Testing : ..........................................................................................................4 Scenarios to be covered in Integration Testing...............................................................5 Validating the Report data..............................................................................................6 User Acceptance Testing....................................................................................................6 Conclusion..............................................................................................................................6 ETL Testing Techniques:................................................................................................6 ETL Testing Process:......................................................................................................7 ETL Testing Challenges:................................................................................................8
Introduction This document details the testing process involved in data warehouse testing and test coverage areas. It explains the importance of data warehouse application testing and the various steps of the testing process. About Data Warehouse Data warehouse is the main repository of the organization's historical data. It contains the data for management's decision support system. The important factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis (data mining) on the information within data warehouse without slowing down the operational systems. Data Warehouse definition Subject-oriented : Subject Oriented -Data warehouses are designed to help you analyse data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented. The data is organized so that all the data elements relating to the same realworld event or object are linked together. Integrated : Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. The database contains data from most or all of an organization's operational applications and is made consistent. Time-variant : The changes to the data in the database are tracked and recorded to produce reports on data changed over time. In order to discover trends in business, analysts need large amounts of data. A data warehouse's focus on change over time is what is meant by the term time variant. Non-volatile : Data in the database is never over-written or deleted, once committed, the data is static, read-only, but retained for future reporting. Once entered into the warehouse, data should not change. This is logical because the purpose of a data warehouse is to enable you to analyse what has occurred.
Testing Process for Data warehouse: Testing for a Data warehouse consists of requirements testing, unit testing, integration testing and acceptance testing.
Requirements Testing : The main aim for doing Requirements testing is to check stated requirements for completeness. Requirements can be tested on following factors. 1. 2. 3. 4. 5. Are Are Are Are Are the the the the the requirements requirements requirements requirements requirements Complete? Singular? Ambiguous? Developable? Testable?
In a Data warehouse, the requirements are mostly around reporting. Hence it becomes more important to verify whether these reporting requirements can be catered using the data available. Successful requirements are those structured closely to business rules and address functionality and performance. These business rules and requirements provide a solid
foundation to the data architects. Using the defined requirements and business rules, high level design of the data model is created. Once requirements and business rules are available, rough scripts can be drafted to validate the data model constraints against the defined business rules. Unit Testing : Unit testing for data warehouses is WHITEBOX. It should check the ETL procedures/mappings/jobs and the reports developed. This is usually done by the developers. Unit testing will involve following 1. 2. 3. Whether ETLs are accessing and picking up right data from right source. All the data transformations are correct according to the business rules and data warehouse is correctly populated with the transformed data. Testing the rejected records that dont fulfil transformation rules.
Integration Testing : After unit testing is complete, it should form the basis of starting integration testing. Integration testing should test out initial and incremental loading of the data warehouse. Integration testing will involve following 1. 2. 3. 4. 5. Sequence of ETLs jobs in batch. Initial loading of records on data warehouse. Incremental loading of records at a later date to verify the newly inserted or updated data. Testing the rejected records that dont fulfil transformation rules. Error log generation.
The overall Integration testing life cycle executed is planned in four phases: Requirements Understanding, Test Planning and Design, Test Case Preparation and Test Execution.
Business Business Requirement Requirement Document/Requirement Document/Requirement Traceability Traceability Matrix Matrix
High High Level Level Design Design document document
QA Team Reviews BRD for completeness. QA Team builds Test Plan
Requirements Requirements Testing Testing Review Review of of HLD HLD
Develop Test Cases and SQL Queries
Test Test Case Case Preparation Preparation
Unit Unit Testing Testing Functional Functional Testing Testing
Test Execution
Regression Regression Testing Testing Performance Performance Testing Testing
User User Acceptance Acceptance Testing Testing (UAT) (UAT)
Process for Data warehouse Testing
Scenarios to be covered in Integration Testing Integration Testing would cover End-to-End Testing for DWH. The coverage of the tests would include the below: 1. Count Validation - Record Count Verification DWH backend/Reporting queries against source and target as a initial check. Source Isolation - Validation after isolating the driving sources. Dimensional Analysis - Data integrity between the various source tables and relationships. Statistical Analysis - Validation for various calculations. Data Quality Validation - Check for missing data, negatives and consistency. Field-by-Field data verification can be done to check the consistency of source and target data. Granularity - Validate at the lowest granular level possible (Lowest in the hierarchy E.g. Country-City-Street start with test cases on street). Other validations - Graphs, Slice/dice, meaningfulness, accuracy
2. 3. 4. 5.
6.
7. .
Validating the Report data Once the ETLs are tested for count and data verification, the data being showed onto the reports hold utmost importance. QA team should verify the data reported with the source data for consistency and accuracy. 1. Verify Report data with source - Although the data present in a data warehouse will be stored at an aggregate level compare to source systems. Here the QA team should verify the granular data stored in data warehouse against the source data available. Field level data verification - QA team must understand the linkages for the fields displayed in the report and should trace back and compare that with the source systems. Creating SQLs - Create SQL queries to fetch and verify the data from Source and Target. Sometimes its not possible to do the complex transformations done in ETL. In such a case the data can be transferred to some file and calculations can be performed.
2.
3.
User Acceptance Testing Here the system is tested with full functionality and is expected to function as in production. At the end of UAT, the system should be acceptable to the client for use in terms of ETL process integrity and business functionality and reporting.
Conclusion Evolving needs of the business and changes in the source systems will drive continuous change in the data warehouse schema and the data being loaded. Hence, it is necessary that development and testing processes are clearly defined, followed by impact-analysis and strong alignment between development, operations and the business.
ETL or Data warehouse testing is categorized into four different engagements irrespective of technology or ETL tools used: New Data Warehouse Testing New DW is built and verified from scratch. Data input is taken from customer requirements and different data sources and new data warehouse is build and verified with the help of ETL tools. Migration Testing In this type of project customer will have an existing DW and ETL performing the job but they are looking to bag new tool in order to improve efficiency. Change Request In this type of project new data is added from different sources to an existing DW. Also, there might be a condition where customer needs to change their existing business rule or they might integrate the new rule. Report Testing Report are the end result of any Data Warehouse and the basic propose for which DW is build. Report must be tested by validating layout, data in the report and calculation. ETL Testing Techniques: 1) Verify that data is transformed correctly according to various business requirements and rules. 2) Make sure that all projected data is loaded into the data warehouse without any data loss and truncation.
3) Make sure that ETL application appropriately rejects, replaces with default values and reports invalid data. 4) Make sure that data is loaded in data warehouse within prescribed and expected time frames to confirm improved performance and scalability. Apart from these 4 main ETL testing methods other testing methods like integration testing and user acceptance testing is also carried out to make sure everything is smooth and reliable. ETL Testing Process: Similar to any other testing that lies under Independent Verification and Validation, ETL also go through the same phase. Business and requirement understanding Validating Test Estimation Test planning based on the inputs from test estimation and business requirement Designing test cases and test scenarios from all the available inputs Once all the test cases are ready and are approved, testing team proceed to perform pre-execution check and test data preparation for testing Lastly execution is performed till exit criteria are met Upon successful completion summary report is prepared and closure process is done. It is necessary to define test strategy which should be mutually accepted by stakeholders before starting actual testing. A well defined test strategy will make sure that correct approach has been followed meeting the testing aspiration. ETL testing might require writing SQL statements extensively by testing team or may be tailoring the SQL provided by development team. In any case testing team must be aware of the results they are trying to get using those SQL statements.
Difference between Database and Data Warehouse Testing There is a popular misunderstanding that database testing and data warehouse is similar while the fact is that both hold different direction in testing. Database testing is done using smaller scale of data normally with OLTP (Online transaction processing) type of databases while data warehouse testing is done with large volume with data involving OLAP (online analytical processing) databases. In database testing normally data is consistently injected from uniform sources while in data warehouse testing most of the data comes from different kind of data sources which are sequentially inconsistent.
We generally perform only CRUD (Create, read, update and delete) operation in database testing while in data warehouse testing we use read-only (Select) operation. Normalized databases are used in DB testing while demoralized DB is used in data warehouse testing. There are number of universal verifications that have to be carried out for any kind of data warehouse testing. Below is the list of objects that are treated as essential for validation in ETL testing: - Verify that data transformation from source to destination works as expected - Verify that expected data is added in target system - Verify that all DB fields and field data is loaded without any truncation - Verify data checksum for record count match - Verify that for rejected data proper error logs are generated with all details - Verify NULL value fields - Verify that duplicate data is not loaded - Verify data integrity
ETL Testing Challenges: ETL testing is quite different from conventional testing. There are many challenges we faced while performing data warehouse testing. Here is the list of few ETL testing challenges I experienced on my project: - Incompatible and duplicate data. - Loss of data during ETL process. - Unavailability of inclusive test bed. - Testers have no privileges to execute ETL jobs by their own. - Volume and complexity of data is very huge. - Fault in business process and procedures. - Trouble acquiring and building test data. - Missing business flow information. Data is important for businesses to make the critical business decisions. ETL testing plays a significant role validating and ensuring that the business information is exact, consistent and reliable. Also, it minimizes hazard of data loss in production.