How to Test Your Data Warehouse


To start with, we need a Test schedule. The same is created in the process of developing the Test plan. In this schedule, we have to estimate the time required for testing of the entire Data Warehouse system. There are different methodologies available to create a Test schedule. None of them are perfect because the data warehouse ecosystem is very complex and large, also constantly evolving in nature. The most important takeaway from this article is that DW testing is data centric, while software testing is code centric. The connections between the DW components are groups of transformations that take place over data. These transformation processes should be tested as well, to ensure data quality preservation. The DW testing and validation techniques I introduce here are broken into four well defined processes, namely:

1. Integration testing
2. System testing
3. Data validation
4. Acceptance testing

Thus these types of testing are again divided in the following sub-phases:

  • Understanding of Requirements
  • Development of Test Plan and Test Design
  • Preparation of Test Cases
  • Execution of Test Cases
  • Reporting of the Test results

As you can see, the first major phase is the Requirements Definition. Data used in a Data Warehouse originates from multiple source systems. Therefore, defining good requirements can be extremely challenging. Successful requirements are those structured closely to business rules and in the same time address functionality and performance. These business rules and requirements provide a solid foundation to the data architects. Requirements are one of the keys to success.

The whole team should share the same tools from the project toolbox. It’s important that everyone works from the same set of documents and requirements (i.e. version of files). We have to create a rich set of test data, avoiding combinational explosion of sets. The automation approach should be a combination of Task automation and Test automation. An alternative strategy is to use the source systems to create the artificial test data. That way a realistic but artificial source data is created. The testers should be skilled as well.

Working with the concept of the Prototype model, the testing activities can be summarized in the form of the following checklist:

1. Multidimensional Schema

  • Workload Test
  • Hierarchy Test
  • Conformity Test
  • Nomenclature check
  • Performance Test
  • Early loading Test
  • Security Test
  • Maintainability Test

2. ETL Procedures

  • Code Test
  • Integrity Test
  • Integration Test
  • Administrability Test
  • Performance/Stress Test
  • Recovery Test
  • Security Test
  • Maintainability Test

3. Physical Schema

  • Performance/Stress Test
  • Recovery Test
  • Security Test

4. Front-end

  • Balancing Test
  • Usability Test
  • Performance/Stress Test
  • Security Test

In order to achieve our testing goals, we could focus on the following two approaches:

1. Manual Sampling – The testing activities covered under this approach are:
End-to-End Testing – It checks that data is properly loaded into the systems from which the data warehouse will extract data to generate Reports.
Row count Testing – To avoid any loss of data, all rows of data are counted after the ETL process to ensure that all the data is properly loaded.
Field size Testing – It checks that the data warehouse field should be bigger than the data field for the data being loaded. If it is not checked it will lead to data truncation.
Sampling – The sample used for testing must be a good representation of whole data.

2. Reporting – The testing activities covered under this approach are:
Report Testing – The reports are checked to see that the data displayed in the reports are correct and can be used for decision-making purpose.
Next major part of our testing is to choose a Test fixture strategy. I prefer to use Fresh fixture – ensuring that tests do not depend on anything they did not set up themselves. Combined with Lazy (Initialization) setup and Shared fixture (partitioning the fixture required by tests into two logical parts). That is assuming that we are happy with the idea of creating the test fixture the first time any test needs it, we can use Lazy Setup in the setUp() method of the corresponding Testcase Class to create it as part of running the first test. Subsequent tests will then see that the fixture already exists and reuse it.

The first part is the stuff every test needs to have present but is never modified by any tests—that is, the Immutable Shared Fixture. The second part is the objects that any test needs to modify or
delete; these objects should be built by each test as Fresh Fixtures. Most commonly, the Immutable Shared Fixture consists of reference data that is needed by the actual per-test fixtures. The per-test fixtures can then be built as Fresh Fixtures on top of the Immutable Shared Fixture). We also have the option called Minimal Fixture (i.e. use of smallest and simplest fixture possible for each test). Aiming at full post I have to mention the last option I would use, although it’s considered as Anti-pattern, is called Chained Tests (i.e. let the other tests in a test suite to setup the test fixture).

It is important to consider the Pesticide Paradox – often test automation suffers from static test data. Without variation in the data or execution path, bugs are located with decreasing frequency. Automation should be designed so that a test-case writer can provide a common equivalence class or data type as a parameter, and the automation framework will use the data randomly within that class and apply it for each test run. The automation framework should record the seed values, or actual data that was used, to allow reruns and retesting with the same data for debugging purposes.
It can utilize a TestFixtureRegistry via dedicated table – it’ll be able to expose various parts of a fixture, needed for suites, via discrete fixture holding class variables or via Finder Methods. Finder Methods helps us avoid hard-coded values in DB lookups in order to access the fixture objects. Those methods are very similar to those of Creation Methods, but they return references to existing fixture objects rather than building brand new ones. We should make those immutable.

NOTE: usage of Intent-Revealing Names for the fixture objects should be enforced in order to support the framework’s lookup functionality and better readability. To keep the current design the following implementation can be used – check if such registry already exists and remove it;

NOTE: in consideration must be taken the Data Sensitivity and Context Sensitivity of the tests that’ll rely on this module.

NOTE: take into consideration the Data Sensitivity and Context Sensitivity of the tests that’ll rely on this module.

Test Approach Phases

The suggested test phases are based on the development schedule per project, along with the need to comply with data requirements that need to be in place when the new DWH goes live.

Phase 1: Business processes

  •  Data Validation
  •  Performance Test
  •  Functional Test
  •  Data Warehouse (internal testing within ETL validating data stage jobs)

Data validation should start early in the test process and be completed before Phase 2 testing begins. Some data validation testing should occur in the remaining test phases, but to a much lesser extent.

Important business processes where performance is important should be identified and tested (when available) in the Phase 1. Performance testing should be continued in the later test phases as the application will be continuously enhanced throughout the project. In addition to Phase 1 testing, there will also be unit and functional testing. As unit testing is completed for a program, the tester will perform functional tests on the program. While functional testing takes place with one program, the developer continues with redeveloping and unit testing the next program.

Towards the end of Phase 1, the data warehouse team will be testing the data stage jobs. Unit testing should be completed first, then functional testing finishing a couple weeks afterwards. A final formal test will cap the end of Phase 1 testing.

Phase 2: Performance tests 

  •   Cross-functional process
  •   Security Test
  •   Data Warehouse (Repository testing and validation)

In addition to the previous tests, Phase 2 should also cover remaining test items that may not been tested in Phase 1 such as:

Phase 2 testing will be important because it is the final testing opportunity that the functional area testers will have to make sure the DW load works as expected before moving to regression testing in Phase 3. Some performance tests and data validation should be included in this phase.

Phase 3: Regression tests

Phase 3 testing is comprised of regression test periods to test updates that are required as part of the Company gaming platform. The functional area testers should have sufficient time to test in each regression test period.

Phase 4: Acceptance tests

Phase 4 testing is limited. In addition to the functional area testers, end users will probably be involved in this final test before the new system goes live. In customer acceptance testing, no new tests should be introduced at this time. Customer acceptance tests should have already been tested in prior test phases.


Evgeni Kostadinov

Evgeni Kostadinov prefers the challenges associated with testing of various technologies. He has extensive experience with UI, API, DW, Performance and Mobile. He has worked mainly in projects with Java, .Net and NodeJS environments for Telecom, Financial, Marketing and Banking Institutions. He actively participates in the development of the company’s QA and CI processes and infrastructure. Evgeni currently works as a QA Manager, technical trainer, as well as a QA Challenge Accepted lecturer.

Evgeni Kostadinov
Evgeni Kostadinov prefers the challenges associated with testing of various technologies. He has extensive experience with UI, API, DW, Performance and Mobile. He has worked mainly in projects with Java, .Net and NodeJS environments for Telecom, Financial, Marketing and Banking Institutions. He actively participates in the development of the company’s QA and CI processes and infrastructure. Evgeni currently works as a QA Manager, technical trainer, as well as a QA Challenge Accepted lecturer.

The Related Post

Test engineers face a rapidly changing mobile application landscape, making mobile test automation a necessity. We know that mobile apps are becoming increasingly complex along with the technological advances in tablets and smartphones. Test engineers have to address multiple challenges while testing data-centric mobile apps, including the growing diversity of device features, platforms, and technologies. Fortunately, ...
LogiGear Magazine, September 2015: Mobile Testing
In today’s mobile-first world, a good app is important, meaning an effective Mobile Testing strategy is  essential.  
Drawing from the Greek mythology of the lotus eaters, Anne-Marie Charrett warns testers to be weary of enjoying early success too soon upon finding high impact bugs.
By focusing on test design, analyzing test requirements and optimizing the approach to testing, it’s possible to maximize mobile test automation cost effectively. In a previous article we outlined the importance of understanding the mobile ecosystem and test design for planning and executing mobile testing. The focus of this article is about efficient mobile test ...
A sampling of some free, online, and easy-to-use mobile device emulators that can help get you started with testing. ScreenFly A free, customizable tool to test your website on any screen size, including desktops, tablets, televisions, and mobile phones.
I am not a big fan of concepts which moves industry standards to IT. I am rather a Agile and Scrum guy. Managing multiple projects at once and trying to set a highest quality standard is a challenge and this book shows how industrial language can be translated into software development. I do not think that it ...
The most significant facet of mobile testing is understanding the mobile ecosystem. Mobile applications are growing in use and sophistication along with the speed of the networks and the increasing power of the devices. To be sure, mobile application testing can be done with minimal automation, but a high level of test automation will help ...
In the last issue on testing the SMAC stack we talked about the social and mobile aspects of testing. We will be referring to them in this article. In this issue part 2, we focus on the Analytics and Cloud aspect. The goal of this article is to understand a simple landscape of analytics and cloud.
LogiGear_Magazine_December 2016_Riding the New Software Testing Wave  
The outbreak of smartphones and tablets forces us to be digitally available with speed. Keeping pace with communication tool developments, Lindiwe Vinson defines the methods used at Organic, Inc. where she leads her team discovering bugs using various key programs for both PC and Mac platforms.

Leave a Reply

Your email address will not be published. Required fields are marked *

Stay in the loop with the lastest
software testing news