How to Test Your Data Warehouse


To start with, we need a Test schedule. The same is created in the process of developing the Test plan. In this schedule, we have to estimate the time required for testing of the entire Data Warehouse system. There are different methodologies available to create a Test schedule. None of them are perfect because the data warehouse ecosystem is very complex and large, also constantly evolving in nature. The most important takeaway from this article is that DW testing is data centric, while software testing is code centric. The connections between the DW components are groups of transformations that take place over data. These transformation processes should be tested as well, to ensure data quality preservation. The DW testing and validation techniques I introduce here are broken into four well defined processes, namely:

1. Integration testing
2. System testing
3. Data validation
4. Acceptance testing

Thus these types of testing are again divided in the following sub-phases:

  • Understanding of Requirements
  • Development of Test Plan and Test Design
  • Preparation of Test Cases
  • Execution of Test Cases
  • Reporting of the Test results

As you can see, the first major phase is the Requirements Definition. Data used in a Data Warehouse originates from multiple source systems. Therefore, defining good requirements can be extremely challenging. Successful requirements are those structured closely to business rules and in the same time address functionality and performance. These business rules and requirements provide a solid foundation to the data architects. Requirements are one of the keys to success.

The whole team should share the same tools from the project toolbox. It’s important that everyone works from the same set of documents and requirements (i.e. version of files). We have to create a rich set of test data, avoiding combinational explosion of sets. The automation approach should be a combination of Task automation and Test automation. An alternative strategy is to use the source systems to create the artificial test data. That way a realistic but artificial source data is created. The testers should be skilled as well.

Working with the concept of the Prototype model, the testing activities can be summarized in the form of the following checklist:

1. Multidimensional Schema

  • Workload Test
  • Hierarchy Test
  • Conformity Test
  • Nomenclature check
  • Performance Test
  • Early loading Test
  • Security Test
  • Maintainability Test

2. ETL Procedures

  • Code Test
  • Integrity Test
  • Integration Test
  • Administrability Test
  • Performance/Stress Test
  • Recovery Test
  • Security Test
  • Maintainability Test

3. Physical Schema

  • Performance/Stress Test
  • Recovery Test
  • Security Test

4. Front-end

  • Balancing Test
  • Usability Test
  • Performance/Stress Test
  • Security Test

In order to achieve our testing goals, we could focus on the following two approaches:

1. Manual Sampling – The testing activities covered under this approach are:
End-to-End Testing – It checks that data is properly loaded into the systems from which the data warehouse will extract data to generate Reports.
Row count Testing – To avoid any loss of data, all rows of data are counted after the ETL process to ensure that all the data is properly loaded.
Field size Testing – It checks that the data warehouse field should be bigger than the data field for the data being loaded. If it is not checked it will lead to data truncation.
Sampling – The sample used for testing must be a good representation of whole data.

2. Reporting – The testing activities covered under this approach are:
Report Testing – The reports are checked to see that the data displayed in the reports are correct and can be used for decision-making purpose.
Next major part of our testing is to choose a Test fixture strategy. I prefer to use Fresh fixture – ensuring that tests do not depend on anything they did not set up themselves. Combined with Lazy (Initialization) setup and Shared fixture (partitioning the fixture required by tests into two logical parts). That is assuming that we are happy with the idea of creating the test fixture the first time any test needs it, we can use Lazy Setup in the setUp() method of the corresponding Testcase Class to create it as part of running the first test. Subsequent tests will then see that the fixture already exists and reuse it.

The first part is the stuff every test needs to have present but is never modified by any tests—that is, the Immutable Shared Fixture. The second part is the objects that any test needs to modify or
delete; these objects should be built by each test as Fresh Fixtures. Most commonly, the Immutable Shared Fixture consists of reference data that is needed by the actual per-test fixtures. The per-test fixtures can then be built as Fresh Fixtures on top of the Immutable Shared Fixture). We also have the option called Minimal Fixture (i.e. use of smallest and simplest fixture possible for each test). Aiming at full post I have to mention the last option I would use, although it’s considered as Anti-pattern, is called Chained Tests (i.e. let the other tests in a test suite to setup the test fixture).

It is important to consider the Pesticide Paradox – often test automation suffers from static test data. Without variation in the data or execution path, bugs are located with decreasing frequency. Automation should be designed so that a test-case writer can provide a common equivalence class or data type as a parameter, and the automation framework will use the data randomly within that class and apply it for each test run. The automation framework should record the seed values, or actual data that was used, to allow reruns and retesting with the same data for debugging purposes.
It can utilize a TestFixtureRegistry via dedicated table – it’ll be able to expose various parts of a fixture, needed for suites, via discrete fixture holding class variables or via Finder Methods. Finder Methods helps us avoid hard-coded values in DB lookups in order to access the fixture objects. Those methods are very similar to those of Creation Methods, but they return references to existing fixture objects rather than building brand new ones. We should make those immutable.

NOTE: usage of Intent-Revealing Names for the fixture objects should be enforced in order to support the framework’s lookup functionality and better readability. To keep the current design the following implementation can be used – check if such registry already exists and remove it;

NOTE: in consideration must be taken the Data Sensitivity and Context Sensitivity of the tests that’ll rely on this module.

NOTE: take into consideration the Data Sensitivity and Context Sensitivity of the tests that’ll rely on this module.

Test Approach Phases

The suggested test phases are based on the development schedule per project, along with the need to comply with data requirements that need to be in place when the new DWH goes live.

Phase 1: Business processes

  •  Data Validation
  •  Performance Test
  •  Functional Test
  •  Data Warehouse (internal testing within ETL validating data stage jobs)

Data validation should start early in the test process and be completed before Phase 2 testing begins. Some data validation testing should occur in the remaining test phases, but to a much lesser extent.

Important business processes where performance is important should be identified and tested (when available) in the Phase 1. Performance testing should be continued in the later test phases as the application will be continuously enhanced throughout the project. In addition to Phase 1 testing, there will also be unit and functional testing. As unit testing is completed for a program, the tester will perform functional tests on the program. While functional testing takes place with one program, the developer continues with redeveloping and unit testing the next program.

Towards the end of Phase 1, the data warehouse team will be testing the data stage jobs. Unit testing should be completed first, then functional testing finishing a couple weeks afterwards. A final formal test will cap the end of Phase 1 testing.

Phase 2: Performance tests 

  •   Cross-functional process
  •   Security Test
  •   Data Warehouse (Repository testing and validation)

In addition to the previous tests, Phase 2 should also cover remaining test items that may not been tested in Phase 1 such as:

Phase 2 testing will be important because it is the final testing opportunity that the functional area testers will have to make sure the DW load works as expected before moving to regression testing in Phase 3. Some performance tests and data validation should be included in this phase.

Phase 3: Regression tests

Phase 3 testing is comprised of regression test periods to test updates that are required as part of the Company gaming platform. The functional area testers should have sufficient time to test in each regression test period.

Phase 4: Acceptance tests

Phase 4 testing is limited. In addition to the functional area testers, end users will probably be involved in this final test before the new system goes live. In customer acceptance testing, no new tests should be introduced at this time. Customer acceptance tests should have already been tested in prior test phases.


Evgeni Kostadinov

Evgeni Kostadinov prefers the challenges associated with testing of various technologies. He has extensive experience with UI, API, DW, Performance and Mobile. He has worked mainly in projects with Java, .Net and NodeJS environments for Telecom, Financial, Marketing and Banking Institutions. He actively participates in the development of the company’s QA and CI processes and infrastructure. Evgeni currently works as a QA Manager, technical trainer, as well as a QA Challenge Accepted lecturer.

Evgeni Kostadinov
Evgeni Kostadinov prefers the challenges associated with testing of various technologies. He has extensive experience with UI, API, DW, Performance and Mobile. He has worked mainly in projects with Java, .Net and NodeJS environments for Telecom, Financial, Marketing and Banking Institutions. He actively participates in the development of the company’s QA and CI processes and infrastructure. Evgeni currently works as a QA Manager, technical trainer, as well as a QA Challenge Accepted lecturer.

The Related Post

What you need to know in order to have effective and reliable Test Automation for your mobile apps I realized that Test Automation interfaces are pivotal to effective and efficient Test Automation, yet very few people trying to test their mobile apps seemed to know how their automated tests connected with the apps they wanted ...
Great mobile testing requires creativity to find problems that matter. I’d like to take you through the thought process of testers and discuss the types of things they consider when testing a mobile app. The intention here is to highlight their thought processes and to show the coverage and depth that testers often go to.
What you need to know for testing in the new paradigm This two part article analyzes the impact of the Internet of Things (IoT) product development on traditional testing. Part one of this series starts with a wide view on the IoT, embedded systems and device development aspects of testing. Part two, to be published ...
Mobile is no longer an area that a few UX people specialize in, and we need to start designing and testing everything for smartphones and tablets as well as computers. If you’re new to mobile usability testing, fear not. It is not as hard as you might think but there are some key differences from ...
iOS culture, even in many large organizations with skilled engineers, is behind on up-to-date testing practices. Agile development has long been all the rage; indeed, in most modern development shops the great agile methodologies are old hat. If you come from a software background like Ruby on Rails, Python, or certain Java niches, you may–until ...
Strategies to Approach Mobile Web App Testing Mobile web technology has been continuously changing over the past few years, making “keeping up” challenging. In this article, Raj Subramanian covers the latest trends and changes happening in the mobile web and how testers can prepare for them.
LogiGear Magazine – September 2013 – Mobile Testing
By focusing on test design, analyzing test requirements and optimizing the approach to testing, it’s possible to maximize mobile test automation cost effectively. In a previous article we outlined the importance of understanding the mobile ecosystem and test design for planning and executing mobile testing. The focus of this article is about efficient mobile test ...
A fully integrated test automation platform for mobile testing Mobile application testing is taking on greater significance as just about everything done on the desktop is transitioning to mobile. It’s taken a while for mobile testing in the consumer space to be made much of a priority. Now that sensitive data is being passed back ...
CEO and founder of mVerify Corporation, Robert V. Binder tackles questions from field testers regarding such issues as strategic considerations when dealing with single stack apps versus globalized enterprise mobile apps, and methods and tools that developers and testers should be aware of. He also offers his own advice from lessons learned from experience. 1. ...
Test engineers face a rapidly changing mobile application landscape, making mobile test automation a necessity. We know that mobile apps are becoming increasingly complex along with the technological advances in tablets and smartphones. Test engineers have to address multiple challenges while testing data-centric mobile apps, including the growing diversity of device features, platforms, and technologies. Fortunately, ...

Leave a Reply

Your email address will not be published. Required fields are marked *

Stay in the loop with the lastest
software testing news