Considerations for Automating Testing for Data Warehouse and ETL Projects

Introduction

A characteristic of data warehouse (DW) development is the frequent release of high-quality data for user feedback and acceptance. At the end of each iteration of DW ETLs (Extract-Transform-Load), data tables are expected to be of sufficient quality for the next ETL phase. This objective requires a unique approach to quality assurance methods and tools. Foremost, it means integrating QA efforts and Automation into ETL development iterations.

Essential to integrated ETL testing is Test Automation. Manual Testing is not practical in a highly iterative and adaptive development environment. There are 2 key problems with Manual Testing.

First, it takes too long and is therefore a significant inhibitor to the frequent delivery of working software. Teams that rely primarily on Manual Testing ultimately end up deferring testing until dedicated testing periods, which allows bugs to accumulate.

Second, Manual ETL Testing is not sufficiently repeatable for Regression Testing. While seeking to embrace and adapt to change, we must always be confident that features that were “done, complete!” in previous iterations retain their high quality, considering the changing systems.

Test Automation requires initial planning and ongoing diligence, but once technical teams embrace Automation, project success is more assured.

Data Warehouse Test Automation

Data Warehouse Test Automation is described as the use of tools to control (1) the execution of tests, (2) the comparison of actual outcomes to predicted outcomes, and (3) the setting up of test preconditions as well as other test control and test reporting functions. Commonly, Test Automation involves automating a manual process already in place that uses a formalized testing process.

Although manual ETL tests may find many data defects, it is a laborious and time-consuming process. In addition, manual tests may not be effective in finding certain classes of defects. DW Test Automation involves writing programs for testing that would otherwise need to be done manually. Once tests have been automated, they can be run quickly and repeatedly. This is often the most cost-effective method for a DW that may have a long maintenance life; this is because even minor patches or enhancements over the lifetime of the warehouse can cause features that were working earlier to break.

Integrated Automated Testing in ETL development presents a unique set of challenges. Current Automated Testing tools designed for software development are not easily adaptable for database and ETL projects; large data volumes can make Automated Testing a daunting task. The wide variety of DW architectures further complicate these challenges because they involve multiple databases which require special coding for data extraction, transformations, and loading, as well as data cleansing, data aggregations, and data enrichments.

Test Automation tools can be expensive; the tools are usually employed in combination with Manual Testing. However, they can become cost-effective in the longer term, particularly when used repeatedly in Regression Testing.

Goals for Test Automation in DW ETL Processes

What to automate, when to automate, or even whether one really needs Automation are all crucial decisions which the testing (or development) team must make. Selecting the correct features of the product for Automation largely determines the success of the Automation.

Automating unstable features or features that are undergoing changes should be avoided. Today, there is no known commercial tool or a set of methods/processes that can be said to represent comprehensive End-to-End DW testing. Leading ETL tool vendors from Informatica, Microsoft, and others do not promote any one test tool as the answer to ETL Automated Testing.

Common Objectives for ETL Test Automation

  • Automate as many DW testing cycle activities as possible
  • Develop a regression test suite for system testing and production monitoring
  • Focus on numeric financial information for compliance and financial reporting
  • Identify mismatched and missing data
  • Profile the performance of the DW
  • Provide assurances to auditors that all financial information residing within a DW can be trusted
  • Reconcile data between input and transformation processes
  • Reduce Manual Testing workloads, which is often thousands of SQL scripts
  • Validate information between sources and their final destination in the DW
  • Verify all data, not just a subset or sample
  • Verify implementation of complex data transformation rules
  • Verify target data after each the ETL process

Automation of these tests may also be required due to the complexity of processing and the number of sources and targets that require checks. Without Automation, these checks are often costly or, in some cases, impossible to do manually.

ETL Functions That Are Often Candidates for Test Automation  

  • Data aggregation processing
  • Data cleansing and archiving
  • Dimension table data loads
  • End-to-End Testing
  • Extract-Transform-Load (ETL) validation and verification testing
  • Fact table data loads
  • File/data loading verification
  • Incremental Load TestingLoad and Scalability Testing
  • Performance Testing
  • Regression Testing
  • Security Testing
  • Source Data Testing and profiling
  • Staging, ODS data validations
  • BI Report Testing

For most DW development, the ETL test processes are designed to check for and implement data quality. Therefore, the selection of a high-functioning ETL tool (e.g., Informatica, SSIS, DataStage, in-house developed, etc.) is a serious concern for the DW project.

Deciding Which ETL Tests to Automate

The trick is to figure out what needs to be automated, and how to approach this task. An array of questions must be considered in the course of automating tests such as:

  • How much will it cost to automate tests?
  • Who will do the Automation (ex. Devs., QA)?
  • What testing tools should be used (ex. open-source, commercial)?
  • How will the test results be reported?
  • Who will interpret the test results?
  • What will script maintenance consist of?
  • What will happen to the test scripts after the launch of the application?
Figure 1: Comparing Manual Testing execution time vs. testing execution time

Figure 1 shows examples of time durations for manual vs. automated test cases for a sample set of test cases.

A primary objective of Automated DW Testing is coverage of the most critical functions of loading a DW synchronization and reconciliation of source and target data.

Benefits and Limitations of Automated ETL Testing

Some ETL Test Automation Challenges

  • Large numbers of source and target tables and records
  • Testing the data synchronization for all DW tables
  • Testing business intelligence or analytic reports through Automation

Benefits

  • Executing test cases faster. Automation can speed up test case implementation
  • Results in reusable test suites. Once test scripts have been created with Automation tools, they can be saved for easier recall and reuse.
  • Eases test reporting. An attractive feature of many automated tools is their ability to generate reports and test records. Such capabilities provide an accurate representation of the state of the data, clearly identify any defects, and be used in compliance audits.
  • Reduces staffing and rework costs. The time that would be spent on Manual Testing or retesting after fixing defects can be spent on other initiatives within the IT department.

Potential Limitations

  • Cannot eliminate Manual Testing. Although Automation can be used for many applications and test cases, it cannot totally replace Manual Testing. Complex test cases will still exist where Automation may not catch everything, and for User Acceptance Testing, end users must often manually run tests. Therefore, it is important to have the right mix of Automated and Manual Testing in the process.
  • Cost of tools. Automated Testing tools can be costly, depending on their size and functionality. At first glance, the business may not see this as a necessary cost; however, the reusability alone can quickly turn it into an asset.
  • Cost of training. Some automated tools can be complex to use and may require training to get started. Testers must be trained not only on the software, but also on the automated test planning process.
  • Automation needs effective planning, preparation, and dedicated resources. The success of Automated Testing is highly dependent on clear testing requirements and careful test case development before testing begins. Because each organization and DW application can be unique, an Automated Testing tool will not frequently create test cases. Unfortunately, test case development is still a manual process.

Getting Started with ETL Test Automation

Not all ETL testing is suitable for Automation. Assess the situations mentioned above to determine which types of Automation would benefit the testing process and how much is needed. Evaluate testing requirements and identify inefficiencies that may be fixed with Automated Testing. QA teams that spend a lot of time on Regression Testing will benefit the most.

Develop a business case for Automated Testing. Automated Testing generally comes at an additional cost over Manual Testing, so in order to convey the value to the business, IT must first make the case.

Evaluate the options. After evaluating the current state and requirements within the IT department, investigate which tools fit the organization’s testing processes and environments. Options may include vendor, open source, in-house, or a combination of the tools.

Typical Automated ETL Testing Scenarios

A principle of Agile and other modern development is Automated Testing. We can apply this awareness to DW.

Figure 2. End-to-End ETL Testing

An important DW testing consideration is that the number of tests that are run will continue to grow to verify added functionality.

As shown in Figure 2, there are 4 major areas of End-to-End ETL testing. For Data Automation Testing, the emphasis at each test entry point is the validation of data integrity.

When implementing Test Automation, data can be tracked from the source layer, through ETL processing, to loads in the DW, then finally to the front-end applications or reports. If corrupt data is found in a front-end application or report, the execution of automated suites can help to more rapidly determine whether the problem is located in the data source, an ETL process, in a DW database/data mart or in the business intelligence reports.

For Performance Testing, the same test entry points are utilized to focus on characterizing subsystem response under load. A similar strategy is used for data integrity validation to determine the origin of performance issues. Subsystem performance can be measured at any of the identified test entry points.

An emphasis on rapid localization of either data and performance problems in complex DW architectures provides a key tool for promoting efficiencies in development, for shortening build cycles, and meeting release targets.

Conclusions

Automated Testing tools are most helpful when used in the following circumstances.

  • Repetitive, mundane tasks. As ETL processes are developed, they will be exposed to many revisions and through many versions—each of which must be tested. By creating test scripts once, Automated Testing allows reuse for future versions.
  • Regression Testing. Each time a change is made to data or business rules, testers using a manual approach must go through each function and make sure it does not affect any other part of the system. Automated Regression Testing executes test cases to do this much faster.
  • Multiple combinations. In increasingly complex environments where many scenarios must be examined, Automated Testing can rapidly run many test case combinations using a few scripts.
  • Re-usability: These automated tests can be reused with minimum efforts for Regression Testing when newer versions of the source systems are introduced.

As Test Automation has rapidly become an essential alternative to Manual Testing, more companies are looking for tools and strategies to successfully apply Automation. This trend has led to the significant growth in Test Automation tools based on Selenium, Katalon Studio, Appium, and many more.

However, to fully utilize these Automation tools, DW and BI QA teams must possess appropriate programming skills. Adding to that, businesses are becoming more Agile and applications are updated more frequently.

Many IT experts have predicted that ongoing, the knowledge gap between the testers and developers must and will be narrowed. Automated ETL testing tools can significantly reduce the amount of time spent testing code in comparison to traditional manual methods. Other benefits include the creation of reusable code and a reduction in costs associated with personnel and rework.

Gain knowledge about Automated DW Testing and tools to decide if it’s right for your QA project.

Wayne Yaddow
Wayne Yaddow is an independent consultant with more than 15 years of experience leading data migration/integration/ETL testing projects in businesses, including J.P. Morgan Chase, Credit Suisse, and Standard & Poor’s. Wayne has taught courses on DW, ETL, and data integration testing. He continues to lead ETL testing and coaching projects on a consulting basis.

The Related Post

Two dominant manual testing approaches to the software testing game are scripted and exploratory testing. In the test automation space, we have other approaches. I look at three main contexts for test automation: 1. Code context – e.g. unit testing. 2. System context – e.g. protocol or message level testing. 3. Social context – e.g. ...
This article was developed from concepts in the book Global Software Test Automation: Discussion of Software Testing for Executives. Introduction There are many potential pitfalls to Manual Software Testing, including: Manual Testing is slow and costly. Manual tests do not scale well. Manual Testing is not consistent or repeatable. Lack of training. Testing is difficult ...
TestArchitect TM is the name we have given to our automation toolset. It reflects the vision that automated testing requires a well-designed architectural plan allowing technical and non-technical elements to work fluidly in their capacity. It also addresses the continual missing link of all test automation tools of how to design tests. In TestArchitect the test ...
Utility: A program that performs a specific task related to the management of computer functions, resources, or files, as password protection, memory management, virus protection, and file compression. Tool: A program or application that software development teams use to create, debug, maintain, or otherwise support other programs and applications. The term usually refers to programs that can be combined together ...
“Testing Applications on the web” – 2nd EditionAuthors: Hung Q. Nguyen, Bob Johnson, Michael HackettPublisher: Wiley; edition (May 16, 2003) This is good book. If you test web apps, you should buy it!, April 20, 2001By Dr. Cem Kaner – Director of Florida Institute of Technology’s Center for Software Testing Education & Research Book Reviews ...
The 12 Do’s and Don’ts of Test Automation When I started my career as a Software Tester a decade ago, Test Automation was viewed with some skepticism.
Learn how to leverage TestArchitect and Selenium for turnkey, Automated Web testing. TestArchitect lets you create, manage, and run web-based automated tests on different types of browsers—using either a WebDriver or non-WebDriver technique. In this article, we will explore employing WebDriver for testing a web-based application with TestArchitect. TestArchitect with WebDriver is a tool for automating ...
One of my current responsibilities is to find ways to automate, as much as practical, the ‘testing’ of the user experience (UX) for complex web-based applications. In my view, full test automation of UX is impractical and probably unwise; however, we can use automation to find potential UX problems, or undesirable effects, even in rich, ...
Investing in Test Automation training will increase your team’s productivity. The availability of reliable jobs in a competitive US market seems to be constantly embattled with competition and replacements of artificial intelligence (AI). In 2016, Foxconn replaced 60,000 employees with robots. However, the growth of Test Automation as an occupation has highlighted an intriguing option ...
Every once in a while a book is put together that should be read by every person with a relationship to software development. This book is one of them. Everyone dreams of automating their software testing, but few make it a reality. This down-to-earth book contains stories of 28 teams that went for it, including ...
This book isn’t for everyone, but everyone can get some value out of it. What I mean by that rather confusing statement is that folks working in Agile environments will likely want to throw the book across the room while folks in more bureaucratic environments like CMMI or other waterfall environments will likely get a ...
Having the right Test Automation plan helps bridge gaps and fragmentations in the complex mobile environment. Figuring out the best Test Automation plan is one of the biggest frustrations for today’s digital teams. Organizations struggle to develop cross-platform Test Automation that can fit with their Continuous Integration cadence, their regression cycles and other elements of ...

Leave a Reply

Your email address will not be published. Required fields are marked *

Stay in the loop with the lastest
software testing news

Subscribe